Business analytics is becoming a do-it-yourself project, with front-line employees now able to generate their own instant analytics reports via search-bar queries. Doing so depends on building connections to a vast, ever-changing swath of diverse data sources. ThoughtSpot took on this challenge by using data-connection technology from Informatica. In this conversation, two ThoughtSpot executives, chief marketing officer Scott Holden and Vijay Ganesan, co-founder and principal engineer, explain the process. For any business that needs to generate real-time, query-based analytics and reports, the barriers to entry are lower than ever.
What is the idea behind instant analytics generated via search-bar queries?
Scott Holden: The big value proposition is a simple search experience where people can ask questions of their data and get back charts and answers as they type. Think of it as a new breed of search engine for numbers. Type "revenue by product last month" into a search box and the charts come back instantly.
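To make the idea of a "search engine for numbers" concrete, here is a toy sketch, assuming a tiny in-memory table (`ROWS`) and a hypothetical `answer` function that accepts only the two-term form "<measure> by <dimension>". It is not ThoughtSpot's engine; it just shows how a typed phrase can map onto a summed group-by. Time phrases such as "last month" are not handled and are rejected.

```python
from collections import defaultdict

# Hypothetical sample rows; a real system would query loaded, curated tables.
ROWS = [
    {"product": "widget", "region": "east", "revenue": 120.0},
    {"product": "gadget", "region": "west", "revenue": 80.0},
    {"product": "widget", "region": "west", "revenue": 30.0},
]


def answer(query: str, rows: list) -> dict:
    """Answer a '<measure> by <dimension>' query with a summed group-by."""
    words = query.lower().split()
    if len(words) != 3 or words[1] != "by":
        raise ValueError("expected '<measure> by <dimension>'")
    measure, dimension = words[0], words[2]
    totals = defaultdict(float)
    for row in rows:
        # Add each row's measure to the bucket for its dimension value.
        totals[row[dimension]] += row[measure]
    return dict(totals)


print(answer("revenue by product", ROWS))
```

A real search-driven product would also need synonym matching, date parsing and a query planner over large tables; the sketch only shows the final aggregation step.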
Analytics is about the data. What is the balance point between data sources and performance?
Holden: We can scale across hundreds of thousands of data sources and still deliver sub-second response times, even as you're searching billions of rows or terabytes of data. It has to perform at scale. The faster we can get access to lots of data, the stronger the value proposition and the more useful we are for large customers. Data can reside anywhere, in the cloud or on premises. A big bank is likely to have a lot of proprietary, on-premises databases.
Who are the users benefiting from query-based instant analytics?
Holden: We're going after front-line rank-and-file people with no training but who know how to use a search box. Just as you ask Google for driving directions, you can ask questions of your data and get an answer back.
Who makes the decisions about the data?
Holden: Selecting data sources is the job of data analysts. It is IT's job to make sure those data sources are integrated through the Informatica connector. It's our job to make it easy for IT and data analysts to turn on the switch for access. Do that and end users are off and running.
How does ThoughtSpot deal with the potentially huge volumes of data?
Vijay Ganesan: ThoughtSpot uses mostly structured, curated data. We're not analyzing sensor or IoT data or raw data in Hadoop. In terms of volume we're talking terabytes, not petabytes of data. Getting started requires an initial bulk load that can be very large. But that is followed by daily or hourly incremental loads that are much smaller.
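The pattern Ganesan describes, one large initial bulk load followed by smaller daily or hourly incremental loads, is often implemented with a high-water mark. The sketch below is a hypothetical illustration using Python's built-in `sqlite3` for both source and target; the `orders` table, its `updated_at` column and the `load` function are assumptions for the example, not part of ThoughtSpot's or Informatica's products.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: each row carries an ISO 8601 updated_at timestamp,
# so plain string comparison orders rows by time.
SCHEMA = ("CREATE TABLE IF NOT EXISTS orders "
          "(id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")


def load(source: sqlite3.Connection, target: sqlite3.Connection,
         watermark: Optional[str]) -> Optional[str]:
    """Bulk-load everything when watermark is None, else copy only newer rows.

    Returns the new high-water mark to pass into the next run.
    """
    target.execute(SCHEMA)
    if watermark is None:
        rows = source.execute(
            "SELECT id, amount, updated_at FROM orders").fetchall()
    else:
        rows = source.execute(
            "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
            (watermark,),
        ).fetchall()
    # Upsert by primary key so a changed row replaces its earlier copy.
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    target.commit()
    # Keep the old mark if nothing new arrived.
    return max((r[2] for r in rows), default=watermark)


# Hypothetical sample data for a dry run.
src = sqlite3.connect(":memory:")
src.execute(SCHEMA)
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10.0, "2024-01-01T00:00"), (2, 20.0, "2024-01-01T01:00")])
dst = sqlite3.connect(":memory:")

mark1 = load(src, dst, None)    # initial bulk load copies both rows
src.execute("INSERT INTO orders VALUES (3, 5.0, '2024-01-01T02:00')")
mark2 = load(src, dst, mark1)   # incremental load copies only row 3
mark3 = load(src, dst, mark2)   # nothing new, so the mark is unchanged
```

This simple scheme does not propagate deletes and can miss rows that arrive late with a timestamp at or below the current mark; production pipelines usually add change-data capture or an overlap window to cover those cases.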
Do you use Informatica's technology to build connectors separately for each of ThoughtSpot's clients?
Ganesan: We are doing two things. First, we are building our own ThoughtSpot data-source connector. This is like any other connector available through Informatica's marketplace. We're one of about 150 connectors there. If you're an Informatica shop and already have an Informatica cloud license, you just download our connector and use that to push data into the ThoughtSpot system. Second, we will be bringing out an integrated solution. Instead of needing an Informatica license, you buy ThoughtSpot and within the data management area, initiate a connection and move data into ThoughtSpot. It's seamless; customers won't know Informatica is under the covers. We take care of the licensing and provisioning.
How would a business do this if it had your query-based instant analytics but opted out of this third-party connector technology?
Ganesan: It would be necessary for the business or us to write code for every data source, such as SQL Server, Salesforce, Marketo, et cetera. Each source has its own way of exposing data. For traditional databases, there is ODBC and JDBC access. Salesforce and Workday have Web service APIs. The specifications for connecting, security and credentials management are all different.
This is where Informatica comes into the picture?
Ganesan: Informatica provides a standardized view that allows us to see any data source in terms of tables, columns and relationships. Informatica exposes a standard API independent of the data source. We can get metadata about that source in a standardized form and use that. How the translation happens between each data source and the standardized form is Informatica's business.
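The "standardized view" idea is essentially the adapter pattern: each source-specific adapter translates its native form into one common description of tables, columns and relationships, and the consumer codes only against that common interface. The sketch below is hypothetical; `TableMeta`, `SourceAdapter`, `SQLiteAdapter`, `RecordAPIAdapter` and `describe` are illustrative names, not Informatica's API. A `sqlite3` database stands in for a JDBC/ODBC-style source, and a dictionary of JSON-like records stands in for a web-service API.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    name: str
    columns: list
    # (column, referenced_table, referenced_column) triples
    relationships: list = field(default_factory=list)


class SourceAdapter(ABC):
    """Hypothetical source-independent interface (not Informatica's API)."""

    @abstractmethod
    def tables(self) -> list:
        ...


class SQLiteAdapter(SourceAdapter):
    """Reads metadata from a relational source via SQLite PRAGMAs."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def tables(self) -> list:
        names = [r[0] for r in self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        result = []
        for name in names:
            # table_info rows: (cid, name, type, notnull, dflt_value, pk)
            cols = [r[1] for r in self.conn.execute(f'PRAGMA table_info("{name}")')]
            # foreign_key_list rows: (id, seq, table, from, to, ...)
            fks = [(r[3], r[2], r[4]) for r in
                   self.conn.execute(f'PRAGMA foreign_key_list("{name}")')]
            result.append(TableMeta(name, cols, fks))
        return result


class RecordAPIAdapter(SourceAdapter):
    """Stands in for a web-service source that returns lists of records."""

    def __init__(self, collections: dict):
        self.collections = collections

    def tables(self) -> list:
        return [TableMeta(name, sorted(records[0]) if records else [])
                for name, records in sorted(self.collections.items())]


def describe(adapter: SourceAdapter) -> dict:
    """Consumer code that never needs to know which kind of source it has."""
    return {t.name: t.columns for t in adapter.tables()}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
           "customer_id INTEGER REFERENCES customers(id), amount REAL)")
api = RecordAPIAdapter({"leads": [{"email": "a@example.com", "score": 7}]})

print(describe(SQLiteAdapter(db)))
print(describe(api))
```

The design choice mirrors Ganesan's point: how each source is translated is the adapter's business, while everything downstream sees the same table, column and relationship model.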
You make the connection and conduit aspect sound easy. But what challenges did you find in implementing thousands of connections?
Ganesan: The two main challenges are security and testing. We are an appliance that sits in the customer's datacenter. From that box, you need to establish a connection to the Informatica cloud. That can be a problem for customers still paranoid about any connection from their datacenter to the cloud. The solution is security audits. The other challenge is testing. Even though Informatica certifies its connectors, when we release to our customers, we have to certify against each data source and verify that everything works. That is a challenge because there are so many different sources.
This seems almost too easy. Is the road to query-based instant analytics actually more difficult than simply installing the ThoughtSpot appliance and the Informatica data-source connectors?
Holden: We were pleasantly surprised to find how easy it was to get involved. In a couple of weeks, we were able to work with Informatica's development team to build our connector and tap into the goodness they already had in place.
Is implementing query-based instant analytics simple enough that it will eliminate the need for developers?
Holden: Application developers are not going away. To give front-line employees access to information faster, we work with the business intelligence and IT teams. The data analysts set up the pipeline that carries data from the sources into a data playground for end users. You still need to turn on the right sources, select the right columns and limit the scope of play. Those are the things that developers are good at.
To summarize, what does query-based instant analytics offer?
Holden: This is about getting access to many thousands of different data sources, turning on the ones you find useful, and searching through a search box without doing any modeling upfront. Getting access to data quickly and putting it in the right people's hands is the Holy Grail for anyone in the data business.