Ritika Gunnar is vice president of offering management, data and analytics at IBM. She has also served as a software engineer and as vice president for information integration and governance in IBM's platform analytics group. In this exclusive interview with SearchCloudApplications, she discusses the evolution of the data science industry and the skills that developers must possess to flourish in a data-driven world.
As the volume and velocity of data grow, what does it mean to work in the data science industry today?
Ritika Gunnar: Traditionally, when people look at data science, they look at the technical side and think of someone who has the technical aptitude to rapidly develop new kinds of algorithms. But today's landscape is evolving very quickly. The programming languages that have come along and evolved in the past decade -- R, Python and Scala, to name three -- mean you have to know which one to choose for different types of situations. This is further complicated as other technologies evolve, like natural language processing, cognitive computing, and AI (artificial intelligence). These are all very closely aligned with what data scientists are doing. You have reasoning, automation, and self-learning characteristics along with predictive, prescriptive, and machine learning. These are the technical skills required to be a good data scientist, and they are evolving at a rapid rate.
Technical expertise is one side of the coin. What about the other side?
Gunnar: That's where you get to other points -- social and business capabilities -- that people don't usually think of, but which are essential. What we've found is that projects that start in an organization to solve a problem require the data scientist to collaborate and communicate across the entire organization. This is necessary to understand the scope and nuances of the business problem and to work with IT to get to the data. The social aspect of data science is often forgotten. We need to bring that back in.
How should a data scientist collaborate with the business side?
Gunnar: A lot of projects start with a request from a line-of-business department that is asking a question. That means the real collaboration driver starts on the data scientist side with understanding that goes beyond the immediate request. If you're going to derive insights that change the way a company does business, you must have a fundamental understanding of the business and its data.
Is it necessary to integrate the technology and business skill sets?
Gunnar: You need technical acumen, you need to be able to communicate and collaborate, and you need to be able to correlate data with how the business actually functions. If you don't have social skills and business insight in addition to technical aptitude, the data science process is lost.
This aspect of the data science industry is reminiscent of having that rare ability to straddle the fence between business and technology. Is this the evolution of yesteryear's systems analyst?
Gunnar: The traditional roles of systems analyst or business analyst are still there, but what has changed is the intensity and enormity of how they are used across the business. Data science used to answer questions of a limited scope. If you look at where the changes are in the industry today, data science is used far more widely across the organization. This means applying analytics and insights across every stage of the business. It's a lot more pervasive, and that makes the demands on data scientists a lot more pervasive, too.
Speaking of the past, is yesterday's DBA (database administrator) today's data scientist?
Gunnar: Yesterday's database administrator is today's data engineer. If you look at what data engineers do, they do some programming, math, SQL, administration and storage. Data scientists are different; they do model building, develop algorithms and do storytelling based on what the data says. They do statistics, math, programming and visualizations of data. Yes, there is some overlap in the programming and math area; a data engineer might do those in SQL and a data scientist through Scala.
Gunnar: Traditional application developers need to understand what it means to create modern web, cloud or mobile applications. When you build data-driven apps, you need to ask what kind of insights you want to deliver and determine what kind of data and analytics must be infused into an application to do that. You need to transform to where you are infusing analytics into every part of an application. That makes algorithms and deployment part of the activity that a modern developer must do. Knowing how to build more reactive, analytically driven applications is the skill developers must have to survive.
Where does an analytically-driven application get used?
Gunnar: One example where we are using a lot of analytics is a major call center that receives hundreds of thousands of customer calls per day. From the call flow, we can determine which products and which aspects are of concern. With data and analytics, it's possible to automatically know how to handle clients or what they are calling in for before they even reach a representative. That's the power of using analytics to determine behavior and prescribe what steps to take next.
What should be in a data scientist's toolkit?
Gunnar: It's Java, R and Python, and maybe using those on large-scale platforms like Hadoop, HDFS, Spark and Storm. These are now being used on much larger data sets and with much more complicated algorithms, like SPSS and Matlab. But that is going much deeper than basic analytics and puts you on the far end of the data science spectrum.
Where is the data science industry headed next?
Gunnar: What's coming, as we work with larger-scale data that we want to draw inferences from, is using machine learning with recommendation engines to say, 'You may not have known it, but based on the patterns we see, here's what you need to look into.' This is absolutely where things are going. (Editor's note: A recommendation engine is a technology that applies filters to data based on predictions of how users are likely to rate products, media or other content, reducing many possible choices to a likely few.)
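To make the editor's note concrete, here is a minimal sketch of one common way to predict how a user is likely to rate something: user-based collaborative filtering. It is an illustration only, not a description of any IBM product. The `ratings` data and the `cosine` and `recommend` helpers are hypothetical names invented for this example.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating on a 1-5 scale}); illustrative data only.
ratings = {
    "ana": {"router": 5, "modem": 4, "cable": 1},
    "ben": {"router": 4, "modem": 5, "extender": 5},
    "cy": {"cable": 5, "extender": 1, "modem": 2},
}


def cosine(a, b):
    """Cosine similarity over the items both users rated (0.0 if none are shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Predict ratings for items the user has not rated and return the top_n."""
    mine = ratings[user]
    weighted, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, score in theirs.items():
            if item not in mine:
                # Accumulate a similarity-weighted sum of other users' ratings.
                weighted[item] = weighted.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    # The predicted rating is the similarity-weighted average.
    predicted = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("ana", ratings))
```

Each predicted rating is an average of other users' ratings, weighted by how similar those users are to the target user, so the many possible items are reduced to the few with the highest predicted scores. At the data volumes discussed in the interview, the same idea would typically run on a distributed platform rather than in a single Python process.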