Skills and temperament drive success in cloud-based data science

Cloud-based data science is about more than data. Success demands curiosity, fearlessness, a desire to dig deep for answers, and a penchant for telling stories.

What does it take to be successful in the field of cloud-based data science? According to those working in this fast-changing world, data scientists need a balance of technical skills, expertise in statistics, curiosity, and a no-quit temperament. The problem is there aren't enough such people to go around.

As the speed of business continues to accelerate, the need to quickly gain actionable insight from mountains of data -- for strategic planning and competitive advantage -- has taken on critical urgency. The result is the rise of data science. Consequently, data scientists, who transform raw data into visualizations and stories that business leaders can comprehend and leverage, are in high demand and difficult to find.

Or are they? To deal with the dearth of data scientists, dozens of domestic and international universities now offer curricula in data science, distinct from analytics, ranging from certificate programs for current IT professionals to full master's and doctorate programs.

It must be helping -- somewhat. In an August 2016 opinion piece for The Wall Street Journal's CIO Journal, Tom Davenport, a senior advisor for Deloitte Analytics and a distinguished professor at Babson College, writes that data scientists, whom he once characterized "as scarce as vegetarian dogs," are now more easily found beyond the confines of Silicon Valley and Boston. Yet, Davenport is hedging: "It still may be difficult to find highly productive and effective data scientists," but, he writes, the pool of candidates is now far from empty.

One complicating factor is that success in data science benefits from certain personality traits, which are innate and cannot be taught. In other words, data science isn't for everyone.

Data science, analytics are not the same

Shortage or not, businesses are scrambling to fill openings in cloud-based data science. That makes it important to understand what data science is -- and is not -- especially in contrast with analytics. Writing in his Data Science Insights blog, Jerry Smith, currently vice president of data sciences at Cognizant, differentiates the two.

"Analytics seeks to provide operational observations into issues that we either know we know or know we don't know," Smith writes. Conversely, data science provides "strategic actionable insights into the world where we don't know what we don't know." Tongue-twister perhaps, but the distinction is clear.

Every data scientist must possess a solid foundation in math and statistics, according to Elke Rundensteiner, a professor of computer science and director of the data science program at Worcester Polytechnic Institute (WPI) in Worcester, Mass. "Using data science tools without a deep understanding of statistics can lead to meaningless conclusions," she said.

Fearlessness is also essential for success as a data scientist, according to Rundensteiner. "Beyond fearlessness, you have to be interested in understanding data and continually asking questions without ever giving up," she said. "Initially, you might look at data and not see anything, but you have to be curious enough to keep sifting through and patient enough to keep working the data."

Persistence, it seems, applies to both data and those destined to interpret it.

That's the case with Tabassum Kakar. A native of Afghanistan, she studied computer software engineering in Pakistan as a wartime refugee, returning to Afghanistan to find work in the business intelligence field at a telecommunications company.

"They needed someone to look at huge amounts of data and find patterns," Kakar said. That work ignited her interest in data science. Awarded a Fulbright scholarship to study in the United States, Kakar today is a Ph.D. candidate in data science at WPI.

Cory Hayward's introduction to cloud-based data science was through coding relational databases and a project to track and plot lobster migration patterns in New Hampshire's Portsmouth Bay. "I had no idea what data science was, but curiosity drove me to find patterns in the data and use that to tell stories," he said. It's essential to continually ask questions about what's hidden in that data and determine what those patterns mean, he added. "If you're not always questioning your data and communicating what you discover in a compelling way, then you cannot be a data scientist," he said.

Hayward, who received a master of science degree in data science in May 2016, now works as a data scientist at iCentrix, a Salem, N.H., provider of electronic medical records and data warehouse business intelligence technologies.

Infographic: What is a data scientist?

Data science is more than data

If there's one thing on which everyone agrees, it's that succeeding in cloud-based data science requires a comprehensive understanding of the company's business and how the decisions derived from data science -- if applied -- impact operations and, ultimately, profitability. "Data science in a cocoon doesn't help anyone," said Ritika Gunnar, a vice president in IBM's data and analytics business unit. "Projects that start to solve a specific problem require collaboration between the business side and the entirety of IT to get to the data and share its meaning across the organization."

Though cloud-based data science starts with the data, it is vastly different from simply working with databases and queries. "Looking for trends is just database stuff and SQL queries," Rundensteiner said. "Data science is more about understanding the outliers and finding patterns that were not evident. You've got to dig much deeper."
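
To make the contrast concrete, here is a minimal sketch, not anything Rundensteiner described. It assumes a small, made-up list of daily readings and uses the common interquartile-range (IQR) rule, chosen here purely for illustration, to separate the overall trend (the kind of summary a query returns) from the outliers that deserve a closer look. The function name `find_outliers` and the sample values are hypothetical.

```python
from statistics import mean, quantiles


def find_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences.

    k=1.5 is the conventional multiplier for this rule; it is an
    assumption here, not something taken from the article.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sample data: six ordinary readings and one oddity.
readings = [10, 12, 11, 13, 12, 11, 95]

# The "trend" view: a single summary number, much like SELECT AVG(...).
print("average:", round(mean(readings), 2))

# The "dig deeper" view: which readings do not fit the pattern?
print("outliers:", find_outliers(readings))
```

Flagging the odd reading is only the start. Working out why it is there, and what it means for the business, is the part the sources above describe as data science.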

That digging for insights is what separates the data scientist from the developer and the database administrator (DBA), said Hayward, who previously worked as an applications developer. "The DBA manages the data, the developer builds tables and user interfaces, but it is the data scientist who asks the questions about what's hidden within data and turns it all into something useful to the organization."

Gunnar described the soul of a data scientist as someone who understands the nuances of the business and who has an intimate understanding of the data in order to create algorithms that span multiple data sets of varying types and sources. The data scientist develops the algorithms, does storytelling, and visualizes data in a way that is comprehensible to those who ultimately use it as a basis for business decisions.

In the end, every company, regardless of what it manufactures, is a software company driven by data. How that data is extracted, interpreted and applied can spell the difference between mediocrity and excellence, Gunnar said. "The notion that data is a competitive advantage is becoming fundamental."

Joel Shore is a news writer for TechTarget's Business Applications and Architecture Media Group. Follow @JshoreTT on Twitter.

A litany of tools

Though there is some overlap with application developers, data scientists have their own tools for digging into the numbers to reveal patterns and then presenting them to business executives in a compelling, yet comprehensible manner. This list is far from exhaustive, but can be a good starting point.

  • Tableau, an interactive platform for data visualization through the fusion of databases and computer graphics.
  • R, an open-source language for statistical computing and graphics. Available as a free download, it is also offered in commercially packaged versions, such as Microsoft R, the outgrowth of Microsoft's 2015 acquisition of Revolution Analytics.
  • Scala (a contraction of "scalable language"), a language that runs on a Java Virtual Machine and freely uses Java libraries, classes and frameworks.
  • MATLAB, a mathematical modeling platform for engineers, created by MathWorks. If you drive a car or use a mobile phone, you're likely using it.
  • Julia, a language for technical computing that is a relative newcomer. The creators behind this free, open-source language claim it was designed with cloud computing in mind.
  • Power BI, a suite of products from Microsoft for data analysis and visualization.
  • SPSS, a product from IBM for statistical analysis.
  • SAS, a vast array of products for analysis, statistics, and visualization from SAS Institute.
