Colleges and universities worldwide are racing to implement new curricula to address the shortage of data scientists, granting master's and doctorate degrees in data science. One of these schools is Worcester Polytechnic Institute, or WPI, an internationally renowned school of advanced engineering and technology founded in 1865 and located in Worcester, Mass.
In this interview, Elke Rundensteiner, director of WPI's data science program and a professor of computer science, discusses what qualities go into making a good data scientist.
What drove your interest in data science?
Elke Rundensteiner: I have worked in data science from the very beginning of my career, taking data, analyzing it and trying to do good with it. The time has come when this is needed in all industries.
Why has data science become so important so quickly?
Rundensteiner: Many things are coming together simultaneously. First, data is now everywhere -- on cell phones, machines, email messages, social media and more. Consequently, we now deal with data in much larger quantities. Second, we're now able to do something with all that data: we have cloud computing infrastructures where we can analyze it. Third, we've seen a rise in open source data analysis packages that are making it easier and more affordable for organizations to analyze their data.
What qualities make someone a good data scientist?
Rundensteiner: You have to be fearless, [and] interested in understanding data and asking questions -- not giving up. You could be looking at data and not seeing anything initially, so you have to be curious enough to keep on sifting through that data for answers. Data science has a large number of different analysis methods, and you have to determine which one is best for the data you have. You need patience and intuition and a solid background in data science education.
For application developers who lack a data science background, what skills do they need to acquire?
Rundensteiner: Being technologically savvy is essential. A good data scientist is a blend that goes beyond programmer or software developer. You must have a strong mathematical foundation in statistics. Otherwise, even with the best tools and data science education, you could end up drawing conclusions that are absolutely meaningless. You need a foundation in statistics to understand which hypothesis about the data to support, because there can be several. Coding skills will always be important, but being strong at development is less important than a foundation in statistics.
Is yesterday's database administrator (DBA) becoming today's data scientist?
Rundensteiner: These are very different skill sets. We will continue to need data administrators to curate and prepare data. That is quite different from a data scientist, who uses analytics tools to develop algorithms and applications for answering specific questions that a client might have. DBAs are there to prepare the data and make sure it is easily accessible to the data scientist, [who] is using tools to find answers to questions.
What tools do students use in WPI's data science classes?
Rundensteiner: In WPI's data science curriculum, we use a variety of tools. We want to make sure students are prepared for any situation and not tied to any one product. We try to use open source over commercial products whenever we can, and we're careful to make sure that every course we teach uses a different technology. When they graduate, they are prepared for anything they might see.
We use the R language for statistical analysis, Hadoop for large-scale application development, and Spark and Scala. For programming, we mostly use Python and the scikit-learn machine learning library available for it. In our visualization courses, we use Tableau and other visual interfaces. There are also more and more tools for analysis, such as Google TensorFlow [a library for numerical computation based on data flow graphs]. Our goal is to use many products; in the end, you won't be able to work with just a single tool in your career.
Also, we have a number of projects in which companies bring in teams of students. They might say, 'We have a data set and want to see what you find in there to help us increase our profit margin.'
Elke Rundensteiner, director of the data science program, Worcester Polytechnic Institute
If you put a pure data scientist and pure application developer in a room together, what do they need to know about each other to create a successful working partnership?
Rundensteiner: This is a hard question. Both need to be trained to work on an interdisciplinary team. They need to understand what it is the company cares about. The data scientist helps the business come up with the questions, while the software engineer is more focused on designing specifications, and [they] build and implement an architecture.
Many colleges and universities are now offering advanced degrees in data science. How are WPI students faring in the job market?
Rundensteiner: Our data science students are being pursued like hot potatoes. The number of applications to our graduate data science program is surprisingly high. We offer a master's degree and a Ph.D., but also offer undergraduate courses. The master's degree is the sweet spot for people who are already trained in computer science but want to add data science skills.