
Cloud-based analytics best practices demand clean data

A successful cloud-based analytics implementation depends on following best practices to ensure that data from all sources is clean and current. Here are some tips.

You've decided to move ahead with a cloud-based analytics application. What is the worst mistake you can make? And what steps should you take to ensure a successful implementation? Fortunately, others who have been there before you are quick to share their observations. Not surprisingly, it mostly boils down to the data.

"The single biggest mistake that enterprises make when getting started with cloud analytics is existing in a state of paralysis, under the mistaken impression that they must collect all of the available data before attempting to make business decisions," said John Bates, group product manager at Adobe Analytics. "It is smarter to start with what you have and then layer in additional data sources over time."

Just as national presidential preference polls reach statistically valid conclusions from a sample of only a few hundred respondents, cloud-based analytics does not require amassing every last bit of data, which is likely impossible anyway, Bates said. "There is always more and more data coming in." Best practice dictates going with the available data and adding to it over time. "The analytic models you build and the business insights you gain will grow more accurate over time as data is continually added," Bates said.
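To put a number on the polling analogy, here is a minimal Python sketch of the standard margin-of-error formula for a sampled proportion. It assumes simple random sampling and a 95% confidence level (z = 1.96); neither assumption comes from Bates, and the function name and the sample sizes are illustrative only.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion estimated from a simple random sample.

    proportion=0.5 is the worst case (widest margin); z=1.96 corresponds to
    roughly 95% confidence. Both defaults are assumptions for illustration.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Illustrative sample sizes: each one is four times the previous.
for n in (400, 1600, 6400):
    print(f"n={n}: about +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Because the margin shrinks with the square root of the sample size, quadrupling the data only halves the error. Under these assumptions, a few hundred records already give a usable estimate, and each additional batch of data buys less precision than the last.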

Gradually increasing the data pool can help fine-tune analytic insights, but it carries risks of its own, said Simon James, global lead for performance analytics at SapientNitro, the digital subsidiary of marketing consulting firm Sapient.


"Data is perishable, like fruit," James said. "The more you have, the less valuable it becomes." Growing data stores continuously is an expensive drain on resources that can lead decision makers down the dangerous path of basing new business strategies on old data, James said. "The half-life of data is short."

Quantity and age notwithstanding, data accuracy, or rather the lack of it, is another common challenge for new cloud-based analytics implementations. "Any gap in data cleanliness is a major cause for concern. Investments must be made in how data is collected and transformed," Bates said.

In other words, the garbage-in-garbage-out adage of yore is alive and well. With mountains of data pouring in every second from innumerable sources -- volume, velocity and variety -- it is the fourth "v," data veracity, that can be a cloud-based analytics implementation's downfall. Bates' recommendation is to review and audit each data set to make sure it is clean, then move on to the next one. For a large enterprise, data auditing might fall to a dedicated team. "The last thing you want to do is make erroneous decisions based on bad data," he said.
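What such an audit checks will vary by organization. As a rough sketch only, the Python below flags three common problems in a batch of records: missing required fields, exact duplicates and stale timestamps. The field names, the 30-day freshness window and the `audit_records` helper are all hypothetical and are not drawn from Bates' or Adobe's tooling.

```python
from datetime import datetime, timedelta

# Hypothetical schema: the fields every record is assumed to need.
REQUIRED_FIELDS = ("customer_id", "event", "timestamp")


def audit_records(records, now, max_age=timedelta(days=30)):
    """Return a dict mapping each issue type to the indexes of offending records."""
    issues = {"missing_field": [], "duplicate": [], "stale": []}
    seen = set()
    for index, record in enumerate(records):
        # A record with an absent or empty required field is not checked further.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            issues["missing_field"].append(index)
            continue
        # Records that repeat an earlier record's required fields are duplicates.
        key = tuple(record[field] for field in REQUIRED_FIELDS)
        if key in seen:
            issues["duplicate"].append(index)
        seen.add(key)
        # Records older than the freshness window are flagged as stale.
        if now - record["timestamp"] > max_age:
            issues["stale"].append(index)
    return issues


# Made-up sample batch for illustration.
now = datetime(2016, 1, 31)
sample = [
    {"customer_id": 1, "event": "login", "timestamp": datetime(2016, 1, 30)},
    {"customer_id": 1, "event": "login", "timestamp": datetime(2016, 1, 30)},
    {"customer_id": 2, "event": "", "timestamp": datetime(2016, 1, 29)},
    {"customer_id": 3, "event": "view", "timestamp": datetime(2015, 11, 1)},
]
report = audit_records(sample, now)
print(report)
```

A report like this lets a team clear one data set before moving on to the next, as Bates recommends. The staleness check also speaks to James' warning about basing decisions on old data.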

For Time Warner Cable, a major user of cloud-based analytics, best practices extend beyond data to balancing proper data management and business agility. "With cloud analytics, you have to get it right the first time, because the business wants to know how effective a change was right on the heels of deployment," Jeff Henshaw, TWC's senior director of business intelligence, said.

A key use of TWC's cloud analytics is to understand the effect of product-mix and online experience changes within its vast customer base. To do that, analytics must be deployed without any lag time when website or mobile app changes are made. "If we have to iterate on our analytics, it's already too late," Henshaw said.
