Is big data too big for cloud-based analytics?

Learn how big data keeps getting bigger, yet the percentage of data accessed by cloud-based analytics remains very low.

As cloud-based analytics are exposed to ever-increasing volumes of data, the pressure is on to deal with the onslaught. Consider it a tug-of-war, with the need for near-limitless scalability pitted against how much of that collected data actually ever gets scrutinized.

When it comes to capturing data for cloud-based analytics, "big" doesn't even come close to being an adequate characterization. In October 2015, IDC reiterated previous research noting that the amount of data created annually worldwide is expected to grow from 4.4 zettabytes in 2013 to a whopping 44 zettabytes (44 trillion gigabytes) in 2020 -- an astonishing growth rate of 40% per year.
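As a quick sanity check on those figures, the short Python sketch below works out the compound annual growth rate implied by a tenfold increase. It assumes the 44-zettabyte endpoint falls in 2020, the horizon of IDC's forecast, which gives a seven-year span. The variable names are illustrative only.

```python
# Figures quoted in the article: 4.4 ZB in 2013 growing to 44 ZB.
# Assumption: the endpoint year is 2020, giving a seven-year span.
START_ZB = 4.4
END_ZB = 44.0
YEARS = 2020 - 2013

# Compound annual growth rate implied by the two endpoints.
implied_cagr = (END_ZB / START_ZB) ** (1 / YEARS) - 1

# Volume reached if 4.4 ZB grows at exactly 40% a year over the same span.
projected_zb = START_ZB * 1.40 ** YEARS

print(f"implied annual growth: {implied_cagr:.1%}")
print(f"volume after {YEARS} years at 40%/yr: {projected_zb:.1f} ZB")
```

Over seven years, a tenfold increase works out to just under 40% a year, so the rounded growth rate and the two endpoints are consistent with each other.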

Without a doubt, businesses are dealing with huge amounts of data. At Weather Underground, the Weather Channel affiliate whose digital assets are in the process of being acquired by IBM, weather readings in the United States are collected from more than 180,000 weather stations every 15 minutes, generating in excess of 100 gigabytes of data per day.

Media giant Time Warner Cable (TWC) tracks every navigational move and button click that its more than 15 million customers make when using the company's mobile apps and website. A customer of the cloud-based Adobe Analytics service, TWC freed itself from handling torrents of incoming data and the associated storage overhead by offloading both entirely to Adobe's analytics as a service (AaaS).

Despite TWC's large size, as long as the Adobe service has bigger customers, no cause for concern exists, according to Jeff Henshaw, TWC's senior director of business intelligence. "For TWC, scalability is not a factor," he said. "Our data sizing doesn't approach what some other organizations have in place. Until they hit the ceiling, we have no worries about data limitations, performance or network latency."

Jeff Morris, vice president of strategy at analytics services provider GoodData, agreed that scalability should no longer be a concern. "As long as you point your data pipeline to me, we can scale into hundreds of terabytes," he said.

The irony of big data

It would seem logical that in the world of cloud-based analytics, more data is better. That's not always the case. Given the rocket-like velocities and colossal volumes of incoming data of various formats, capturing, normalizing, and storing every last bit can be an expensive -- and often unnecessary -- endeavor.

"There are some kinds of data that get more valuable over time," said Mike O'Rourke, IBM's vice president of business analytics. For vineyards, historical data on grape growth, weather, climate, harvest and other factors may be valuable when analyzed over multiple decades. But, O'Rourke noted, after a period of years, storing daily high and low temperature readings may be sufficient compared with the expense of keeping all readings taken 15 minutes apart.

Similarly, in the retail industry, eventual aggregation of detailed minute-by-minute sales data into daily, weekly and monthly roll-ups may be sufficient for analytic purposes as time passes. "It totally makes sense to aggregate data [as it ages]," O'Rourke said. "That's got to be a part of the overall plan." Though the cost per terabyte of storage continues to decline, unless data is aggregated as it ages, actual expenditures will continue to rise due to continuous amassing of new data, he said.
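The kind of age-based aggregation O'Rourke describes, for weather readings and retail sales alike, can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a description of how IBM or any vendor implements it. The function names, the record layout of `(timestamp, value)` tuples and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def daily_high_low(readings):
    """Reduce (timestamp, temperature) readings to one (high, low) pair per day."""
    temps_by_day = defaultdict(list)
    for ts, temp in readings:
        temps_by_day[ts.date()].append(temp)
    return {day: (max(temps), min(temps)) for day, temps in temps_by_day.items()}

def roll_up(sales, bucket):
    """Sum (timestamp, amount_cents) records into totals keyed by bucket(timestamp)."""
    totals = defaultdict(int)
    for ts, amount_cents in sales:
        totals[bucket(ts)] += amount_cents
    return dict(totals)

def by_day(ts):
    return ts.date()

def by_week(ts):
    iso = ts.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)

def by_month(ts):
    return (ts.year, ts.month)

start = datetime(2016, 2, 1)  # hypothetical start date (a Monday)

# Synthetic weather data: two days of readings taken 15 minutes apart,
# with the temperature simply set to 40 plus the hour of day.
readings = [
    (start + timedelta(minutes=15 * i), 40 + ((i % 96) // 4))
    for i in range(2 * 96)
]
highs_lows = daily_high_low(readings)
print(len(readings), "readings ->", len(highs_lows), "daily high/low rows")

# Synthetic retail data: one 100-cent sale per minute for 14 days.
sales = [(start + timedelta(minutes=i), 100) for i in range(14 * 24 * 60)]
daily = roll_up(sales, by_day)
weekly = roll_up(sales, by_week)
monthly = roll_up(sales, by_month)
print(len(sales), "sales rows ->", len(daily), "daily,",
      len(weekly), "weekly,", len(monthly), "monthly")
```

In this toy data set, 96 readings per station per day collapse to a single high/low pair, and more than 20,000 minute-level sales rows shrink to 14 daily rows, two weekly rows and one monthly row. That reduction is the reason aggregation holds storage spending down as data ages.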

While cloud data-storage providers work to expand storage capacities, the great irony is the minuscule percentage of collected data that is ever accessed for analytics purposes.

John Bates, group product manager at Adobe Analytics, estimated current data-access rates at less than 2%. "Users tend to focus in on the data that's most directly related to how they account for success or key performance indicators," he said.

O'Rourke's estimate is even lower. "In terms of the data [IBM is] pulling for customers and the things they're looking at ... it's definitely less than 1%."

The good news is that those low data-access rates are likely to climb as analytics algorithms evolve to the point where they can discover trends in data that users were never looking for in the first place. "As we continue to see advancements in analytics, machine learning and artificial intelligence, I see our ability to scale the amount of information being analyzed and leveraged growing much larger," Bates said.
