Published: 02 Jun 2016
"Those who cannot remember the past are condemned to repeat it." Though written more than a century ago by Spanish essayist and philosopher George Santayana, the aphorism is an apt fit for today's rapidly evolving technology of cloud-based streaming analytics.
Think of cloud-based streaming analytics as split into four varieties: knowing what occurred, understanding why it happened, looking ahead to what might take place and, ultimately, determining how to influence future occurrences. Those four analytics flavors -- descriptive, diagnostic, predictive and prescriptive, respectively -- are progressively more difficult to implement and use, but return an array of bountiful business benefits.
To start on streaming analytics, it's necessary to decide what data to use. As the volumes of data multiply, collecting, storing and filtering them becomes more difficult. Last October, IDC reiterated previously published research in which it predicted the amount of data created annually would grow from 4.4 zettabytes worldwide in 2013 to a whopping 44 zettabytes (44 trillion gigabytes) in 2020 -- an astonishing annual growth rate of roughly 40%.
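As a quick check on the IDC figures above, the implied compound annual growth rate can be computed directly. This is a minimal sketch: the function name `annual_growth_rate` is illustrative, and it assumes steady compounding over the seven years from 2013 to 2020, which the IDC forecast itself may not.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` steps."""
    return (end / start) ** (1 / years) - 1


# Figures quoted from IDC in the text, in zettabytes.
START_ZB, END_ZB = 4.4, 44.0
YEARS = 2020 - 2013

rate = annual_growth_rate(START_ZB, END_ZB, YEARS)
print(f"Implied annual growth: {rate:.1%}")

# Unit check: one zettabyte is 10**21 bytes and one gigabyte is 10**9 bytes,
# so a zettabyte is a trillion gigabytes.
gigabytes_per_zettabyte = 10**21 // 10**9
print(f"{gigabytes_per_zettabyte:,} GB per ZB")
```

A tenfold increase over seven years works out to just under 39% a year, which is consistent with the roughly 40% figure quoted.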
Just a few years ago, numbers of this magnitude would have been unfathomable. Few knew the word petabyte, let alone zettabyte. Note that we skipped right over exabyte on the way from petabyte to zettabyte. Yottabyte -- that's a trillion terabytes -- can't be far behind.
Consider one streaming analytics example: At Weather Underground, one of the Weather Channel digital assets recently acquired by IBM, weather readings in the United States are collected every 15 minutes from more than 180,000 stations. That adds up to 100 GB of data generated every day, streamed and analyzed in real time.
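A back-of-the-envelope calculation puts those Weather Underground numbers in perspective. The sketch below assumes exactly 180,000 stations (the article says "more than"), decimal gigabytes, and that the whole 100 GB comes from station readings. All variable names are illustrative.

```python
# Figures from the text; the station count is a lower bound.
STATIONS = 180_000
READING_INTERVAL_MIN = 15
DAILY_BYTES = 100 * 10**9  # 100 GB, decimal

# One reading every 15 minutes is 96 readings per station per day.
readings_per_station = 24 * 60 // READING_INTERVAL_MIN
readings_per_day = STATIONS * readings_per_station

# Average payload per reading and average sustained throughput.
bytes_per_reading = DAILY_BYTES / readings_per_day
bytes_per_second = DAILY_BYTES / (24 * 60 * 60)

print(f"{readings_per_day:,} readings per day")
print(f"{bytes_per_reading / 1_000:.1f} KB per reading")
print(f"{bytes_per_second / 1_000_000:.2f} MB per second")
```

Under these assumptions, that is roughly 17 million readings a day at a little under 6 KB each, or just over 1 MB per second sustained. Because the real station count is higher, the true per-reading size would be somewhat smaller.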
It's likely your organization is nothing like Weather Underground. For most, the percentage of collected data that's actually used for streaming or periodic cloud-based analytics is surprisingly low. John Bates, group product manager for Adobe Analytics, estimates that current data access rates by Adobe customers average less than 2%. Mike O'Rourke, IBM's vice president for business analytics, believes it's not even that high for his company's customers. "In terms of the data [IBM is] pulling for customers and the things they're looking at … it's definitely less than 1%."
Is that because these organizations are casting too wide a net? Or are they keeping data for too long rather than aggregating and purging it? Is it because continually buying more storage space is the path of least resistance? It depends. If you're tracking data from studies on cardiovascular disease that spans nearly 70 years, keeping every last byte is crucial. But day-to-day sales data for shoes your company sold during the disco craze of the 1970s, well, that's not so crucial. What data you ignore may be among the most critical decisions you'll make.
Another challenge is ensuring that data reaches analysis without delay. Though Apache Spark is gaining popularity as an analytics processor, its streaming engine processes incoming data in microbatches, so each event waits for its batch to close before it is analyzed. If you're analyzing the performance of jet engines in flight or trying to beat other equities traders to a deal, that added latency likely isn't good enough.
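The latency cost of microbatching can be illustrated with a toy model. This sketch is not Spark's actual scheduler: it assumes a fixed one-second batch window, ignores processing time entirely, and uses hypothetical function and variable names. An engine that handles each event on arrival would, under the same assumptions, add no waiting time at all.

```python
def microbatch_wait_ms(arrival_ms: int, interval_ms: int) -> int:
    """Time an event waits until the batch window containing it closes."""
    window_end = (arrival_ms // interval_ms + 1) * interval_ms
    return window_end - arrival_ms


INTERVAL_MS = 1_000                  # assumed batch window
arrivals_ms = range(0, 10_000, 50)   # one event every 50 ms for 10 seconds

waits = [microbatch_wait_ms(t, INTERVAL_MS) for t in arrivals_ms]
mean_wait_ms = sum(waits) / len(waits)
max_wait_ms = max(waits)

print(f"mean wait: {mean_wait_ms:.0f} ms, worst case: {max_wait_ms} ms")
```

With evenly spaced events, the average wait is about half the batch interval, so shrinking the window reduces the delay but never removes it.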
"If you don't or can't act on the data instantaneously, the moment is gone, the window has closed and the value is diminished," said Mike Gualtieri, a Forrester Research analyst. Apache Flink, so new that most are not familiar with it, is a true streaming engine with very low latency.
Analytics extends well beyond the realm of business transactions or internet of things sensor events. Temple University's 2015 Analytics Challenge asked entrants to address one of three wide-ranging issues: whether an Ebola vaccine can change world health, whether the union of television and digital technology can increase sales, and who the best audiences are for cultural institutions as their typical clients age and ticket sales decline. Major corporations, including Campbell's, Lockheed Martin, Merck, QVC and Walmart, have signed on as members of Temple's Institute for Business and Information Technology. One of the perks of membership is influence over the challenge. For 2015, that benefit went to Merck, which was involved in the Ebola question, and QVC, which supplied data for the television and digital technology challenge.
It's clear that the field of streaming analytics is growing, technologies are evolving, applications are limited only by our imaginations, demand for analytics talent has never been higher and universities are stepping up to address the shortage. What could be more exciting than that?