Streaming analytics accepts no delays in examining big data

Complex event processing is giving way to streaming analytics, a technology for which even the millisecond delays caused by microbatching data in Apache Spark are unacceptable.

Complex event processing has given way to streaming analytics, a cloud-based technology that ingests and analyzes large amounts of incoming data in real time. In the streaming analytics world, delays of even a few milliseconds are unacceptable.

Streaming analytics is the collection, aggregation and filtering of so-called fast data, followed by its analysis in real time to provide instantaneous insights, alerts or recommendations. These torrents, or streams, of fast data come from many sources: financial transactions, such as Wall Street stock trades or funds transfers; clickstreams from consumers purchasing goods and services online from desktops or mobile devices; and devices monitoring equipment or industrial processes. More recently, data generated on an industrial scale by Internet of Things (IoT) sensors has become a significant contributor.

Makers of analytics products and industry analysts are both actively downplaying the term complex event processing. "Any phrase or acronym that starts with the word 'complex' is not going to get the attention it deserves," said Nelson Petracek, CTO of the strategic solutions group at TIBCO Software Inc., based in Palo Alto, Calif. Nagui Halim, an IBM Fellow and director of the IBM Streams business unit, agreed. "We don't have that debate anymore," he said. Perhaps piling on a bit, Giles Nelson, a senior vice president at Germany-based Software AG, said, "With streaming analytics, people understand what you mean. CEP makes people yawn."

Striim, another developer of analytics software, takes that view a step further. The company believes in streaming analytics enough to have changed its name in September 2015 from WebAction to a word that's pronounced "stream." Among its 20 analytics predictions for 2016, Striim said it expects complex event processing to evolve, morphing from a standalone technology into one component of the larger streaming analytics pie.

That pie is growing at a rapid pace. In its October 2015 report examining current trends in data platforms and analytics, 451 Research predicted the market for event- and stream-processing technology will soar from $383 million in 2014 to $1.37 billion in 2019, an astonishing compound annual growth rate (CAGR) of 29%.
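The 29% figure can be checked with the standard compound annual growth rate formula. The following is a minimal Python sketch, not taken from the 451 Research report; the function name `cagr` is chosen here purely for illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.29 means 29%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted above, in millions of dollars, over 2014 to 2019.
rate = cagr(383.0, 1370.0, 2019 - 2014)
print(f"{rate:.1%}")
```

Run against the figures quoted above, the result rounds to the 29% cited in the report.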

Right here, right now

It is immediacy that sets streaming analytics apart from traditional descriptive and diagnostic analytics, which look at data after it has been collected and aggregated.

Instead of amassing information in batches and then performing after-the-fact analysis, streaming analytics operates in real time, at the velocity of the business, without perceptible latency. Delays of even a few milliseconds are considered unacceptable and, depending on the usage scenario, could create safety risks or cause financial losses.

Typical real-time scenarios include monitoring airline jet engine performance and tracking stock market securities-trading transactions. "These are perishable insights," said Mike Gualtieri, a Forrester principal analyst serving application development and delivery professionals. "If you don't or can't act on the data instantaneously, the moment is gone, the window has closed and the value is diminished." Milliseconds lost may be the culprit when one stock trader is continually beaten by others.

Flink gaining favor over Spark

Given the requirement for zero latency, the stream-processing engine in the widely used Apache Spark cluster-computing framework simply isn't up to the challenge. In fact, "stream" may be something of a misnomer.

Gaining favor is a new arrival: Apache Flink, a high-throughput, low-latency stream-processing framework. It supports the concept of event time, which is essential when processing multiple data streams simultaneously, because events are likely to arrive out of chronological order. Flink is still new, having become an Apache top-level project in January 2015.
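To illustrate why event time matters, here is a minimal Python sketch of reordering out-of-order events by the time they occurred rather than the time they arrived. It is a toy model and does not use Flink's API; the function name `reorder_by_event_time`, the `max_delay_ms` parameter and the sample events are all assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (event_time_ms, payload)


def reorder_by_event_time(arrivals: Iterable[Event], max_delay_ms: int) -> Iterator[Event]:
    """Emit events in event-time order.

    Assumes no event arrives more than max_delay_ms behind the newest
    event time seen so far; an event later than that would be emitted
    out of order by this sketch.
    """
    buffer: List[Event] = []  # min-heap keyed on event time
    max_seen = None
    for event in arrivals:
        heapq.heappush(buffer, event)
        max_seen = event[0] if max_seen is None else max(max_seen, event[0])
        # Anything at or before this point is assumed complete.
        watermark = max_seen - max_delay_ms
        while buffer and buffer[0][0] <= watermark:
            yield heapq.heappop(buffer)
    # End of input: flush whatever is still buffered, in order.
    while buffer:
        yield heapq.heappop(buffer)


# Hypothetical events from two interleaved streams, listed in arrival order.
arrivals = [(100, "a"), (300, "b"), (200, "c"), (400, "d"), (350, "e")]
ordered = list(reorder_by_event_time(arrivals, max_delay_ms=100))
print(ordered)
```

The trade-off is visible in `max_delay_ms`: a larger value tolerates later arrivals but holds events in the buffer longer before they are emitted.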

Flink uses a single runtime to handle both streaming and batch processing. Put another way: If Spark is a batch-processing framework that comes close -- but not close enough -- to processing streams, Flink is a true stream-processing framework that is fully capable of handling batches.

"[Apache] Spark is a microbatching model; it is not true streaming like Apache Flink," Halim said. "Spark is not for high-speed financial trading."

Gualtieri was equally candid in his assessment. "We don't consider Spark a real streaming platform, because its architectural model is to microbatch in memory," he said. "If I batch data every 200 milliseconds, that might be acceptable in noncritical situations, but it cannot accommodate sub-second latencies on a consistent basis for critical applications."

Though Spark's in-memory batching can cache significant amounts of data, shortening batching intervals is still not good enough for the field of streaming analytics, Petracek said. "What is important to determine is where the boundary lies, where microbatching becomes no longer adequate and true real-time streaming becomes required."
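The delay that microbatching adds can be made concrete with a small Python sketch. It is a simplified model, not Spark's actual scheduler: it assumes an event waits until the end of its batch interval before any processing starts, and it ignores processing time entirely. The 200-millisecond interval comes from Gualtieri's example; the arrival pattern and the function name `microbatch_wait_ms` are assumptions for illustration.

```python
def microbatch_wait_ms(arrival_ms: int, interval_ms: int) -> int:
    """Time an event waits for its batch to close (toy model, no processing cost)."""
    next_boundary = (arrival_ms // interval_ms + 1) * interval_ms
    return next_boundary - arrival_ms


# Assumed workload: one event every 10 ms for one second.
arrivals = list(range(0, 1000, 10))
waits = [microbatch_wait_ms(t, interval_ms=200) for t in arrivals]

print("worst-case wait (ms):", max(waits))
print("average wait (ms):", sum(waits) / len(waits))
```

In this model, an event that arrives just after a batch opens waits nearly the full interval, while an engine that handles each event on arrival adds no batching wait at all. Shortening the interval shrinks the wait but never removes it.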

Another alternative to Spark is Google Cloud Dataflow, a platform for processing big data streams and batches. Just last week, the Spotify music service announced that, as part of its transition to an all-new technology stack, it is implementing Dataflow for handling its extract, transform and load workloads. According to a published statement, the move provides the company with a single service for handling both batch and stream processing.

IoT becomes a factor

As the number of IoT sensors, or endpoints, continues to soar, real-time analysis of the vast volumes of data they generate has become vital. In its February 2016 Worldwide Internet of Things Forecast Update, 2015-2019 report, research firm IDC projected that the worldwide installed base of IoT endpoints will hit 30 billion in 2020, a CAGR of 21.4%.

One believer is SilverHook Powerboats, a manufacturer of world-class racing boats. Its boats generate telemetry data from more than 80 sensors, each sampled 100 times per second. Analyzing that stream of performance metrics in real time with IBM Streams technology allows a boat's on-board equipment to be adjusted instantly during a race. "The result is faster insights," said Nigel Hook, co-founder and CEO of the San Diego-based company.

Clearly, as both the velocity and volume of data continue to rise, streaming analytics technology has no difficulty keeping pace -- so far. "Streaming analytics is the way to handle complicated distributed problems at scale," Petracek said. "If you waste time, even milliseconds, in today's digital world, you won't be competitive. Streaming analytics is the way IT can match the speed at which the business is moving."
