
Google trumps MapReduce with new big data service

Google raised the stakes in big data cloud services for enterprise IT shops this week with Cloud Dataflow -- a successor to MapReduce.

Google raised the stakes for big data in the cloud with a new service targeted at the enterprise market and intended to succeed MapReduce.

The company last week previewed Google Cloud Dataflow, a managed service for creating data pipelines to help collect and analyze data that is streamed or in batches. The service will allow customers to monitor and gain insights from data without having to worry about the underlying infrastructure, according to Google.

This is a strong challenge to Amazon Kinesis and to Hadoop-based big data services in the cloud, and Cloud Dataflow is closer to big data as a platform, according to Brian Hopkins, a big data analyst with Cambridge, Massachusetts-based Forrester Research Inc.

"It's a step, but it's a big step," Hopkins said. "When Google decides to do something it's like an 800-pound gorilla -- they move pretty quickly."

Amazon offers the lowest common denominator with its infrastructure services; it allows users to build their own big data services. IBM has its own set of big data and other services available in beta through BlueMix, but it's unclear how all those tools work together, Hopkins said.

Google outdoes big data cloud services

Google's other big data service, BigQuery, may work for a small digital startup, but large, heavily regulated enterprises don't want everything on the public cloud, Hopkins said. Because hybrid is becoming the dominant cloud model, Cloud Dataflow allows the flexibility to compare large data sets in real time, whether the information is on-premises or in the public cloud.


A decade ago, Google was central to the creation of MapReduce, but Dataflow is tantamount to the company questioning MapReduce's ability to adequately operate with massive data sets, according to David Linthicum, senior vice president at Boston-based Cloud Technology Partners. Instead of the rewrites and other workarounds people have used to get around problems in MapReduce, Dataflow simplifies processes using pipelines.

"The ability to combine big data and streaming information becomes a key tool a lot of people were looking for, and I think Google is doing some smart stuff in providing that kind of architecture so they can get it in the hands of the developers," Linthicum said.
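For readers who want a concrete picture of the pipeline idea, here is a minimal sketch in plain Python. It is not the Cloud Dataflow API, which Google had not published in detail at the time of writing; the helper names (pipeline, parse, only_hot, running_counts) and the sensor readings are made up for illustration. It shows only the general point the analysts describe: one chain of processing steps applied unchanged to a finished batch and to records that arrive one at a time.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

# Hypothetical names throughout; this is NOT the Cloud Dataflow API.
Stage = Callable[[Iterable], Iterator]


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right into one reusable transform."""
    def run(source: Iterable) -> Iterator:
        data = source
        for stage in stages:
            data = stage(data)
        return iter(data)
    return run


def parse(lines):
    """Turn 'device, reading' text lines into (device, float) records."""
    for line in lines:
        device, reading = line.split(",")
        yield device.strip(), float(reading)


def only_hot(records, threshold=25.0):
    """Keep only records whose reading is above the threshold."""
    for device, reading in records:
        if reading > threshold:
            yield device, reading


def running_counts(records):
    """Emit an updated per-device count as each record arrives."""
    counts = Counter()
    for device, _ in records:
        counts[device] += 1
        yield device, counts[device]


hot_readings = pipeline(parse, only_hot, running_counts)

# Batch input: a list that already exists in full (made-up sample data).
batch = ["thermostat-1, 26.5", "thermostat-2, 19.0", "thermostat-1, 27.1"]
batch_result = list(hot_readings(batch))


# Streaming input: a generator standing in for records arriving over time.
def sensor_stream():
    yield "alarm-7, 31.0"
    yield "alarm-7, 18.2"
    yield "alarm-7, 29.4"


stream_result = list(hot_readings(sensor_stream()))
```

Because every stage is a generator, results are produced record by record, so the same hot_readings transform works whether the source is a stored list or a live feed. A real managed service would add the parts this sketch leaves out, such as distribution across machines, windowing and fault tolerance.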

Dataflow makes it easier to stream information in a way that allows enterprises to make real business decisions based on the analytics that come from the data, according to Larry Carvalho, an analyst with Framingham, Massachusetts-based IDC.

If Google can integrate these new services with its full range of products and obtain data from entrenched end users, it could provide a true differentiator and shrink the gap with market leader Amazon Web Services (AWS), Carvalho said.

"The rate they're reaching the market, they're making very rapid progress," Carvalho said. "They still have a lot of work to do, but they're moving in that direction."

In January, Google paid $3.2 billion for Nest Labs Inc., a maker of smart smoke alarms and thermostats for homes. That move makes more sense with Dataflow, as the company tries to leverage the capabilities of high-volume, high-speed data.

"The driver for this is going to be the next avalanche of data around wearables and the Internet of Things and the potential to use that data to win and retain customers," Hopkins said.

Cloud Dataflow is based on internal technologies like Flume and MillWheel, Google said. The big data cloud service is in Google's private beta program. The company did not have pricing details or general availability information to share at publishing time.

Google beefs up Cloud Platform

Alongside Cloud Dataflow, several other Google Cloud Platform features were introduced to help monitor and debug production issues. Google Cloud Monitoring uses technology from the recently acquired Stackdriver. Cloud Trace finds performance bottlenecks, and Cloud Debugger identifies problems in application production, Google said.

Those other features, along with claims of improvements for mobile applications, are a sign of Google chasing cloud developers as much as, if not more than, AWS, Linthicum said.

While non-developers' eyes might glaze over at the latest features, they're fundamentally important to building solutions in the cloud, Linthicum said. Google appears to be moving toward making its platform as easy for developers as possible, and it could be a big win for the company in the long term.

"It's smart," Linthicum said. "The tech people who are building the applications are going to make a core decision as to what platforms to leverage."

Trevor Jones is the news writer for SearchCloudComputing. You can reach him at


Would you use a managed service for big data?
I won't even consider managed services unless they offer personalized solutions and support, not just generic hosting. Big data by itself isn't valuable - its worth only comes when you can use it to get the answers to your questions, and that often involves writing new software as needs arise. A management company needs to help me produce results (or at least make it easy to bring in others who can) before it gets my business.