Google Cloud Platform's big data ecosystem expands

Google customers looking to do more around high-performance workloads got a boost with the expansion of the cloud vendor's big data ecosystem.

Google continues to flesh out its big data cloud services with its Dataflow project and new partnerships around Hadoop.

The latest open source expansion of the ecosystem involves a deal with Cloudera Inc. to run Google Cloud Dataflow on Apache Spark and the addition of Hortonworks HDP 2.2 to Google Cloud Platform. Both are seen as moves that make it easier for customers to carry out high-performance activities in a more controlled environment without having to piece everything together themselves.

"The agreements made by Google here should provide far easier means of installing and setting up full platforms for dealing with specific needs of different types of data systems," said Clive Longbottom, service director at analyst firm Quocirca, based in Newbury, England.

Dataflow, currently in alpha and a potential competitor to Amazon's Elastic MapReduce, is designed to execute batch or streaming data pipelines. It combines open source SDKs for programming large-scale data processing with a managed service that ties together various Google cloud products to execute those big data jobs.

Dataflow was previously available only through "runners" on local machines or through Google's managed cloud environment. The Cloudera partnership allows customers to use Spark runners either on premises or in the cloud, and the code is available on GitHub as part of Cloudera Labs.

Dataflow is something that has drawn the interest of Google cloud customer Workiva, said Dave Tucker, senior director of platform development for the financial reporting software developer based in Ames, Iowa.

"It potentially helps us solve more of the problems we're dealing with around large amounts of data and trying to sync a lot of different processes we have," Tucker said.

Cloudera already integrates with Amazon Web Services and Microsoft Azure. Google is trying to do the same things as Amazon by adding cloud services that customers don't have to build themselves and by showing a strong commitment to portability, said Josh Wills, senior director of data science at Cloudera.

"I love the idea of a company publishing cloud services with a proprietary engine, but ensuring customers could take code with them and weren't locked in," Wills said.

Dataflow can be good for unstructured data, such as geospatial data with large amounts of map information, and in fields with complex file formats, such as genomics and bioinformatics, Wills said. If Google succeeds with Dataflow and manages to merge batch and real-time data processing, it will be a "life-changer," he added.

The deal with Cloudera is a smart one, and it ensures Google maintains a balanced relationship with all three major Hadoop distributions, said Carl Olofson, research vice president for IDC, based in Framingham, Mass.

"The plan to run Dataflow on Spark with Cloudera support makes a ton of sense, enabling Google to add value to their Google Cloud Platform by enabling rapid data loading into Hadoop," Olofson said.

Overall, it looks like Google is developing "function as a service" capabilities that fit somewhere between software as a service and platform as a service, Longbottom said. It's a move in the right direction, but more must be done around messaging, he added.

"Use cases, case studies and guidance for less technical people in what this means to them and their businesses would be a welcome move," Longbottom said.

Trevor Jones is the news writer for SearchCloudComputing. You can reach him at tjones@techtarget.com.

Join the conversation

Are you interested in using Google Cloud Dataflow for your big-data workloads?
The idea of using Google Cloud Dataflow for my business is enticing at first but, upon further investigation, becomes less of an option for my company. Google has achieved near mythic status in the tech world and is used globally, but Google also has few systems in place to prevent unwanted intrusions into stored data. While my industry calls for large-scale data storage, we are not interested in using Google Cloud Dataflow at this time.
Creating preconfigured systems to help move large amounts of data from on-premises databases will help Google overcome the advantages of AWS.