This content is part of the Essential Guide: A Google cloud services guide for the enterprise

Words to go: Google big data analytics

Navigating the range of big data services on Google Cloud Platform can be a challenge. Here is a list that breaks down what Google offers in the realm of big data analysis.

Big data services are becoming more popular due to emerging trends such as the internet of things. Big data can reveal critical information that helps businesses understand customers, optimize processes and strengthen security. Numerous industries can benefit from big data services, but analyzing such large volumes of data is no easy feat, so organizations are turning to large cloud providers for help.

Google big data analytics is a popular choice because of the cloud provider's background in search. Google big data services are built on the Google Cloud Platform, which offers a range of services, including compute, storage and databases, networking and machine learning, as well as tools for management, development and security. When using Google big data analytics, organizations can tap into other Google cloud services with minimal integration work.

Use this glossary of big data terms to navigate your way around Google big data analytics services in the cloud:

Google BigQuery: BigQuery is a data analysis web service that processes and analyzes large data sets using SQL queries. Users can pull data into BigQuery from Google Cloud Storage or Google Cloud Datastore, or stream data to enable real-time analysis. BigQuery supports geographic replication so customers can choose where their data is stored globally and ensure availability. Additional integrations are available from Google Cloud Platform partners and third parties to process and visualize data.
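
To make "analyzing data sets using SQL queries" concrete, here is a minimal sketch of the kind of aggregate query an analyst might run. It uses Python's built-in sqlite3 module as a local stand-in, so it is not the BigQuery API, and the page_views table, its columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (country, page). Illustrative only, not real data.
ROWS = [
    ("US", "/home"),
    ("US", "/pricing"),
    ("DE", "/home"),
    ("US", "/home"),
    ("DE", "/docs"),
    ("JP", "/home"),
]

# An aggregate query: count page views per country, busiest first.
QUERY = """
    SELECT country, COUNT(*) AS views
    FROM page_views
    GROUP BY country
    ORDER BY views DESC, country ASC
"""


def views_by_country(rows):
    """Load rows into an in-memory SQLite table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (country TEXT, page TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, views in views_by_country(ROWS):
        print(country, views)
```

A hosted analytics service would run a query of this shape against far larger tables; the local database here only illustrates the SQL pattern.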

Google Cloud Dataflow: Cloud Dataflow is a managed service that executes various data processing patterns, such as extract, transform and load (ETL), batch and streaming. Users can build a pipeline to manage and analyze data in the cloud, while the Dataflow service automatically manages resources. Cloud Dataflow was built to integrate with other Google cloud services, including Google Cloud Storage, Google Cloud Pub/Sub and Google BigQuery.
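
The pipeline idea can be sketched in plain Python. The example below is not the Dataflow SDK; it only illustrates how one set of extract, transform and load stages might serve both a bounded batch and an incrementally arriving stream. The function names and the "user,amount" record format are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional


def extract(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Extract: parse 'user,amount' lines into records, skipping malformed lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"user": parts[0], "amount": parts[1]}


def transform(records: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Transform: normalize user names and drop records with non-numeric amounts."""
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue
        yield {"user": rec["user"].lower(), "amount": amount}


def load(records: Iterable[Dict[str, object]], sink: Dict[str, float]) -> Dict[str, float]:
    """Load: accumulate a running total per user into the sink dictionary."""
    for rec in records:
        sink[rec["user"]] = sink.get(rec["user"], 0.0) + rec["amount"]
    return sink


def run_pipeline(lines: Iterable[str], sink: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Chain the three stages; works on a list (batch) or any iterator (stream)."""
    if sink is None:
        sink = {}
    return load(transform(extract(lines)), sink)


# Hypothetical input, including two lines the pipeline should discard.
SAMPLE = ["Ann,10", "bob,5.5", "bad line", "ann,2", "bob,oops"]

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))                    # batch: whole list at once
    print(run_pipeline(line for line in SAMPLE))   # stream: one line at a time
```

Because each stage is a generator, records flow through one at a time, which is why the same code handles both input styles. A managed service adds what this sketch omits, such as distributing the work and provisioning resources.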

Google Cloud Dataproc: To process big data sets and simplify data analysis, Cloud Dataproc offers managed versions of the open source Apache Hadoop and Spark technologies. Through automation and orchestration, users can quickly spin up Hadoop or Spark clusters and resize them at any time without compromising data pipelines. Since Cloud Dataproc is part of the Google Cloud Platform, it can be fully integrated with other Google big data analytics services, such as BigQuery, Cloud Storage and Cloud Bigtable.

Google Cloud Datalab: Currently in beta, Cloud Datalab is a large-scale data tool that is built on Jupyter and runs on Google App Engine. It is used to discover, visualize and analyze data from BigQuery, Google Compute Engine and Google Cloud Storage, and supports Python, SQL and JavaScript. To visualize data, organizations can use Google Charts or matplotlib. Because Cloud Datalab is open source, developers can extend it through GitHub.

Google Cloud Pub/Sub: Cloud Pub/Sub is a messaging tool that provides high availability and security for communication between applications. Developers can use the service to integrate systems hosted on or off the Google Cloud Platform. Cloud Pub/Sub is commonly used to balance workloads in network clusters, distribute event notifications and send logs to multiple systems.
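
The publish/subscribe pattern behind those use cases can be sketched with an in-memory toy. This is not the Cloud Pub/Sub client library; the InMemoryPubSub class, the topic name and the subscription names are hypothetical, and the sketch skips acknowledgments, retries and persistence.

```python
from collections import deque


class InMemoryPubSub:
    """Toy publish/subscribe broker: every subscription gets its own copy of each message."""

    def __init__(self):
        # topic name -> {subscription name -> queue of pending messages}
        self._subscriptions = {}

    def subscribe(self, topic, subscription):
        """Register a subscription on a topic (no effect if it already exists)."""
        self._subscriptions.setdefault(topic, {}).setdefault(subscription, deque())

    def publish(self, topic, message):
        """Deliver the message to every subscription; return how many received it."""
        queues = self._subscriptions.get(topic, {})
        for queue in queues.values():
            queue.append(message)
        return len(queues)

    def pull(self, topic, subscription):
        """Return and remove the oldest pending message, or None if the queue is empty."""
        queue = self._subscriptions[topic][subscription]
        return queue.popleft() if queue else None


if __name__ == "__main__":
    broker = InMemoryPubSub()
    # Two independent systems listen to the same hypothetical "orders" topic.
    broker.subscribe("orders", "audit-log")
    broker.subscribe("orders", "email-alerts")
    broker.publish("orders", "order-1 created")
    print(broker.pull("orders", "audit-log"))
    print(broker.pull("orders", "email-alerts"))
```

The publisher never needs to know which systems are listening, which is what makes the pattern useful for fanning out event notifications or log records to several consumers.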

Google Cloud Datastore: Cloud Datastore is a NoSQL database that stores nonrelational data for web and mobile applications. The database scales to handle varying workloads and automates sharding and replication. Its RESTful interface allows it to act as an easy integration point.
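
For readers unfamiliar with sharding and replication, the sketch below shows one generic way a database could spread keys across shards and keep copies on neighboring shards. It is an assumption-laden illustration of the concepts, not a description of how Cloud Datastore works internally; the function names, the key format and the placement rule are all hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place(keys: Iterable[str], num_shards: int, replicas: int = 2) -> Dict[str, List[int]]:
    """Assign each key a primary shard plus replicas on the following shards."""
    copies = min(replicas, num_shards)  # cannot have more copies than shards
    placement = {}
    for key in keys:
        primary = shard_for(key, num_shards)
        placement[key] = [(primary + i) % num_shards for i in range(copies)]
    return placement


if __name__ == "__main__":
    # Hypothetical entity keys spread over four shards, two copies each.
    for key, shards in place(["user:1", "user:2", "user:3"], num_shards=4).items():
        print(key, shards)
```

A managed database performs this kind of bookkeeping on the user's behalf, which is the practical meaning of "automates sharding and replication."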

Google Cloud Bigtable: Cloud Bigtable is a fully managed NoSQL database service designed to handle massive big data workloads while maintaining high performance. The underlying technology powers core Google services such as Search, Analytics, Maps and Gmail. Cloud Bigtable uses a low-latency storage stack and is globally available. By supporting the open source HBase API, Cloud Bigtable, which is currently in beta, makes applications portable between its service and HBase.

Google Genomics: Geared toward the science community, Genomics organizes genomic data and allows researchers to process, analyze and store complex data sets. The service can scale to accommodate petabytes of genomic data and, because it supports the Global Alliance for Genomics and Health open standard, users can share data with other members of the scientific community.

This was last published in June 2016
