Big data services are becoming more popular due to emerging trends, such as IoT. Big data can reveal critical information that helps businesses understand customers, optimize processes and strengthen security. Numerous industries can benefit from big data services, but analyzing such large amounts of data is no easy feat. Organizations are turning to big cloud providers for assistance.
Google's background in search gives its big data services a leg up against competitors. Its offerings are built on Google Cloud Platform (GCP), which offers a range of services, including compute, storage, databases, networking and machine learning, as well as tools for management, development and security. When using Google big data services, organizations can tap into other Google products with minimal integration work.
Use this glossary of products to navigate your way around Google big data services in the cloud.
Google BigQuery
BigQuery is a data warehouse that processes and analyzes large data sets using SQL queries. The service can also capture and examine streaming data for real-time analytics. It stores data in Google's Capacitor columnar data format, and users can load data via streaming or batch loads. To load, export, query and copy data, use the classic web UI, the web UI in the GCP Console, the bq command-line tool or client libraries. Because BigQuery is a serverless offering, enterprises only pay for the storage and compute they consume.
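To illustrate why a columnar layout suits analytical SQL, the following toy sketch pivots a few rows into columns and runs the equivalent of a GROUP BY that touches only two of them. It is not Capacitor's actual format, and the table, field names and helper functions are hypothetical.

```python
# Toy illustration of row-oriented vs. column-oriented storage.
# This is NOT Capacitor's real format; data and names are hypothetical.

rows = [
    {"user": "a", "country": "US", "spend": 12.5},
    {"user": "b", "country": "DE", "spend": 7.0},
    {"user": "c", "country": "US", "spend": 3.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_spend_by_country(cols):
    """Rough equivalent of:
    SELECT country, SUM(spend) FROM t GROUP BY country
    Only the 'country' and 'spend' columns are read; 'user' is never touched.
    """
    totals = {}
    for country, spend in zip(cols["country"], cols["spend"]):
        totals[country] = totals.get(country, 0.0) + spend
    return totals


columns = to_columns(rows)
result = total_spend_by_country(columns)
```

In a row layout, the same aggregation would have to read every field of every record, which is why analytical warehouses tend to favor column-oriented storage.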
Google Cloud Dataflow
Cloud Dataflow is a serverless stream and batch processing service. Users can build a pipeline to manage and analyze data in the cloud, while Cloud Dataflow automatically manages the resources. It was built to integrate with other Google services, including BigQuery and Cloud Machine Learning, as well as third-party products, such as Apache Spark and Apache Beam.
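The idea of one pipeline serving both batch and stream input can be sketched in plain Python. This is a conceptual illustration only, not the Cloud Dataflow or Apache Beam API; the record format, the `limit` threshold and all function names are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

# Conceptual sketch only: NOT the Cloud Dataflow or Apache Beam API.
# The "sensor,value" record format and all names are hypothetical.


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Turn 'sensor,value' text records into (sensor, value) tuples."""
    for line in lines:
        sensor, value = line.split(",")
        yield sensor.strip(), int(value)


def keep_valid(readings, limit=100):
    """Drop readings outside the accepted range [0, limit]."""
    for sensor, value in readings:
        if 0 <= value <= limit:
            yield sensor, value


def pipeline(source):
    """The same transforms apply to a finite batch or an open-ended stream."""
    return keep_valid(parse(source))


# Batch input: a bounded collection that is already fully available.
batch = ["s1,10", "s2,250", "s1,30"]
batch_result = list(pipeline(batch))


# Stream input: a generator standing in for records that arrive over time.
def live_feed():
    yield "s3,5"
    yield "s3,-1"
    yield "s4,99"


stream_result = list(pipeline(live_feed()))
```

Because each stage is a generator, records flow through one at a time, so the pipeline does not need to know whether its source is bounded.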
Google Cloud Dataproc
Cloud Dataproc is a managed Apache Hadoop and Spark service for batch processing, querying, streaming and machine learning. Through automation and orchestration, users can quickly spin up Hadoop or Spark clusters and resize them at any time without compromising data pipelines. It can be fully integrated with other Google big data services, such as BigQuery and Bigtable, as well as Stackdriver Logging and Monitoring.
Google Cloud Pub/Sub
Cloud Pub/Sub is an asynchronous messaging service. It manages communication among different applications, and it serves as a foundational component for stream analytics pipelines. It supports implicit invocation in which the publisher has little control over the process except to guarantee the message's delivery to the subscriber. Typically, enterprises use Cloud Pub/Sub for general event data ingestion and distribution patterns. Developers can use Cloud Pub/Sub to quickly integrate systems hosted on or off GCP.
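The decoupling between publisher and subscriber can be illustrated with a small in-memory model. This is a sketch of the publish/subscribe pattern in general, not the Cloud Pub/Sub client library; the `Broker` class, its methods and the topic and subscription names are all hypothetical.

```python
from collections import deque

# In-memory model of the publish/subscribe pattern.
# NOT the Cloud Pub/Sub client API; class and method names are hypothetical.


class Broker:
    def __init__(self):
        # topic name -> {subscription name -> queue of pending messages}
        self._subscriptions = {}

    def subscribe(self, topic, name):
        """Register a named subscription with its own message queue."""
        self._subscriptions.setdefault(topic, {})[name] = deque()

    def publish(self, topic, message):
        """Copy the message to every subscription on the topic.

        The publisher does not know who the subscribers are or when they
        will read; it only hands the message to the broker.
        """
        queues = self._subscriptions.get(topic, {})
        for queue in queues.values():
            queue.append(message)
        return len(queues)

    def pull(self, topic, name):
        """Return the next pending message for a subscription, or None."""
        queue = self._subscriptions[topic][name]
        return queue.popleft() if queue else None


broker = Broker()
broker.subscribe("orders", "billing")
broker.subscribe("orders", "analytics")

delivered = broker.publish("orders", {"id": 1})

# Each subscriber reads at its own pace, independently of the other.
billing_first = broker.pull("orders", "billing")
billing_second = broker.pull("orders", "billing")
analytics_first = broker.pull("orders", "analytics")
```

Each subscription keeps its own queue, so a slow consumer does not block the publisher or other consumers. A real service adds durability, acknowledgments and retries on top of this basic shape.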
Google Cloud Data Fusion
Cloud Data Fusion is a data integration service used to build and manage extract, transform and load data pipelines. The point-and-click visual interface makes pipeline development code-free and enables users of all skill levels to prepare, transfer and transform data. Data Fusion's open source foundation enables more portability for hybrid and multi-cloud integrations.
Google Cloud Composer
Cloud Composer is an orchestration tool that helps create, manage and monitor workflows across clouds and on-prem systems. It's built upon the open source Apache Airflow project, which gives enterprises more flexibility to avoid lock-in. This tool can work in concert with other Google big data services.
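Workflow orchestration of this kind generally models tasks as a directed acyclic graph and runs each task only after its dependencies finish. The sketch below shows that ordering step with Python's standard library rather than the Airflow API; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Dependency ordering for a workflow, using only the standard library.
# This is NOT the Apache Airflow API; task names are hypothetical.

# Each task maps to the set of tasks that must finish before it starts.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields tasks so that every dependency comes first.
order = list(TopologicalSorter(workflow).static_order())

executed = []
for task in order:
    # A real orchestrator would launch the task and monitor it here.
    executed.append(task)
```

An orchestrator layers scheduling, retries and monitoring on top of this ordering, but the dependency graph is the core abstraction.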
Google Cloud Data Catalog
Cloud Data Catalog is a data discovery service that enables enterprises to capture technical and business metadata from schematized tags and build a comprehensive catalog to easily locate data assets. To protect the data, it uses access-level controls and integrates with Google Cloud Data Loss Prevention to classify sensitive information.
Google Data Studio
Data Studio offers interactive dashboards to build visual representations of data. Users can analyze data from a variety of sources, share reports and collaborate in real time.
Google Cloud Data Transfer
Cloud Data Transfer moves small and large amounts of data -- physically and virtually -- to Cloud Storage, BigQuery and Cloud Dataproc. It offers four approaches: Online Transfer, Cloud Storage Transfer Service, Transfer Appliance and BigQuery Data Transfer Service. Transfer times depend on the amount of data, the network connection and whether the data is moved physically or online.
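For online transfers, a rough lower bound on transfer time is the data size divided by the usable bandwidth. The helper below is a back-of-the-envelope sketch, not a Google tool; the function name and the `efficiency` parameter (the fraction of the link actually available) are assumptions for illustration.

```python
# Back-of-the-envelope estimate for an online transfer.
# Hypothetical helper, not part of any Google tool or API.


def estimate_transfer_hours(size_bytes, link_bits_per_second, efficiency=1.0):
    """Return the estimated transfer time in hours.

    efficiency is the assumed fraction of the link that is usable (0 < e <= 1).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


ONE_TB = 10**12  # decimal terabyte, in bytes

hours_100mbps = estimate_transfer_hours(ONE_TB, 100 * 10**6)  # 100 Mbps link
hours_1gbps = estimate_transfer_hours(ONE_TB, 10**9)          # 1 Gbps link
```

Under these idealized assumptions, 1 TB takes roughly 22 hours at 100 Mbps and a little over 2 hours at 1 Gbps. At much larger volumes, shipping data physically can become the faster option.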
Google Cloud Bigtable
Cloud Bigtable is a managed NoSQL database service designed to handle massive workloads while maintaining high performance. It is used to power core Google services, such as Search, Analytics, Maps and Gmail. Cloud Bigtable uses a low-latency storage stack and is globally available. It supports the open source HBase API, which makes applications more portable between HBase and Cloud Bigtable. It is commonly used for time-series, marketing, financial, IoT and graph data.
Google Cloud Datalab