The cloud market is evolving quickly, with an ever-changing set of big data services. While this makes cloud vendor comparisons difficult, it's worth the attempt, because the offerings from the top three cloud providers -- Amazon Web Services, Microsoft Azure and Google -- aren't created equal.
Big data in the cloud is an area of the market where Google's immense experience in search gives it a natural advantage, but Amazon Web Services (AWS) and Azure are attracting interesting startups that add value to their platforms.
The result is a vibrant spectrum of big data services that is increasingly attractive from both a capability and an economic perspective. Cloud users ultimately win from the big data competition between the big three -- and that looks to continue for years to come.
Here's a closer look at the big data services today from AWS vs. Azure vs. Google.
Amazon Web Services
AWS has a broad spectrum of big data services. Amazon Elastic MapReduce, for example, runs Hadoop and Spark while Kinesis Firehose and Kinesis Streams provide a way to stream large data sets into AWS. Users can store data in Redshift, a petabyte-scale data warehouse, with data compression to help reduce costs. Amazon Elasticsearch is a service to deploy the open source Elasticsearch tool in AWS for analytics such as click-through and log monitoring. Kinesis Analytics complements this by analyzing data streams.
AWS has a larger set of data storage choices than Google. In addition to the massive Amazon Simple Storage Service (S3) farm, it has DynamoDB, a low-latency NoSQL database; DynamoDB for Titan, which provides storage for the Titan graph database; Apache HBase, a petabyte-scale NoSQL database; and relational databases.
AWS also has a business intelligence (BI) service, QuickSight, which uses parallel, in-memory processing to achieve high speeds. This is complemented by Amazon Machine Learning and the AWS Internet of Things (IoT) platform, which connects devices to the cloud and can scale to billions of devices and trillions of messages.
While Google has an edge with search and analytics engines, AWS has a broader spectrum of services, as well as BI and graphics processing unit (GPU) instances.
Microsoft Azure
For analytics, Azure has Data Lake Analytics, which uses the proprietary U-SQL language, a combination of SQL and C#, as well as HDInsight, a Hadoop-based service. There is also an Azure Stream Analytics service, a Data Catalog that identifies data assets using a global metadata system, and Data Factory, which interlinks on-premises and cloud data sources and manages data pipelines.
Azure's big data storage service is Data Lake Store, a Hadoop file system. The cloud provider has a broad set of general purpose storage offerings, including StorSimple, SQL and NoSQL databases and storage blobs.
Azure also has Power BI and a machine learning service, lining up with AWS's offerings, and features an IoT Hub. The cloud platform also includes a search engine. Microsoft's Cortana suite and Cognitive Services provide more advanced intelligence capabilities.
Google
Google's BigQuery data service uses a SQL-like interface that most users -- even nontechnical ones -- find easy to learn. It supports petabyte-scale databases and can stream data in at 100,000 rows per second as an alternative to loading data from cloud storage. BigQuery also supports geographic replication, and users can select where they store their data.
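To put the 100,000 rows-per-second streaming figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the rate is sustained and ignores quotas, retries and network overhead; the function name `streaming_time_seconds` and the one-billion-row data set are illustrative choices, not part of any Google API.

```python
def streaming_time_seconds(total_rows: int, rows_per_second: int = 100_000) -> float:
    """Return the seconds needed to stream total_rows at a sustained rate.

    The default rate is the 100,000 rows/s figure quoted in the article;
    real throughput depends on quotas, row size and network conditions.
    """
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return total_rows / rows_per_second


# Hypothetical data set size, chosen only for illustration.
one_billion = 1_000_000_000
seconds = streaming_time_seconds(one_billion)
hours = seconds / 3600

print(f"{one_billion:,} rows at 100,000 rows/s: {seconds:,.0f} s (~{hours:.1f} h)")
```

At that rate, a billion-row data set would take a little under three hours to stream in, which is one reason bulk loads from cloud storage remain the usual route for large historical data.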
BigQuery is a pay-as-you-go service without a dedicated infrastructure of instances, which allows Google to use a large number of processors to maintain fast query times. Integration with Spark, Hadoop, Pig and Hive is also supported. Organizations can also use Google Analytics and DoubleClick -- a tool for the advertising industry that gathers statistics to feed BigQuery -- as data sources. Google Cloud Dataflow allows users to sequence cloud data services.
Other big data services offered by Google include Cloud Datastore, a NoSQL database for nonrelational data; Cloud BigTable, a massively scalable NoSQL database; Cloud Machine Learning, a managed platform for machine learning; and ancillary tools such as translators and speech converters.
One notable offering that Google lacks for big data is the GPU instance. Writing GPU code for data analytics is a high-value skill, given the incredible performance boosts that GPUs offer. Google's lack of a GPU instance family is somewhat puzzling, especially as AWS has offered the feature since 2011 and Azure added it in 2015.
AWS vs. Azure vs. Google: A close race in big data
In many ways, the big three cloud providers are in lockstep on big data services, though there are under-the-hood differences in performance and ease-of-use that require some hands-on testing to discern. While Google likely has an edge in search, it lags behind on the BI front, where Microsoft has an edge with Cortana. Google's lack of GPU instances is also a notable difference.
As with any broad spectrum of products, and because all these big data services are in their relative infancy, there will be differences that are use case- or data-dependent. It can be difficult to choose among AWS, Azure and Google. One way to determine the best cloud service for you is to try each one in a sandbox for a few weeks to get a sense of what works and what the price will be.
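One simple way to turn sandbox results into a price comparison is to extrapolate each trial's bill to a full month. The Python sketch below assumes costs scale linearly with time, which may not hold once data volumes grow. The provider labels and dollar amounts are made-up placeholders, not real prices, and `TrialResult` is a hypothetical helper defined here.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    provider: str          # label for the provider under test (placeholder)
    days_run: int          # length of the sandbox trial in days
    total_cost_usd: float  # bill observed for the trial (placeholder value)

    def projected_monthly_cost(self, days_in_month: int = 30) -> float:
        """Linearly extrapolate the trial bill to a full month."""
        return self.total_cost_usd / self.days_run * days_in_month


# Placeholder figures for illustration only; substitute your own trial bills.
trials = [
    TrialResult("provider_a", days_run=14, total_cost_usd=70.0),
    TrialResult("provider_b", days_run=14, total_cost_usd=56.0),
    TrialResult("provider_c", days_run=21, total_cost_usd=105.0),
]

# List the cheapest projected monthly bill first.
for t in sorted(trials, key=lambda t: t.projected_monthly_cost()):
    print(f"{t.provider}: ~${t.projected_monthly_cost():.2f}/month")
```

Cost is only one axis. The same record could carry query times or setup effort from the trial so that the comparison reflects what works as well as what it costs.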