With all the cloud database services available, it can be difficult to know which to choose. First, it's important to understand the differences between specific services, such as data warehouses and data lakes, and whether they work best for structured or unstructured data sets.
Data warehouses and databases store structured data, while a data lake is designed for unstructured data. Structured systems organize data according to a schema as it is stored, an approach known as schema-on-write. Unstructured systems store data raw and apply a schema only when the data is read for analysis, an approach known as schema-on-read. As a result, a structured store accepts only data that fits its defined schemas, while an unstructured store accepts data in any form.
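The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not any vendor's API: the `ORDER_SCHEMA` fields, the in-memory `table` and `lake` lists, and all function names are hypothetical stand-ins for a real warehouse and a real data lake.

```python
import json

# Hypothetical schema for a structured store: field name -> required type.
ORDER_SCHEMA = {"order_id": int, "customer": str, "total": float}


def write_structured(table, record):
    """Schema-on-write: validate first, and reject anything that does not fit."""
    if set(record) != set(ORDER_SCHEMA):
        raise ValueError(f"fields do not match schema: {sorted(record)}")
    for field, expected in ORDER_SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)


def write_raw(lake, payload):
    """Schema-on-read: store the raw payload untouched, whatever its shape."""
    lake.append(payload)


def read_with_schema(lake):
    """Apply a schema only at read time; skip payloads that do not parse or fit."""
    rows = []
    for payload in lake:
        try:
            doc = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not JSON, so this particular query ignores it
        if isinstance(doc, dict) and "order_id" in doc and "total" in doc:
            rows.append({"order_id": int(doc["order_id"]),
                         "total": float(doc["total"])})
    return rows


table, lake = [], []
write_structured(table, {"order_id": 1, "customer": "acme", "total": 19.5})
write_raw(lake, '{"order_id": 2, "total": "7.25", "coupon": "SPRING"}')
write_raw(lake, "free-form clickstream text")
print(read_with_schema(lake))
```

The structured path fails fast on a record with an unexpected field, while the raw path accepts everything and leaves it to each reader to decide which payloads are usable.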
Structured cloud database services are disciplined and rigid when it comes to schema changes. Unstructured data stores, on the other hand, are more agile and loosely governed, meaning they can handle data formats that vary widely. This is a major factor when an enterprise chooses one service over another, because that looseness rules unstructured data stores out of some use cases. Industries such as healthcare, which have strict compliance requirements and need tight security, are best suited to structured environments.
Ultimately, the amount of agility you require for a given set of data elements determines the best approach. The unstructured approach is typically best for high-agility data sets, such as marketing inputs related to customers' near-real-time buying interests, or financial data that is analyzed with machine learning to pinpoint the most relevant information for the user. Other uses for unstructured systems include scientific data storage, where there might be structure in data patterns but not in individual data points.
Media-related computing can also benefit from unstructured storage. It's easier, for instance, to store media objects in a data lake than as referenced files in a structured database because of the variety of media types and the high number and large size of media files.
Review public cloud database services
Within the leading public cloud platforms, there is a rich, yet sometimes confusing, set of data storage options. Amazon Web Services (AWS) has structured database services, as well as a set of unstructured data services aimed at big data stored in its Simple Storage Service (S3). Amazon Redshift, which uses S3 for standard data storage at $0.02 per GB per month, provides data warehousing. AWS also has an interactive query service, called Athena, and data streaming and real-time analysis tools as part of its Kinesis analytics suite.
Google focuses heavily on big data and analytics with the BigQuery data warehouse and the Cloud Bigtable NoSQL wide-column database, as well as dataflow services and Google Genomics. It has four cloud database services, including MySQL and NoSQL options and Cloud Spanner, a relational database aimed at mission-critical workloads.
Google BigQuery data storage costs $0.02 per GB per month and is prorated on a per-MB, per-second basis. This can save costs for transient data, such as retail feeds that vary in volume. BigQuery pricing automatically drops to $0.01 per GB per month for data that has gone 90 days without modification.
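To see what proration and the 90-day price drop mean in practice, here is a rough estimate in Python. It uses only the two rates quoted above and makes simplifying assumptions that are not part of Google's pricing rules: a 30-day month, a constant data volume, and data that is never modified once stored. The function name `storage_cost` is illustrative, and usage charges are excluded.

```python
ACTIVE_RATE = 0.02         # USD per GB per month, as quoted above
LONG_TERM_RATE = 0.01      # USD per GB per month after 90 days, as quoted above
LONG_TERM_AFTER_DAYS = 90
DAYS_PER_MONTH = 30        # assumption made here to keep proration simple


def storage_cost(gb, days_stored):
    """Estimate storage cost in USD for `gb` gigabytes kept for `days_stored` days."""
    if gb < 0 or days_stored < 0:
        raise ValueError("gb and days_stored must be non-negative")
    active_days = min(days_stored, LONG_TERM_AFTER_DAYS)
    long_term_days = max(days_stored - LONG_TERM_AFTER_DAYS, 0)
    # Cost of one GB over the whole period, prorated by day.
    per_gb = (active_days * ACTIVE_RATE
              + long_term_days * LONG_TERM_RATE) / DAYS_PER_MONTH
    return gb * per_gb


# A transient 1,000 GB feed kept for 3 days versus the same volume kept for 120 days.
print(round(storage_cost(1000, 3), 2))
print(round(storage_cost(1000, 120), 2))
```

Under these assumptions, the short-lived feed costs a small fraction of a full month's charge, which is the saving the per-second proration is meant to deliver.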
Azure has several cloud database services and a data warehouse. In addition, Azure Table storage and Azure Redis Cache, a high-performance key-value store, serve big data. Microsoft also offers the Azure Data Lake Store service, priced at $0.039 per GB per month on a pay-as-you-go basis, with capacity-based discounts of up to 33% for monthly commitments.
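The effect of the commitment discount is easy to bound with the figures quoted above. The sketch below treats the discount as a single flat percentage, which is an assumption; the actual discount tiers are not given here, so only the 0% and 33% endpoints come from the text. The function name `monthly_storage_cost` is illustrative, and usage charges are excluded.

```python
PAYG_RATE = 0.039     # USD per GB per month, pay-as-you-go figure quoted above
MAX_DISCOUNT = 0.33   # upper bound on the commitment discount quoted above


def monthly_storage_cost(gb, discount=0.0):
    """Monthly storage cost in USD for `gb` gigabytes at a flat discount fraction."""
    if not 0.0 <= discount <= MAX_DISCOUNT:
        raise ValueError("discount must be between 0 and 0.33")
    return gb * PAYG_RATE * (1.0 - discount)


# Cost range for 10,000 GB: no commitment versus the largest quoted discount.
print(round(monthly_storage_cost(10_000), 2))
print(round(monthly_storage_cost(10_000, MAX_DISCOUNT), 2))
```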
In all cases, there are data usage charges, so total pricing depends on your use patterns.
Oracle and SAP also offer cloud database services on their own platforms, which, coupled with the options above, means users face a lot of choices. In the end, your experience with structured databases and warehouses will likely influence your decision there, while you'll have far more flexibility in the data lake decision.