Cloud storage is a viable option for a range of storage requirements. Understanding the key features of various cloud storage systems can help identify appropriate use cases and avoid potentially expensive mistakes.
We use the term "cloud storage" as if there were a single data storage service. There are, in fact, multiple types of cloud storage systems. It is helpful to categorize these systems by characteristics that determine appropriate use cases. For example, you would not want to run your inventory management system using an archival system that takes hours to respond to a read request. Similarly, there is no reason to pay for low-latency SSD storage when disk-based storage is sufficient.
Laying out the types of cloud storage systems
For the purpose of this review, storage systems are categorized according to three common technical features:
- Reliability
- Accessibility (file versus object stores)
- Size limitations
While these are the primary features, they are not the only ones you may have to consider when selecting a cloud storage system. Business-focused features, such as cost and latency, should also be taken into consideration. The availability of encryption and access controls, for example, may be a key selection criterion. These security features can be found on different types of storage systems and are independent of the way we will categorize storage systems.
Reliability. Reliability is a measure of the likelihood that the storage system will be accessible and functioning over an extended period of time.
A storage system with 99.99% reliability is expected to be down for about 4.3 minutes in a 30-day month. A storage system with 99.999% reliability is expected to be down for less than 30 seconds per month.
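The downtime figures follow directly from the reliability percentage. The sketch below shows the arithmetic, assuming a 30-day month; the function name and the default month length are illustrative choices, not part of any provider's service-level agreement.

```python
def monthly_downtime_seconds(reliability_percent: float, days_in_month: int = 30) -> float:
    """Expected downtime per month, in seconds, for a given reliability percentage."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    unavailable_fraction = 1 - reliability_percent / 100
    return seconds_in_month * unavailable_fraction


for pct in (99.99, 99.999):
    seconds = monthly_downtime_seconds(pct)
    print(f"{pct}% -> {seconds / 60:.2f} minutes ({seconds:.0f} seconds) per month")
```

Under these assumptions, 99.99% works out to roughly 4.3 minutes per month and 99.999% to roughly 26 seconds.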
A related metric -- durability -- is a measure of the likelihood that a piece of data will not be lost. Amazon Simple Storage Service (Amazon S3) stores multiple copies of data and is designed for 99.999999999% durability. This means you can expect to lose, on average, 0.000000001% of your stored objects each year.
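A durability percentage can be turned into an expected loss rate with the same kind of arithmetic. The sketch below is a rough illustration; the function name and the object count are hypothetical, and `Decimal` is used only to avoid floating-point noise at this many digits.

```python
from decimal import Decimal


def expected_objects_lost_per_year(object_count: int, durability_percent: str) -> Decimal:
    """Average number of objects lost per year at the stated annual durability."""
    loss_fraction = 1 - Decimal(durability_percent) / 100
    return object_count * loss_fraction


# Hypothetical workload: 10 million stored objects.
lost = expected_objects_lost_per_year(10_000_000, "99.999999999")
print(f"Expected objects lost per year: {lost}")
print(f"Roughly one object lost every {1 / lost:.0f} years")
```

With these inputs, 10 million stored objects correspond to an expected loss of about one object every 10,000 years.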
It should be noted that these estimates are based on assumptions that underestimate the potential for systemic failure. For example, if copies of your data are stored in three different data centers, the probability of all three failing at the same time is quite small if failures occur randomly. If, however, there is a manufacturing or design flaw in the disk drives and all three copies are stored on the same type of drive, then the chance of simultaneous loss would be higher than implied.
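The effect of a shared flaw can be illustrated with a simple probability model. This is a sketch under stated assumptions, not a description of how any provider computes durability: each copy fails independently with the same probability, and a separate common-mode event (such as a drive design flaw) destroys every copy at once. The probabilities used are made-up values chosen only to show the shape of the result.

```python
def prob_all_copies_lost(p_independent: float, copies: int = 3,
                         p_common_mode: float = 0.0) -> float:
    """Probability that every copy is lost in a given period.

    p_independent: chance that a single copy fails on its own.
    p_common_mode: chance of a shared flaw that takes out all copies together.
    """
    independent_loss = p_independent ** copies
    # Either the common-mode event happens, or it does not and all
    # copies happen to fail independently.
    return p_common_mode + (1 - p_common_mode) * independent_loss


random_only = prob_all_copies_lost(0.01)                      # illustrative value
with_flaw = prob_all_copies_lost(0.01, p_common_mode=0.001)   # illustrative value
print(f"Independent failures only: {random_only:.3e}")
print(f"With a common-mode flaw:   {with_flaw:.3e}")
```

Even a small common-mode probability dominates the result, because the independent term shrinks with every additional copy while the shared flaw does not.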
Accessibility. Accessibility determines how a storage device is used. Object data stores, including Amazon S3 and Windows Azure Blob storage, allow you to store data as Web-addressable objects. Programs or interactive users can retrieve objects using REST interfaces. Files can be stored in object repositories, but these systems do not have the full range of features and options found in file systems. Object storage is especially useful if multiple servers may need simultaneous access to data.
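Because each object has its own Web address, retrieving one amounts to an HTTP GET against a URL. The sketch below only builds such a request and does not send it. The bucket and key are placeholders, the URL follows the virtual-hosted style used by Amazon S3, and a real request would succeed only for a publicly readable object or with authentication added, which is omitted here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_object_request(bucket: str, key: str) -> Request:
    """Build (but do not send) a GET request for one stored object."""
    # Percent-encode the key; slashes are kept so the key reads like a path.
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"
    return Request(url, method="GET")


# Placeholder bucket and key, for illustration only.
req = build_object_request("example-bucket", "reports/2013 summary.pdf")
print(req.get_method(), req.full_url)
```

Any server that can reach the URL can issue the same request, which is what makes object stores convenient for data shared across many machines.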
Object storage systems usually maintain multiple copies of data to improve availability and durability. Changes to objects are eventually consistent across all copies, so for a short time after an update two users could see different results if their queries were processed by services using different copies of the object.
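Eventual consistency is easier to see in a toy model. The class below is a deliberately simplified sketch, not how any real object store is implemented: a write lands on one copy immediately and reaches the others only when `propagate()` is called. All names are hypothetical.

```python
class ReplicatedObject:
    """Toy object with several copies; a write reaches one copy first."""

    def __init__(self, value: str, copies: int = 3) -> None:
        self.replicas = [value] * copies
        self.pending = []  # (replica_index, value) updates not yet applied

    def write(self, value: str) -> None:
        # The write is applied to copy 0 right away; the rest are queued.
        self.replicas[0] = value
        self.pending = [(i, value) for i in range(1, len(self.replicas))]

    def read(self, replica_index: int) -> str:
        # A reader is served by whichever copy its request happens to reach.
        return self.replicas[replica_index]

    def propagate(self) -> None:
        # Background replication catches the remaining copies up.
        for i, value in self.pending:
            self.replicas[i] = value
        self.pending.clear()


obj = ReplicatedObject("v1")
obj.write("v2")
print(obj.read(0), obj.read(2))  # prints "v2 v1": readers disagree
obj.propagate()
print(obj.read(0), obj.read(2))  # prints "v2 v2": copies have converged
```

Applications that cannot tolerate the window between the two reads need to account for it, for example by rereading after a delay.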
Block storage devices, such as Amazon Web Services' (AWS) Elastic Block Store (EBS), are more like direct- or network-attached storage systems than object stores. Block devices are appropriate for applications that need low-level disk access, such as access to raw, unformatted devices, or that need a standard file system or consistent I/O performance.
As with physical storage devices, block storage devices are attached to one server at a time. This works well with databases and other applications designed to work with Linux or Windows file systems. In the case of Amazon EBS, you also have the option of creating volumes with provisioned I/O performance (Provisioned IOPS). There is an additional charge for provisioned I/O, but it allows you to tailor I/O performance to your service requirements.
In addition, EBS volumes can be replicated, and the replicas can be attached to other servers, allowing for higher levels of read performance. Read replicas are not appropriate for applications with high volumes of write operations over extended periods of time.
Size limitations. Storage limits are another factor to consider with block storage devices. Unlike object stores, which provide virtually unlimited storage, EBS volumes have a fixed maximum size (1 TB at the time of writing).
Archival storage -- a third type of cloud storage option -- offers low-cost storage, e.g., $0.01 per GB per month in Amazon Glacier, but with extremely high latency. Archival storage is appropriate in situations where data must be retained even though most of it is unlikely to be used.
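The guidance in this review can be condensed into a small decision helper. This is only a sketch of the rules of thumb described above; the function, its parameters and the order in which the checks are applied are assumptions made for illustration, and a real selection would also weigh cost, security features and size limits.

```python
from enum import Enum


class StorageType(Enum):
    OBJECT = "object store"
    BLOCK = "block storage"
    ARCHIVAL = "archival storage"


def suggest_storage_type(rarely_read: bool, needs_file_system: bool) -> StorageType:
    """Map two coarse requirements onto the three storage types discussed.

    rarely_read: data must be retained but is unlikely to be used, and
        hours of retrieval latency are acceptable.
    needs_file_system: the application needs low-level disk access, a
        standard file system or consistent I/O performance.
    """
    if rarely_read:
        return StorageType.ARCHIVAL
    if needs_file_system:
        return StorageType.BLOCK
    # Default: Web-addressable objects that many servers can read.
    return StorageType.OBJECT


# Hypothetical workloads, for illustration only.
print(suggest_storage_type(rarely_read=False, needs_file_system=True).value)   # database volume
print(suggest_storage_type(rarely_read=True, needs_file_system=False).value)   # compliance records
print(suggest_storage_type(rarely_read=False, needs_file_system=False).value)  # shared media files
```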
About the author:
Dan Sullivan holds a Master of Science degree and is an author, systems architect and consultant with more than 20 years of IT experience. He has had engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail and education. Dan has written extensively about topics that range from data warehousing, cloud computing and advanced analytics to security management, collaboration and text mining.