By reducing recovery time in the wake of disasters or downtime, cloud archiving and data backup systems play a crucial role in the cloud. And while Amazon Web Services' Glacier archiving service has emerged as a popular choice for many cloud users, Google Cloud Storage Nearline represents a strong alternative.
When it comes to speed, the Amazon Glacier archiving service appears to fall short of Google Cloud Storage Nearline. Glacier behaves like a tape-based system: a retrieval has to be requested from AWS and takes three to five hours to start. Nearline, by contrast, is accessed through Google's standard storage interface and begins transferring data in about three seconds. This makes a huge difference for disaster recovery in terms of how soon data can be back online, and it also makes restoring individual files or directories more attractive and interactive.
Additionally, Nearline's multisite redundant storage offers data integrity and disaster protection, and uses the same interface as other Google storage services.
Google bases retrieval bandwidth on archived capacity. Currently, Nearline offers 4 MB per second of retrieval bandwidth for each terabyte stored, and bandwidth increases as storage capacity grows. A 1 TB archive, for example, is limited to roughly 4 MB per second, which means Amazon Glacier has a substantial speed advantage for small archives. While this may be a concern for small businesses considering Nearline, midsize and larger organizations have larger archives and, therefore, aren't as concerned about bandwidth.
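To make the bandwidth rule concrete, here is a minimal Python sketch of the arithmetic. It assumes the 4 MB per second per terabyte figure scales linearly with no cap, decimal units (1 TB = 1,000,000 MB), and the roughly three-second start time quoted earlier. The function names are illustrative and are not part of any Google API.

```python
# Rough Nearline retrieval estimate based on the figures quoted in the article.
# Assumptions (not from any Google API): linear scaling with no upper cap,
# decimal units, and a fixed three-second delay before the transfer starts.

MB_PER_TB = 1_000_000            # assumption: decimal units
NEARLINE_MBPS_PER_TB = 4.0       # from the article: 4 MB/s per TB stored
NEARLINE_START_DELAY_S = 3.0     # from the article: transfer starts in ~3 s


def nearline_bandwidth_mbps(archive_tb: float) -> float:
    """Retrieval bandwidth in MB/s for an archive of the given size."""
    return NEARLINE_MBPS_PER_TB * archive_tb


def nearline_restore_seconds(archive_tb: float, restore_mb: float) -> float:
    """Estimated seconds to restore restore_mb from an archive of archive_tb."""
    return NEARLINE_START_DELAY_S + restore_mb / nearline_bandwidth_mbps(archive_tb)


if __name__ == "__main__":
    restore_mb = 100_000  # restore a 100 GB data set
    for archive_tb in (1, 10, 100):
        seconds = nearline_restore_seconds(archive_tb, restore_mb)
        print(f"{archive_tb:>4} TB archive: "
              f"{nearline_bandwidth_mbps(archive_tb):6.0f} MB/s, "
              f"100 GB restore in {seconds / 3600:.2f} h")
```

Under these assumptions, restoring 100 GB from a 1 TB archive takes close to seven hours, while the same restore from a 100 TB archive takes a little over four minutes. A full restore takes about 69 hours regardless of archive size, because bandwidth grows in step with capacity.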
Still, Amazon Glacier is a solid service and secure from a data integrity viewpoint. Glacier and Nearline pricing is roughly the same -- one cent per gigabyte per month -- but the fine print makes early deletion significantly more expensive on Glacier than on Nearline. However, Google's higher data transfer fees offset the pricing difference.
The enterprise impact of low-cost cloud archiving
So, how do low-cost cloud archiving systems like Nearline and Glacier affect operations? Facebook and its new disk-based archiving system offer an example. Facebook has built two data centers -- with more on the way -- and each has more than an exabyte of capacity. The social media giant doesn't plan to throw away any of the 2 billion images it adds each day. But, since Facebook has to pay to store these images, a low-cost system is crucial.
Facebook has created a very dense storage scheme that puts two petabytes in a rack, with two servers handling 480 drives. The interesting twist is a power management approach that allows only one of the rack's 25 storage shelves to be powered on at any given time. As a result, rack power is less than 2 kW, and drive wear is remarkably low.
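The rack figures above imply a few per-drive numbers that are easy to check. The following Python sketch only divides the quantities quoted in the article, assuming decimal units (1 PB = 1,000 TB) and treating 2 kW as an upper bound. The function names are illustrative.

```python
# Back-of-the-envelope density figures for the rack described in the article.
# Assumptions: decimal units (1 PB = 1,000 TB); 2 kW is an upper bound on power.

RACK_CAPACITY_TB = 2_000   # two petabytes per rack (from the article)
DRIVES_PER_RACK = 480      # from the article
SERVERS_PER_RACK = 2       # from the article
RACK_POWER_W = 2_000       # "less than 2 kW", so treat as a ceiling


def capacity_per_drive_tb() -> float:
    """Average capacity each drive must hold to reach the rack total."""
    return RACK_CAPACITY_TB / DRIVES_PER_RACK


def drives_per_server() -> int:
    """Number of drives each server handles."""
    return DRIVES_PER_RACK // SERVERS_PER_RACK


def max_watts_per_tb() -> float:
    """Upper bound on power drawn per terabyte stored."""
    return RACK_POWER_W / RACK_CAPACITY_TB


if __name__ == "__main__":
    print(f"Capacity per drive: {capacity_per_drive_tb():.2f} TB")
    print(f"Drives per server:  {drives_per_server()}")
    print(f"Power per TB:       at most {max_watts_per_tb():.1f} W")
```

On these numbers, each drive holds a little over 4 TB, each server fronts 240 drives, and the rack draws no more than about one watt per terabyte stored.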
With capacity savings from erasure coding and data block geo-dispersion, Facebook has a high-integrity system capable of losing a whole data center without losing data availability. Another interesting point is how Facebook handles its primary "hot" storage. Instead of three photo replicas in hot storage, it's possible to have one hot replica and a single erasure-coded copy in cold storage, which saves power and space, while reducing equipment costs.
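The saving from replacing replicas with an erasure-coded copy can be estimated with simple overhead arithmetic. The Python sketch below is illustrative only: the article does not state the erasure-coding parameters Facebook uses, so the (10 data, 4 parity) split is a hypothetical example, and the function names are invented for this illustration.

```python
# Raw storage consumed per byte of user data under two protection schemes.
# The erasure-coding parameters used below are hypothetical; the article
# does not say which code Facebook uses.


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per user byte when keeping full replicas."""
    return float(copies)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per user byte for a (data, parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks


def hot_plus_cold_overhead(data_blocks: int, parity_blocks: int) -> float:
    """One full hot replica plus one erasure-coded cold copy."""
    return replication_overhead(1) + erasure_overhead(data_blocks, parity_blocks)


if __name__ == "__main__":
    k, m = 10, 4  # hypothetical example values, not taken from the article
    three_replicas = replication_overhead(3)
    mixed = hot_plus_cold_overhead(k, m)
    saving = 1 - mixed / three_replicas
    print(f"Three replicas:         {three_replicas:.1f}x raw storage")
    print(f"Hot replica + EC({k},{m}): {mixed:.1f}x raw storage")
    print(f"Raw capacity saved:     {saving:.0%}")
```

With these example parameters, one hot replica plus an erasure-coded cold copy needs 2.4 times the raw capacity of the user data instead of 3 times, and the cold copy can also sit on powered-down drives.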
When recovery is required, Facebook expects to restore large amounts of data rather than individual files. Its disk-based cold storage approach allows for selective recovery, and fast response times mean recovery can be interactive. However, one drawback to taking this approach in normal operations is the power surge that accompanies starting a drive, as well as the extra wear and tear of powering up and down.
To minimize operating costs, it's likely that Google uses some of the same approaches as Facebook. At current tape densities, hard drives are priced similarly to tapes. Meanwhile, the infrastructure to support drives is cheap in comparison to a robotic tape library. Since drive capacity is rising faster than tape capacity, this advantage should be sustainable.
Tapes enjoy an edge in power consumption, but Facebook's massive array of idle disks (MAID) approach, in which most drives stay powered down, swings the pendulum in favor of disk drives.
Looking around the industry, it seems that Nearline will put pressure on other cloud providers to offer a comparable service. Other providers will likely catch up over the next year or so, and one of them may well leapfrog Google by moving to solid-state drives as their prices continue to fall.
About the author:
Jim O'Reilly was Vice President of Engineering at Germane Systems, where he created ruggedized servers and storage for the US submarine fleet. He has also held senior management positions at SGI/Rackable and Verari; was CEO at startups Scalant and CDS; headed operations at PC Brand and Metalithic; and led major divisions of Memorex-Telex and NCR, where his team developed the first SCSI ASIC, now in the Smithsonian. Jim is currently a consultant focused on storage and cloud computing.