Modern Infrastructure Editor-in-Chief
Published: 20 Jan 2016
There's cloud storage, there's high-performance storage, but is there really such a thing as high-performance cloud storage?
For a long time, the answer was no.
"Any time you move your infrastructure somewhere outside of your data center, there's going to be latency involved, and you run into the speed of light problem," said Scott Sinclair, analyst with Enterprise Strategy Group in Milford, Mass. "The speed of light can only go so fast."
Those that required high-performance storage from their cloud providers either learned to compromise or stayed home. Increasingly, though, emerging technological approaches suggest that you can have your cloud storage cake and eat it too -- that is, it's possible to run IO-intensive, latency-sensitive applications with some level of cloud-based infrastructure.
High-performance cloud storage could allow organizations to run demanding database applications in the cloud that have been stymied by cloud storage's limitations. It could also allow you to keep applications on-premises, but take advantage of cheap and scalable cloud storage over the wide area network. And finally, it could make it possible to run compute in the cloud that accesses storage infrastructure back in the private data center.
But unlike most storage problems, the trick to achieving high-performance cloud storage isn't just to throw more disk drives or flash at the problem, Sinclair said. When solving for the speed of light, new technologies "need to rely on a specific innovation to solve the problem" -- namely, colocating data very close to compute, or introducing some sort of network optimization or caching mechanism. Some solutions combine all three of these approaches. And while it's still early days, adopters have seen promising returns.
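The speed-of-light floor is easy to quantify. As a back-of-the-envelope sketch (the distances are illustrative, not from the article), light in optical fiber travels at roughly 200,000 km/s -- about two-thirds of its speed in a vacuum -- which puts a hard lower bound on round-trip latency before any storage stack overhead is counted:

```python
# Back-of-the-envelope: minimum round-trip time imposed by the speed of
# light in fiber (~200,000 km/s, roughly 2/3 of c in a vacuum).
# Distances are illustrative examples, not figures from the article.

FIBER_KM_PER_S = 200_000  # approximate propagation speed in optical fiber

def min_rtt_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time in milliseconds, propagation only."""
    return 2 * one_way_km / FIBER_KM_PER_S * 1000

for km in (100, 1000, 4000):  # metro POP, regional, cross-country
    print(f"{km:>5} km one-way -> at least {min_rtt_ms(km):.1f} ms RTT")
```

Ten milliseconds of unavoidable round trip over a 1,000 km path dwarfs the sub-millisecond response times of local flash, which is why every approach in this space colocates data, shortens the path, or caches.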
On-prem compute, cloud storage
"We used to have the mindset that storage is cheap, and if you need more storage, just go buy some more," said David Scarpello, COO at Sentinel Benefits & Financial Group, a benefits management firm in Wakefield, Mass. "Then I came to the realization that storage is not cheap, and whoever told me that was hugely mistaken."
Between purchasing extra capacity, support and maintenance, staff, backup, maintaining a data center and disaster recovery site, Sentinel pays upwards of $250,000 per year to maintain 40 TB worth of on-premises storage – over $6,000 per TB. "It's a lot," he said – and for what?
"Storage is important – it keeps us safe -- but it's not something that you want to be spending a lot of money on."
Meanwhile, public cloud providers offer raw capacity at rates that rival consumer hard disk drives. Prices for Amazon Web Services (AWS) Simple Storage Service (S3) start at $0.03 per GB per month -- about $360 per year at list price for a managed, replicated TB, and less still at greater capacities and on infrequent-access tiers.
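Using the figures quoted in the article, the per-terabyte gap is stark. A quick sanity check (list-price S3, with 1 TB approximated as 1,000 GB for simplicity):

```python
# Rough cost-per-TB comparison using the figures quoted in the article.
# The S3 rate is the 2016-era $0.03/GB-month standard tier; infrequent-
# access tiers and volume discounts bring it lower.

ONPREM_ANNUAL_COST = 250_000  # Sentinel's all-in yearly spend, USD
ONPREM_TB = 40

S3_PER_GB_MONTH = 0.03

onprem_per_tb_year = ONPREM_ANNUAL_COST / ONPREM_TB
s3_per_tb_year = S3_PER_GB_MONTH * 1000 * 12  # 1 TB ~ 1,000 GB

print(f"on-prem: ${onprem_per_tb_year:,.0f} per TB-year")  # $6,250
print(f"S3 list: ${s3_per_tb_year:,.0f} per TB-year")      # $360
```

The comparison is not apples to apples -- the on-prem figure includes staff, backup and disaster recovery, while raw S3 capacity does not -- but a better-than-tenfold spread explains the interest.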
But that cheap capacity tier is based on object storage, whose performance is adequate in the best of times -- and downright slow when accessed over the wide area network. So the challenge for many IT organizations is how to tap into the cloud's scalability and low cost, while maintaining a modicum of performance.
For Sentinel, one potential fix is a data caching and acceleration tool from Boston-based startup ClearSky Data that combines an on-premises caching appliance and a sister appliance located in a local point of presence (POP) that is directly connected to high-capacity public cloud storage. By caching hot data locally and accessing the cloud over a dedicated, low-latency connection, customers take advantage of cheap cloud-based storage for on-premises compute without a performance hit.
In an initial release, ClearSky promises near local IOPS and latencies of under two milliseconds for customers out of its Boston, Philadelphia and Las Vegas POPs. The plan is to increase its geographic presence, and add support for additional cloud storage providers, said ClearSky Data co-founder and CEO Ellen Rubin.
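ClearSky has not published its internals, but the hot-data caching idea can be sketched generically. The following is a hypothetical illustration (all names invented, not ClearSky's actual implementation): an LRU read-through cache in front of a slow backing store, so repeat reads of hot blocks never cross the WAN:

```python
from collections import OrderedDict

# Hypothetical sketch of the hot-data caching idea -- an LRU cache in
# front of a slow backing store. Not ClearSky's actual implementation.

class ReadThroughCache:
    def __init__(self, backing_store, capacity):
        self.backing = backing_store      # e.g., cloud object storage
        self.capacity = capacity          # local appliance cache size
        self.cache = OrderedDict()        # block_id -> data, LRU order

    def read(self, block_id):
        if block_id in self.cache:        # hit: served at local latency
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backing[block_id]     # miss: fetched over the POP link
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the coldest block
        return data

cloud = {i: f"block-{i}" for i in range(10)}
cache = ReadThroughCache(cloud, capacity=3)
cache.read(1); cache.read(2); cache.read(1)  # second read of 1 is a hit
```

A real appliance also has to handle writes, prefetching and consistency across the POP link; this sketch shows only the read path that keeps hot data local.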
Sentinel has begun to move about 7 TB of test and development volumes to AWS via ClearSky, with no complaints from developers. Ideally, the company will slowly move over all its data, thereby eliminating a $5,000 per month maintenance fee to NetApp, as well as the need for backups and offsite disaster recovery.
Cloud compute, and storage, too
If you're running a latency-sensitive database application in the cloud, best practices dictate that you go with the cloud provider's block storage offering, such as AWS Elastic Block Store (EBS). That was once a death knell for large database workloads, which were stymied by limited IOPS and small volume sizes.
When Realty Data Company's parent company National Real Estate went bankrupt in 2012, it had to make some quick decisions concerning its three data centers: go into another data center, rent colocation space or go to the cloud.
"As much as it's hard to let go, going to the cloud made the most sense, financially," said Craig Loop, director of technology at the Naperville, Ill., firm.
At first, Realty Data scrambled to do lift-and-shift migrations of its applications, but stumbled when migrating its 40-TB image database off an EMC array and into the cloud. Latency and performance numbers from S3 were unacceptable, and using it would have meant rewriting the company's in-house application to support object storage.
"Even with shims, we couldn't get it to work," Loop said. Meanwhile, AWS EBS wasn't a real option either, because at the time, EBS supported volume sizes of only 1 TB. "EBS would have been a management headache," Loop said.
Working with cloud consultancy RightBrain Networks, Realty Data turned to a Zadara Virtual Private Storage Array (VPSA): dedicated single-tenant storage adjacent to the cloud data center, connected via a fiber link and purchased on a pay-as-you-go model. The Zadara VPSA presents familiar SAN and NAS interfaces, along with the storage performance developers expected from an on-premises EMC array. Zadara has since added VPSAs at other cloud providers, as well as an on-premises version that provides cloud-like pay-as-you-go consumption.
Native cloud block storage options have also upped their game. AWS EBS, for instance, now supports volume sizes of up to 16 TB, and EBS Provisioned IOPS Volumes backed by solid state drives deliver up to 20,000 IOPS per volume. Still, while that's good enough for a lot of database workloads, it isn't for all of them.
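A quick bit of arithmetic shows why those limits mattered for a workload like Realty Data's 40-TB database (the volume sizes are from the article; actual AWS limits have changed since):

```python
import math

# Illustrative only: how many EBS volumes a 40-TB dataset would need
# under the old 1-TB volume cap versus the later 16-TB cap. Figures
# are from the article; current AWS limits differ.

DATASET_TB = 40

def volumes_needed(dataset_tb, max_volume_tb):
    """Number of volumes required to hold the dataset."""
    return math.ceil(dataset_tb / max_volume_tb)

old_cap = volumes_needed(DATASET_TB, 1)    # 40 volumes to stripe, monitor
new_cap = volumes_needed(DATASET_TB, 16)   # 3 volumes

print(f"1-TB cap:  {old_cap} volumes")   # 40
print(f"16-TB cap: {new_cap} volumes")   # 3
```

Striping a database across 40 separate volumes -- each needing snapshots, monitoring and failure handling -- is the "management headache" Loop describes; three volumes is a much more tractable layout.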
Lawter Inc., a specialty chemicals company based in Chicago, Ill., recently moved its SAP and SharePoint infrastructure to a public cloud service from Dimension Data, and chose Zadara VPSA because it needed to guarantee a minimum of 20,000 IOPS for its SAP environment. "[Dimension Data's] standard storage could not meet our IOPS requirements," said Antony Poppe, global network and virtualization manager with the firm.
Meanwhile, traditional storage vendors see a market for their wares at cloud service providers. Not only do some cloud block storage offerings fail to deliver sufficient IOPS and latency, many cloud users report suffering from "IOPS competition" – competing for IOPS resources with other tenants of the environment, said Varun Chhabra, EMC director of product marketing for its Elastic Cloud Storage.
Pairing cloud compute with dedicated storage can achieve predictable performance.
At the same time, using dedicated storage for cloud-based workloads is reassuring to some businesses, said Catherine Van Aken, lead for business development, channels and partners at Virdata. The company develops a big data and analytics platform for Internet of Things (IoT) applications, built on OpenStack running on NetApp FlexPod converged infrastructure.
"Not all customers are ready for the public cloud," Van Aken said. "The market is growing from the edge, but will move to the cloud over time," she said, citing an IDC prediction that within five years, more than 90% of IoT data will be hosted in the cloud. With its approach, Virdata can offer its customers a stepped approach to going from an all on-prem environment to compute in the cloud -- with storage nearby.
Further, using traditional storage in the cloud offers management familiarity, said Phil Brotherton, NetApp vice president of the Data Fabric group. It even appeals to compliance officers, he said, "by holding data out of the cloud, even if the compute is in." NetApp has hundreds of customers for its NetApp Private Storage, which delivers fast, low-latency performance "near the cloud" at providers including AWS, Microsoft Azure, IBM SoftLayer and Alibaba Group, Brotherton said.
Cloud compute, on-prem storage
But for some organizations, any storage in the cloud is too much storage in the cloud. The volume of data is too great, the investments in on-prem storage infrastructure are too large, or the regulations governing their actions are too stringent to seriously contemplate putting data in the public cloud.
Compute, however, is another story. There are plenty of scenarios in which an organization may want to run an application in the cloud but keep its data at home, said Issy Ben-Shaul, CEO of Velostrata, a startup whose software decouples storage from compute. They may want to use cloud compute for application modernization, for test and dev, or to accommodate utilization spikes. Meanwhile, keeping data on premises protects existing investments, meets compliance goals and avoids massive data migration efforts. It can also lay the foundation for a multi-cloud strategy, moving applications between clouds to avoid lock-in without having to make changes to their data stores.
"Decoupling compute and storage has a lot of implications," Ben-Shaul said.
In addition to severing the connections between storage and compute, the Velostrata software streams and caches application images to the cloud from on-premises storage. It consists of two VMs – one running in VMware vCenter that mediates access to on-premises storage for reads and writes, and one in the cloud that communicates with the running compute processes, and integrates with monitoring engines. "The whole idea is to be cloud-agnostic, and allow VMs to run natively in the target cloud environment," Ben-Shaul said.
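Velostrata's design isn't public beyond that description, but the decoupling pattern itself can be sketched. In this hypothetical illustration (names invented, not Velostrata's code), cloud compute reads through an ephemeral cache while every write goes through to the on-premises store, so the authoritative copy never leaves the data center:

```python
# Hypothetical sketch of the compute/storage decoupling pattern -- not
# Velostrata's actual design. Reads are cached next to cloud compute;
# writes go through to the on-premises store of record.

class DecoupledVolume:
    def __init__(self, onprem_store):
        self.onprem = onprem_store   # authoritative, stays on premises
        self.cloud_cache = {}        # ephemeral cache next to cloud compute

    def read(self, key):
        if key not in self.cloud_cache:
            self.cloud_cache[key] = self.onprem[key]  # stream on demand
        return self.cloud_cache[key]

    def write(self, key, value):
        self.onprem[key] = value     # write-through: durability on-prem
        self.cloud_cache[key] = value

onprem = {"cfg": "v1"}
vol = DecoupledVolume(onprem)
vol.write("cfg", "v2")
print(onprem["cfg"])  # on-prem copy remains the source of truth
```

The design choice sketched here is write-through rather than write-back: writes pay the WAN round trip, in exchange for the guarantee that the on-premises data is always current -- the property compliance-minded shops care about.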
Enterprise Strategy Group's Sinclair anticipates that the storage community will continue to put forth creative solutions to deliver high-performance cloud storage. According to ESG research, using off-premises cloud resources is IT organizations' top initiative for the coming year.
"There's obviously a huge amount of interest, but at the same time, you really have to solve the speed of light challenge," Sinclair said.