Cloud-based backup: Best strategies and practices
Editor's note: This article is part two in a two-part series on cloud data backup.
There is no shortage of ways to copy data -- and that still rings true in the cloud. When navigating your cloud backup options, take a close look at your cloud environment, including your cloud provider's native backup services, your databases and more.
Back in the virtualization heyday, the preferred backup method was data protection software that backed up or replicated entire virtual machines from the hypervisor layer, such as Veeam Backup or Zerto's replication software.
For IT shops with VMware-based clouds, that approach still works. VIF Education, a global education provider based in Chapel Hill, N.C., runs a mix of on-premises, software as a service (SaaS) and infrastructure as a service (IaaS)-based applications. VIF relies on Spanning to protect its Google Apps and Salesforce environments, and on Veeam Backup for its on-premises development and legacy applications, as well as for the cloud-based teacher management platform that runs on a local service provider's vCloud Air platform. But the setup is not particularly integrated or graceful, said Matt Torcasso, IT manager at the firm, who looks forward to tighter integration between the on-premises and cloud backup processes.
"It's a tough thing to navigate -- how to improve data backup in a [hybrid] environment," Torcasso said. "It's a really fragmented market and there are a lot of different options."
VMware vCloud Air providers make up a tiny portion of the overall public cloud market, and the proposed Dell-EMC merger has cast doubt on vCloud Air's future. But what about the vast majority of cloud shops running on Amazon Web Services (AWS), Microsoft Azure and the like?
One approach is decidedly old-school: running backup software, such as Veritas NetBackup, from inside the operating system.
"When you go to the cloud, you have to start thinking agents again," said Edward Haletky, principal analyst at The Virtualization Practice. The agent backs up to a nearby data repository, and that data is then replicated to another cloud to hedge against a cloud-wide outage.
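To make the two-step flow Haletky describes concrete, here is a minimal Python sketch that uses local directories as stand-ins: the source directory plays the server the agent runs on, one folder plays the nearby repository and another plays the second cloud. The function names are hypothetical, and a real agent would write to provider storage over the network rather than copy files on one machine. What it illustrates is that replication starts from the backup repository, not the live system, and that each copy is checked against a checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(src: Path, dst: Path) -> dict:
    """Copy every file under src into dst, keeping relative paths, and
    verify each copy against the original's checksum."""
    manifest = {}
    for file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = file.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file, target)
        checksum = sha256_of(file)
        if sha256_of(target) != checksum:
            raise OSError(f"checksum mismatch for {rel}")
        manifest[rel.as_posix()] = checksum
    return manifest


def backup_then_replicate(source: Path, nearby_repo: Path, second_cloud: Path) -> dict:
    """Hypothetical two-step flow: back up to a nearby repository, then
    replicate that repository (not the live source) to a second location.
    Assumes both destinations start out empty."""
    manifest = copy_tree_verified(source, nearby_repo)
    replica = copy_tree_verified(nearby_repo, second_cloud)
    if replica != manifest:
        raise OSError("second copy does not match the primary backup")
    return manifest
```

Copying from the repository rather than from the source a second time keeps the load on the production server to a single pass.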
In fact, the emergence of cloud has breathed new life into agent-based backup. Veeam, for instance, has a new product that returns to the older approach of performing backups from inside the OS with a traditional agent. Veeam Backup for Linux is "less about on-prem Linux, and more about cloud," said Doug Hazelman, Veeam vice president of product strategy. Together with Veeam Endpoint Backup, an agent-based product for Windows, the company is developing "a cloud strategy that you will see us build out this year," one that will feature integrated management capabilities.
Other cloud backup options to consider
Meanwhile, organizations already running in popular cloud platforms such as AWS and its ilk aren't sitting on their hands, waiting for traditional backup vendors to catch up to the cloud era. Instead, they're exploring other cloud backup options.
Today, all major cloud providers offer a "poor man's backup" -- a point-in-time snapshot of a block storage volume, saved to lower-cost object storage, said Rajeev Chawla, co-founder and CEO of CloudVelox, which makes cloud data migration and recovery software.
Why poor man's backup? Because "everything is manual -- you have to set everything up yourself -- and the point-in-times are crash consistent, not necessarily application consistent," he said. A crash-consistent snapshot captures the disk as it would look after a sudden power loss, so writes an application has not yet flushed may be missing. And while it may be possible to recover a single service from a single snapshot, many applications consist of multiple services, and recovering them as a whole requires approaching data protection holistically.
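The gap between a pile of snapshots and a recoverable application can be sketched in Python. In this hypothetical helper, `snapshot_fn` stands in for whatever provider call actually snapshots a volume, and the optional `freeze_fn`/`thaw_fn` hooks stand in for application-specific quiescing, such as flushing and pausing database writes. Without the hooks the result is only crash consistent; with them it is application consistent only if the hooks really do quiesce every service. All volumes share one group ID so they are restored as a unit.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional


@dataclass
class RecoveryPoint:
    """Snapshots of several volumes that must be restored together."""
    group_id: str
    taken_at: datetime
    snapshot_ids: Dict[str, str] = field(default_factory=dict)
    app_consistent: bool = False


def take_group_snapshot(
    volumes: Iterable[str],
    snapshot_fn: Callable[[str, str], str],
    freeze_fn: Optional[Callable[[], None]] = None,
    thaw_fn: Optional[Callable[[], None]] = None,
) -> RecoveryPoint:
    point = RecoveryPoint(group_id=str(uuid.uuid4()), taken_at=datetime.now(timezone.utc))
    frozen = False
    if freeze_fn is not None:
        freeze_fn()  # hypothetical hook: flush buffers and pause writes
        frozen = True
    try:
        for volume in volumes:
            # snapshot_fn(volume, group_id) -> provider snapshot ID (assumed interface)
            point.snapshot_ids[volume] = snapshot_fn(volume, point.group_id)
    finally:
        # Always resume the application, even if a snapshot call failed.
        # The failure still propagates, so no partial recovery point is returned.
        if frozen and thaw_fn is not None:
            thaw_fn()
    point.app_consistent = frozen
    return point
```

In practice, a failed group would also need its partial snapshots cleaned up; the sketch leaves that out.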
If you're willing to spend extra, cloud providers will take snapshot backups of your databases. AppNeta, a hosted provider of application performance management software, started out in 2010 running on AWS, relying on disk snapshot features for its backup processes. With snapshots, "it's fairly easy to bring up an instance of hourly, daily or weekly snapshots," said Chris Erway, chief architect at the firm.
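Keeping hourly, daily and weekly snapshots implies a rule for pruning everything else. Below is a minimal sketch of one common scheme, often called grandfather-father-son rotation: for each granularity, keep the newest snapshot in each of the most recent N hours, days or ISO weeks. The function name and the counts of 24 hourly, 7 daily and 4 weekly snapshots are arbitrary assumptions, not AppNeta's settings.

```python
def snapshots_to_keep(snapshots, hourly=24, daily=7, weekly=4):
    """Return the set of snapshot datetimes to retain; the rest can be pruned."""
    ordered = sorted(snapshots, reverse=True)  # newest first
    rules = [
        (hourly, lambda t: (t.date(), t.hour)),
        (daily, lambda t: t.date()),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
    ]
    keep = set()
    for limit, bucket_of in rules:
        seen = set()
        for ts in ordered:
            bucket = bucket_of(ts)
            if bucket in seen:
                continue  # a newer snapshot already represents this bucket
            if len(seen) == limit:
                break  # this granularity has its quota
            seen.add(bucket)
            keep.add(ts)
    return keep
```

Because the rules overlap, one snapshot can satisfy several of them at once, which keeps the total well below the sum of the three counts.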
But the firm increasingly relies on AWS Relational Database Service (RDS), which includes scheduled point-in-time snapshots. Several years ago, AWS began to push users toward RDS instead of managing databases manually. "They started saying 'Leave the stateful stuff to us -- we'll manage the data and you just work on the logic,'" Erway said. AppNeta went along for the ride, and now relies on "RDS to do its magic backup thing."
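Managed database services broadly combine periodic backups with transaction logs so a database can be restored to an arbitrary moment, not only to the instant a snapshot was taken. The sketch below shows that general technique on an in-memory key-value state: start from the newest snapshot at or before the target time, then replay logged changes up to the target. It illustrates the idea only. It is not how RDS implements restores, and the data structures and the `restore_to` name are hypothetical.

```python
from bisect import bisect_right


def restore_to(target, snapshots, change_log):
    """Rebuild state as of `target`.

    snapshots:  list of (taken_at, state_dict), sorted by taken_at
    change_log: list of (at, key, value), sorted by at; value None = delete
    """
    times = [taken_at for taken_at, _ in snapshots]
    idx = bisect_right(times, target) - 1
    if idx < 0:
        raise ValueError("no snapshot at or before the target time")
    base_time, base_state = snapshots[idx]
    state = dict(base_state)  # copy so the stored snapshot is never mutated
    for at, key, value in change_log:
        if at <= base_time:
            continue  # already reflected in the snapshot
        if at > target:
            break  # past the requested point in time
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state
```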
AppNeta backs up more than 170 TB to Amazon S3, the output of the processing it does on 7.4 billion events per day. It uses S3's Infrequent Access tier, which bridges the gap between relatively expensive standard S3 storage and super cheap but super slow Glacier archival storage.
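The appeal of a middle tier comes down to a trade-off between what it costs to keep data and what it costs to read it back. The sketch below compares tiers on those two numbers alone. Prices are parameters the caller must take from the provider's current price list. The values in the usage check are made up, not AWS rates, and real bills also include request charges, minimum storage durations and, for archival tiers, restore delays, none of which this models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb_month: float  # caller-supplied price to keep 1 GB for a month
    retrieval_per_gb: float      # caller-supplied price to read 1 GB back


def monthly_cost(tier: Tier, stored_gb: float, retrieved_gb: float) -> float:
    """Storage plus retrieval cost for one month, ignoring all other fees."""
    return stored_gb * tier.storage_per_gb_month + retrieved_gb * tier.retrieval_per_gb


def cheapest_tier(tiers, stored_gb: float, retrieved_gb: float) -> Tier:
    """Pick the tier with the lowest monthly cost for this access pattern."""
    return min(tiers, key=lambda t: monthly_cost(t, stored_gb, retrieved_gb))
```

With placeholder "hot", "warm" and "cold" tiers, the cheapest choice flips from cold to warm to hot as the share of data read back each month grows, which is the niche an infrequent-access tier fills.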
Backing up distributed databases
Distributed databases are built across multiple nodes for scalability and are, by nature, "eventually consistent," said Tarun Thakur, co-founder and CEO of Datos IO, which builds recovery software for big data and cloud applications. But eventual consistency and point-in-time backup don't mix: at any given moment, different nodes may hold different versions of the same record, so copying each node independently does not capture a single, coherent state of the database. To solve that problem, Datos creates a cluster-consistent, point-in-time image of a distributed database, allowing enterprises to build applications on these cloud databases without worrying about the integrity of their data.
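To see why a single node's copy can fall short, consider a toy model in Python: each replica holds timestamped versions of keys, and one replica may lag behind another. The hypothetical `consistent_image` function merges all replicas and, for each key, keeps the newest write at or before a chosen time, treating `None` as a deletion. Resolving conflicts by last-write-wins on timestamps is an assumption borrowed from Cassandra-style stores. This is not Datos IO's method, which the article does not describe.

```python
def consistent_image(replicas, as_of):
    """Merge per-node replicas into one image as of time `as_of`.

    Each replica maps key -> list of (write_timestamp, value); a value of
    None marks a deletion (a tombstone). For each key, the newest write at
    or before `as_of` on any replica wins (last-write-wins).
    """
    newest = {}  # key -> (timestamp, value)
    for replica in replicas:
        for key, versions in replica.items():
            for ts, value in versions:
                if ts <= as_of and (key not in newest or ts > newest[key][0]):
                    newest[key] = (ts, value)
    # Drop keys whose winning write was a deletion.
    return {key: value for key, (ts, value) in newest.items() if value is not None}
```

Backing up only a lagging replica would return a stale value for a key that another node has already updated; merging before cutting the image avoids that.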
Others take a more MacGyver approach. ACI Information Group is a curated content aggregator and heavy user of AWS DynamoDB, AWS' NoSQL data store.
"It's great for performance, but doesn't have built-in backup," said Chris Moyer, vice president of technology at the company. Moyer's solution: trigger an AWS Lambda function from event streams to automatically export data from a given table or region to Amazon S3. For jobs that take longer than the five minutes Lambda allows, Moyer simply launches a Docker container on Amazon EC2 Container Service (ECS). The result? "Real-time backup and verification and versioning," Moyer said.
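Here is a minimal sketch of the Lambda side of Moyer's approach. It assumes the table's DynamoDB stream uses the `NEW_AND_OLD_IMAGES` view type and that the destination bucket name comes from a hypothetical `BACKUP_BUCKET` environment variable. Each change event is written to its own S3 object whose key includes the stream sequence number, so earlier versions of an item are never overwritten. The key layout is an illustrative choice, and the extra `put_object` parameter exists only so the function can be exercised without AWS; Lambda itself calls `handler(event, context)`.

```python
import json
import os


def object_key_for(record):
    """Build a versioned S3 key: table / item key / stream sequence number."""
    # eventSourceARN looks like arn:aws:dynamodb:REGION:ACCOUNT:table/NAME/stream/LABEL
    table = record["eventSourceARN"].split(":table/")[1].split("/")[0]
    keys = record["dynamodb"]["Keys"]  # e.g. {"Id": {"N": "101"}}
    item = "_".join(
        f"{name}={next(iter(typed.values()))}" for name, typed in sorted(keys.items())
    )
    seq = record["dynamodb"]["SequenceNumber"]
    return f"dynamodb-backup/{table}/{item}/{seq}.json"


def handler(event, context, put_object=None):
    if put_object is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        s3 = boto3.client("s3")
        bucket = os.environ["BACKUP_BUCKET"]  # hypothetical variable name

        def put_object(key, body):
            s3.put_object(Bucket=bucket, Key=key, Body=body)

    written = 0
    for record in event.get("Records", []):
        ddb = record["dynamodb"]
        body = json.dumps({
            "eventName": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": ddb["Keys"],
            "newImage": ddb.get("NewImage"),   # absent on REMOVE
            "oldImage": ddb.get("OldImage"),   # absent on INSERT
        })
        put_object(object_key_for(record), body)
        written += 1
    return {"written": written}
```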
In a cloud we trust
Beyond taking point-in-time images of data, other cloud backup options include storing a copy of that data offsite. Previously, that meant shipping your backup tapes to an Iron Mountain vault deep in an abandoned salt mine. Today, IT organizations send digital copies of their backups to offsite locations, which may or may not be in the cloud. But what if your application is already in the cloud -- do you need to move it outside the cloud for safety's sake, or does the cloud's inherent resilience make that overkill?
The answer depends on whom you ask. Even though he hasn't suffered any "spectacular failures" on AWS, ACI's Moyer satisfies his "extra paranoia" by exporting backup data to a secondary cloud provider such as Rackspace or Google Cloud Platform.
But while multi-cloud backup is definitely one of the many cloud backup options to consider, it isn't in the cards for everyone. "We've contemplated moving the data out of AWS into another cloud service provider, but AWS charges a fair amount to move out of its cloud, and the bandwidth charges eclipse the cost savings," said AppNeta's Erway. Further, AWS claims that data in S3 is highly durable: by default, S3 is designed for 99.999999999% durability, which corresponds to an average annual expected loss of 0.000000001% of objects. "They swear up and down about how resilient it is," Erway said. "You sort of have to trust them on that. Using cross-region replication plus a reduced redundancy version of S3 are also options, but the cost is constantly an issue."
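The durability figure is easier to grasp as arithmetic. Treating 99.999999999% as the chance that any given object survives a year (a simplification of how such figures are modeled), the expected loss for a collection of objects follows directly. Python's `Decimal` is used so the eleven nines are not blurred by floating-point rounding, and the 10 million object count is only an example.

```python
from decimal import Decimal


def expected_annual_loss(object_count, durability):
    """Expected number of objects lost per year at a given annual durability."""
    return object_count * (Decimal(1) - Decimal(durability))


ELEVEN_NINES = "0.99999999999"

loss_fraction = Decimal(1) - Decimal(ELEVEN_NINES)  # share of objects lost per year
loss_percent = loss_fraction * 100                   # the same share as a percentage
per_year = expected_annual_loss(10_000_000, ELEVEN_NINES)
years_per_lost_object = 1 / per_year                 # average wait for one loss
```

For 10 million objects, this works out to an expected 0.0001 objects lost per year, or on average one object every 10,000 years.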
Generally speaking, demand for protecting data in multiple clouds is low, said CloudVelox's Chawla, and for most shops, leveraging a single cloud's different regions and tiers of storage service is sufficient. "It's not so much the technology -- we can replicate across clouds -- it's more about the business case," he said. In a multi-cloud environment, "you have two sets of vendors, two sets of contracts," and if you use one cloud's native capabilities, you may not be able to use them in the other. "Not all clouds are created equal at this time," he said.
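Within a single provider, a second region can serve as the offsite copy. As a sketch, S3 cross-region replication can be enabled through boto3's `put_bucket_replication` call. The bucket names, rule ID and IAM role ARN here are hypothetical. Both buckets must have versioning enabled, and the rule schema should be checked against current AWS documentation before use.

```python
def replication_config(role_arn, dest_bucket, storage_class="STANDARD_IA"):
    """Build an S3 replication configuration that copies every new object
    to `dest_bucket` (assumed to live in another region)."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate (hypothetical ARN)
        "Rules": [{
            "ID": "backup-to-second-region",  # illustrative rule name
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix: replicate all objects
            # Deletes in the source are not mirrored, so the copy keeps old data.
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",
                "StorageClass": storage_class,  # store the copy in a cheaper tier
            },
        }],
    }


def enable_replication(source_bucket, role_arn, dest_bucket):
    """Apply the configuration to the source bucket (requires AWS credentials)."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(role_arn, dest_bucket),
    )
```

Disabling delete-marker replication means an accidental deletion in the source bucket does not propagate to the copy, which is usually the behavior a backup needs.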
And it's not like the early days, when cloud storage provider Nirvanix suddenly shuttered its doors and gave customers two weeks to get their data off its site. For all the chills that that sent down IT's spine, today's tier-one cloud providers aren't going to go out of business, Chawla said.
But what about lock-in? Fear that a cloud provider will go out of business isn't the only reason to avoid lock-in. There's also the prospect that it will dramatically raise prices.
So far, that hasn't happened, said Damian Roskill, chief marketing officer at AppNeta. "Unlike IBM, which achieved lock-in with customers and increased prices, AWS achieves lock-in with customers and drops prices," Roskill said. Further, the margins Amazon makes on AWS suggest that it can keep lowering prices for the foreseeable future. And for your data's sake, let's all hope that he's right.