Planning and testing cloud-based disaster recovery

Cloud-based disaster recovery can be less expensive and faster than traditional DR -- and it could protect your enterprise when all hell breaks loose.

Companies stand to lose tens of thousands of dollars for each hour of downtime caused by some type of disaster. Traditionally, disaster recovery for physical servers is slow and relatively expensive. Moving DR to the cloud, however, can provide fast recovery of virtual servers at a fraction of the cost of traditional DR. But to recover quickly without breaking the bank, cloud managers must have control of both the production site and the DR site.

How you recover data depends on your recovery time objective (RTO), or how quickly you need to recover from an outage, and on how much money you’re willing to spend to do it. At the slow-but-cheap end of the spectrum, you can use offsite recovery with an RTO of five to seven days.

Hot-site recovery is at the other end of the spectrum and has an RTO of minutes. This method involves SAN-to-SAN replication in which data is always replicated between the production site and the disaster recovery site. Needless to say, this DR method is very expensive.

There are also middle-ground methods of disaster recovery. Cold-site recovery, for example, means servers are available at the DR site but not yet loaded from your backup on the production site. Warm-site recovery features servers that are set up and waiting for admins to move databases before bringing them up to begin recovery.
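
The trade-off among these tiers can be made concrete with rough arithmetic. The sketch below multiplies an hourly downtime cost by each tier's RTO and picks the slowest tier that still meets a target. Only the offsite figure (five to seven days) and the hot-site figure (minutes) come from the text above; the cold-site and warm-site RTOs, and any hourly loss you plug in, are illustrative assumptions.

```python
# Rough downtime-cost comparison across DR tiers (RTO values in hours).
# Offsite and hot-site figures follow the article; the cold-site and
# warm-site values are illustrative assumptions, not measured numbers.
ASSUMED_RTO_HOURS = {
    "offsite": 6 * 24,   # article: five to seven days (midpoint used)
    "cold-site": 48,     # assumption
    "warm-site": 8,      # assumption
    "hot-site": 0.25,    # article: minutes (15 minutes assumed here)
}


def downtime_cost(tier: str, hourly_loss: float) -> float:
    """Estimated loss incurred while waiting for recovery to finish."""
    return ASSUMED_RTO_HOURS[tier] * hourly_loss


def cheapest_tier_meeting(rto_target_hours: float) -> str:
    """Return the slowest (typically cheapest) tier that meets the RTO target."""
    candidates = [t for t, h in ASSUMED_RTO_HOURS.items() if h <= rto_target_hours]
    if not candidates:
        raise ValueError("no tier meets the requested RTO")
    return max(candidates, key=ASSUMED_RTO_HOURS.get)
```

With these assumed numbers, a 12-hour RTO target selects the warm-site tier, and a hypothetical loss of $20,000 per hour makes a six-day offsite recovery cost about $2.9 million in downtime alone.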

If you are backing up a physical server without SANs or SAN-to-SAN synchronization between a production site and the DR site, then traditional bare-metal recovery (BMR) presents a number of challenges:

  • You need to use a physical server on the DR site that has exactly the same configuration, BIOS, drivers, etc., as the physical server you are trying to recover at the production site. This is difficult to do unless you buy both servers at exactly the same time.
  • If you cannot find a physical server on the DR site with exactly the same configuration, you must select another server, load and patch the OS, load and patch the applications, load data and then configure the system. The system will be restored only if all steps are followed correctly.
  • You must also configure your network to ensure it matches the production site’s network so all virtual private networks (VPNs) and VLANs are configured similarly and firewall rules are the same. Network configuration can really slow down IT infrastructure recovery times.
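
One way to catch the first problem before a disaster is to compare an inventory of the production server against the intended DR server. The following is a minimal sketch, assuming each server is described by a simple dictionary; the field names and version strings are hypothetical, not taken from any real inventory tool.

```python
def config_mismatches(production: dict, dr: dict) -> dict:
    """Return {setting: (production_value, dr_value)} for every setting that differs.

    A setting missing on one side is reported with None for that side.
    """
    keys = production.keys() | dr.keys()
    return {
        key: (production.get(key), dr.get(key))
        for key in sorted(keys)
        if production.get(key) != dr.get(key)
    }


# Hypothetical inventories for illustration only.
production_server = {"bios": "2.4.1", "nic_driver": "5.1", "raid_controller": "H710"}
dr_server = {"bios": "2.6.0", "nic_driver": "5.1", "raid_controller": "H710"}

print(config_mismatches(production_server, dr_server))
# → {'bios': ('2.4.1', '2.6.0')}
```

An empty result suggests the DR server is a plausible bare-metal target; any entry is a difference to resolve before relying on BMR.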

The shift to cloud-based DR
Cloud-based DR not only speeds recovery compared to traditional physical server recovery, it also allows you to send apps and associated data offsite for recovery at a later time.

But recovery can take a long time when money is an issue, when you can’t afford SAN-to-SAN synchronization or when you don’t have an entire physical infrastructure in place to recover servers. This means you need to have your network completely replicated with change management between the production site and the DR site. This guarantees that anything you do on the production site is replicated on the DR site. When you add a physical server to your production site, you need to make sure your change management process replicates a physical server on the DR site.

With cloud computing, as soon as you convert a physical server to a virtual server running on a hypervisor, the virtual server essentially becomes a file (e.g., a VMDK file with VMware vSphere ESXi). Therefore, instead of sending data and apps to a traditional offsite backup and going through a lengthy recovery, you ship the virtual server file to the DR site for backup every few hours. When you need to recover, turn the virtual server on at the DR site and make sure the network is properly configured. Recovery is then complete.

Virtual server files can be sent to the DR site periodically -- every four to six hours -- or more frequently. Tools such as Veeam Backup & Replication, which is designed to work with VMware-based cloud environments, can facilitate the process.
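
The idea of shipping a virtual server file can be sketched in a few lines. The example below copies a disk file to a DR directory and verifies the copy with a checksum. It is a simplified illustration, not how any particular replication product works: the function names and paths are hypothetical, the DR site is assumed to be reachable as a mounted directory, and the file is assumed to be a consistent snapshot rather than the disk of a running virtual server.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large disk images do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ship_vm_file(source: Path, dr_dir: Path) -> Path:
    """Copy a virtual server file to the DR directory and verify the copy."""
    dr_dir.mkdir(parents=True, exist_ok=True)
    target = dr_dir / source.name
    shutil.copy2(source, target)  # copies file contents and metadata
    if sha256_of(source) != sha256_of(target):
        target.unlink()  # discard a corrupt copy rather than keep it
        raise OSError(f"checksum mismatch while shipping {source.name}")
    return target
```

Running `ship_vm_file` from a scheduler every four to six hours would approximate the periodic shipping described above; the interval between runs bounds how much recent data a recovery could lose.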

This cloud-based DR process works -- if you have control of the production site and the DR site. For example, if the production site uses Microsoft Hyper-V to virtualize servers and the DR site is expecting VMware vSphere ESXi-based virtual servers, then the entire DR process won’t work. You need to find a cloud-based DR services provider that can manage Hyper-V virtual server files.
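
A pre-flight check for this kind of mismatch might look like the sketch below. It assumes the hypervisor's native disk format can be inferred from the file extension (VMDK for VMware, VHD or VHDX for Hyper-V); the dictionary keys and function name are hypothetical labels chosen for the example.

```python
# Native virtual disk formats, keyed by hypothetical hypervisor labels.
NATIVE_DISK_FORMATS = {
    "vmware-vsphere-esxi": {"vmdk"},
    "microsoft-hyper-v": {"vhd", "vhdx"},
}


def dr_site_accepts(disk_file: str, dr_hypervisor: str) -> bool:
    """Check whether a virtual server file matches the DR site's native format."""
    extension = disk_file.rsplit(".", 1)[-1].lower()
    return extension in NATIVE_DISK_FORMATS[dr_hypervisor]


# A Hyper-V disk sent to a VMware-based DR site is flagged before a disaster.
print(dr_site_accepts("web01.vhdx", "vmware-vsphere-esxi"))  # → False
print(dr_site_accepts("web01.vmdk", "vmware-vsphere-esxi"))  # → True
```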

A good way, and perhaps the simplest way, to approach disaster recovery in the cloud is to use a hosting provider that handles multi-tenant cloud servers and also has a cloud-based DR service. A hosting service provider would have control of both the production site and the DR site.

Not everyone sees the benefits of cloud-based disaster recovery. One complaint is that enterprises don’t really get full, traditional DR when it’s managed in the cloud. Critics claim those enterprises actually get Backup as a Service (BaaS). Arguably, you only get cloud-based DR when a single organization is in control of both the production site and the DR site.

Disaster recovery: When money is no object
How would a full, traditional disaster recovery method compare with the best full-blown cloud-based DR approach, if budgets weren’t a concern? Are there any differences in cost and time to recover?

If money were not an issue, enterprises could use SAN-to-SAN replication in both approaches. They likely would move everything to a SAN and perform synchronous or asynchronous replication between the production site and the DR site. This decreases the recovery point objective (RPO) and improves your ability to recover quickly.

Recovery times for both traditional and cloud-based DR using synchronized SANs would be the same since you’d replicate the entire file structure, file system, etc. But the total cost of the traditional approach is much higher than that of the cloud-based DR approach.

If you compare traditional physical DR without SAN-to-SAN synchronization versus cloud-based DR, then disaster recovery in the cloud is much less expensive and offers faster recovery times -- if you control both production and DR sites and replicate network configuration changes. In many cases, failure to provide network change management is a primary reason why disaster recovery does not work.
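
Network change management can be reduced to one question: which changes has production received that the DR site has not? The sketch below tracks that with a simple change log per site. The `Site` class and the change identifiers are hypothetical, invented for illustration; a real process would sit on top of a ticketing or configuration management system.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    applied_changes: list = field(default_factory=list)

    def apply(self, change_id: str) -> None:
        """Record a network change (VLAN, VPN, firewall rule) as applied here."""
        if change_id not in self.applied_changes:
            self.applied_changes.append(change_id)


def pending_on_dr(production: Site, dr: Site) -> list:
    """Changes applied in production but not yet replicated to the DR site, in order."""
    return [c for c in production.applied_changes if c not in dr.applied_changes]


production = Site("production")
dr = Site("dr")
production.apply("CHG-101: add VLAN 40")
production.apply("CHG-102: open firewall port 8443")
dr.apply("CHG-101: add VLAN 40")

print(pending_on_dr(production, dr))
# → ['CHG-102: open firewall port 8443']
```

A non-empty list is the kind of drift that makes a recovered server unreachable even though the server itself came back correctly.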

Because cloud-based DR requires fewer physical servers, there are fewer associated tasks. The physical servers in a cloud-based DR setup are the virtual host servers, each of which runs multiple virtual servers. For example, 20 physical servers can be virtualized to run on only two virtual host servers -- one at the production site and one at the DR site.

Planning an overall cloud-based DR strategy
To design your enterprise’s DR strategy, you need to determine the disaster recovery budget and how quickly you need to recover servers when a disaster hits. Traditional DR is slow if you have a small budget, and speeding up physical server recovery will cost you. However, because many users are virtualizing at a high rate, cloud-based DR will become part of many corporations’ DR plans.

Cloud-based DR promises to drastically reduce recovery costs and time compared to traditional methods used for physical servers. A key to cloud-based DR is that one organization should control both the production site and the DR site; the organization also needs to replicate production site network configurations using a change management process.

You can create your own virtualized data center and separate DR site, which allows you to control both the production and DR sites. You can also replicate the production site network configuration on the DR site with a change management process. However, this is a lot of work and will get expensive. Even though this method gives enterprise IT control over the environment, building this type of DR environment could cost almost as much as it cost to build the production site.

Enterprises also have the option to choose a DR hosting provider that uses the same virtualization technology the enterprise used to virtualize its data center. This creates a private cloud environment, but forces enterprises to agree on a change management process with the DR hosting provider. That is difficult to do, and it leaves two organizations in control rather than one.


Bill Claybrook is a marketing research analyst with over 35 years of experience in the computer industry with the last dozen years in Linux, open source and cloud computing. Bill was research director, Linux and Open Source, at The Aberdeen Group in Boston and a competitive analyst/Linux product marketing manager at Novell. He is currently president of New River Marketing Research and Directions on Red Hat. He holds a Ph.D. in computer science.
