The loss of some IT control is one of the most difficult transitions for a business to make when moving to the cloud. There is a distinct loss of visibility into the provider's infrastructure, which makes workload performance monitoring difficult -- if not impossible. Some amount of service downtime is unavoidable as the provider maintains and upgrades its own facilities. And the continuing trend toward automation and self-service can limit the response times or escalation paths for some cloud support intervention. All of these concerns have traditionally been insurmountable roadblocks for mission-critical applications.
For mission-critical workloads to really work in a public cloud setting, businesses must first determine the actual amount of downtime that is tolerable. This is the amount of downtime that you can live with for this important application -- it's not the amount of downtime you currently experience or the amount of downtime that you might prefer. This requires some honest self-assessment, but as long as the cloud provider's service-level agreement (SLA) guarantees less downtime than you're prepared to tolerate, moving the workload to a public cloud should not put the business at undue risk. For example, Microsoft Azure notes an availability of 99.95% -- about 22 minutes of downtime per month (0.05% of a 30-day month is 21.6 minutes).
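The arithmetic behind an availability percentage is easy to check yourself. The following is a minimal Python sketch, not a provider tool: the function names and the 30-day month are assumptions made for illustration, and a real SLA may define its measurement period and its exclusions (such as scheduled maintenance) differently.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the minutes of downtime an availability percentage permits.

    Assumes a 30-day period by default; check the SLA for the real period.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (1.0 - availability_pct / 100.0)


def sla_fits_tolerance(availability_pct: float,
                       tolerable_minutes: float,
                       period_days: float = 30.0) -> bool:
    """True if the SLA permits no more downtime than the business can tolerate."""
    return allowed_downtime_minutes(availability_pct, period_days) <= tolerable_minutes


if __name__ == "__main__":
    # 99.95% over a 30-day month works out to roughly 21.6 minutes.
    print(round(allowed_downtime_minutes(99.95), 1))
    # Hypothetical tolerance of 30 minutes per month for the workload.
    print(sla_fits_tolerance(99.95, tolerable_minutes=30))
```

Running the same calculation with your own tolerance figure, rather than the provider's headline number, is the point of the exercise: the comparison only means something once the tolerable downtime has been set honestly.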
It may be possible to further mitigate downtime by selecting a cloud service that is geared for important workloads, such as VMware's vCloud Air Dedicated Cloud. Such services may be able to migrate important workloads to other servers or make other arrangements for continuity before maintenance or upgrade cycles, but get those services and associated guarantees in writing in your cloud SLA.
The actual reporting of uptime can itself be problematic, so evaluate and understand the provider's cloud SLA reporting policies and processes and know how they measure adherence to uptime guarantees. Look for providers that allow outside monitoring and reporting to verify uptime, and avoid providers that put the onus on users to prove downtime.
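Outside monitoring ultimately boils down to comparing your own measurements against the guarantee. As a rough sketch only, the Python below assumes a hypothetical monitor that records one pass/fail probe per fixed interval; the `Probe` type and function names are invented for illustration, and a provider's SLA may count downtime in a different way (for example, only outages longer than a set threshold).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Probe:
    """One health check result from a hypothetical external monitor."""
    minute: int   # minutes since the start of the reporting period
    ok: bool      # True if the service answered correctly


def measured_availability_pct(probes: Sequence[Probe]) -> float:
    """Percentage of probes that succeeded, assuming evenly spaced probes."""
    if not probes:
        raise ValueError("at least one probe is required")
    successes = sum(1 for p in probes if p.ok)
    return 100.0 * successes / len(probes)


def meets_guarantee(probes: Sequence[Probe], guaranteed_pct: float) -> bool:
    """True if the measured availability is at least the guaranteed figure."""
    return measured_availability_pct(probes) >= guaranteed_pct


if __name__ == "__main__":
    # 1,000 one-minute probes with a single failure.
    sample = [Probe(minute=m, ok=(m != 500)) for m in range(1000)]
    print(measured_availability_pct(sample))
    print(meets_guarantee(sample, 99.95))
```

Independent figures like these are useful evidence in an SLA dispute only if the provider accepts outside measurements, which is why the reporting policy is worth settling up front.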
Next, have a clear understanding of the provider's support contact options and escalation paths. For example, look for features like 24/7/365 live telephone support from a provider handling your critical workloads and know how quickly problems can be elevated to a senior administrator or engineer. By contrast, if the only available support is through a self-service portal or an email address routed to a remote time zone, you might not get the timely support your important application demands. Test the provider's support regularly and stay abreast of any changes in support policy or availability. Remember that technical support may carry an additional cost. For example, Microsoft Azure support carries a separate cost of $29 per month for all generally available services.
Even though the idea of mission-critical workloads in the cloud is gaining acceptance, it's far from ubiquitous. Some applications are simply so important that they require extraordinary deployments -- like multi-node and distributed clusters -- to effectively eliminate downtime under all but the most cataclysmic situations. In these cases, it may be best to forgo the cloud and continue hosting the workload on-premises.
About the author:
Stephen J. Bigelow is the senior technology editor of the Data Center and Virtualization Media Group. He can be reached at firstname.lastname@example.org.