Enterprises moving business-critical applications to the cloud require measurable and enforceable service-level agreements from cloud providers. Much like an insurance policy for an IT organization, an SLA protects the insured subscriber from outages or slowdowns that would affect day-to-day business operations. For companies using cloud services, the protection an SLA provides should be proportional to the financial impact an outage or slowdown causes to the business.
Enterprises should insist on a performance or outage measurement that reflects, in quantifiable terms, how an outage affects your business over a defined period of time. In general, the longer the period over which outages are averaged, the easier it is for a provider to meet those stipulations.
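To see why the measurement period matters, consider a rough sketch in Python. The four-hour outage and the three windows are hypothetical figures chosen only for illustration; the point is that the same incident looks much smaller when it is averaged over a longer window.

```python
def availability_pct(outage_hours: float, window_hours: float) -> float:
    """Percentage of the measurement window during which the service was up."""
    return 100.0 * (1.0 - outage_hours / window_hours)


OUTAGE_HOURS = 4.0  # hypothetical single outage, for illustration only

# Hypothetical measurement windows, expressed in hours
windows = {"week": 7 * 24, "30-day month": 30 * 24, "year": 365 * 24}

for name, hours in windows.items():
    print(f"{name:>12}: {availability_pct(OUTAGE_HOURS, hours):.3f}%")
```

Measured over a week, the outage leaves availability at roughly 97.6%; measured over a year, the same outage still allows the provider to report better than 99.9%.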
Companies in different industries will classify meaningful peak usage times differently. Pick a period of time that is meaningful to your business and your clients. Clients on a stock-trading cloud application, for example, will require full availability and performance during trading hours -- Monday through Friday, 9:30 a.m. to 4 p.m. A greeting card company website, however, may not require full availability during that time frame; the website may be slow for as long as an hour and the users may not click away to another site. Instead, the greeting card company may require more availability on weekends, when people are more likely to shop.
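One way to make a business-specific window concrete is to count only the downtime that falls inside it. The sketch below is an illustration, not a standard SLA formula: the function name, the outage records and the default window (the trading hours from the example above) are all assumptions, and it ignores time zones and market holidays.

```python
from datetime import datetime, time, timedelta


def business_window_downtime(outages, open_t=time(9, 30), close_t=time(16, 0),
                             weekdays=range(0, 5)):
    """Sum the part of each (start, end) outage that overlaps the business window.

    weekdays uses datetime.weekday() numbering: 0 = Monday ... 6 = Sunday.
    """
    total = timedelta()
    for start, end in outages:
        day = start.date()
        while day <= end.date():
            if day.weekday() in weekdays:
                win_start = datetime.combine(day, open_t)
                win_end = datetime.combine(day, close_t)
                overlap = min(end, win_end) - max(start, win_start)
                if overlap > timedelta():
                    total += overlap
            day += timedelta(days=1)
    return total


# Hypothetical outage log: one incident late on a Monday, one on a Saturday
outages = [
    (datetime(2013, 7, 1, 15, 0), datetime(2013, 7, 1, 17, 0)),   # Monday
    (datetime(2013, 7, 6, 10, 0), datetime(2013, 7, 6, 14, 0)),   # Saturday
]
print(business_window_downtime(outages))
```

For the trading application, only the hour between 3 p.m. and 4 p.m. on Monday counts against the provider. A greeting card company could pass `weekdays=range(5, 7)` and different hours to weight weekends instead.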
Choose the measurement that meets your end-user requirements and then negotiate an SLA with your cloud service provider that meets or exceeds those expectations. If you're protecting against a service outage, then be sure the SLA states a specific measurement of what, exactly, constitutes an outage for your users or customers.
You are the business advocate for your customers; therefore, you should be less concerned with the number of cloud data center incidents and more concerned with how an incident affects your overall business. Transaction-response time from your users' perspective tends to be the best measurement, as it reflects end users' perception of service delivery.
It's also important to agree upon the difference between an outage and a slowdown. If customers will move to another site after a slow login that takes three to five seconds or more, then that is your answer. Logins that take longer than five seconds to complete constitute an outage. A company that offers streaming video would consider any interruption of content delivery an outage. Match the definition of an outage to your customers' specific needs.
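Writing the definitions down as explicit thresholds removes ambiguity. The following is a minimal sketch using the login example above; the five-second outage threshold comes from that example, while the three-second slowdown threshold, the function name and the sample timings are assumptions for illustration.

```python
from collections import Counter


def classify(response_seconds, slow_after=3.0, outage_after=5.0):
    """Label one login attempt; None means the login never completed."""
    if response_seconds is None or response_seconds > outage_after:
        return "outage"
    if response_seconds > slow_after:
        return "slowdown"
    return "ok"


# Hypothetical login timings in seconds, as a monitoring tool might record them
samples = [1.2, 2.8, 4.0, 5.1, None, 0.9]
print(Counter(classify(s) for s in samples))
```

A streaming video provider would set both thresholds far lower, since any interruption counts as an outage for its customers.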
As a consumer of cloud services, be sure you have access to cloud providers' performance and outage statistics. You'll also want a way to ensure that this information was accurately gathered. One way to do this is to use an objective third party to measure response times and outages. A third-party monitoring service uses agreed-upon tools and processes to collect data from various cloud service providers and report on its findings.
Lastly, set concrete penalties if your cloud provider doesn't follow through on its SLA. These penalties should reflect the impact to your business, but likely will remain within the dollar amounts you spend monthly with your cloud service provider. Just like an insurance policy, the more you protect yourself, the more you will have to pay in premiums. So set your SLAs in line with your business requirements.
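The relationship between business impact and the penalty ceiling can be sketched in a few lines. The dollar figures and the function name are hypothetical, and capping the credit at one month's spend is an assumption that reflects the typical limit described above, not a rule every provider follows.

```python
def service_credit(impact_per_hour, outage_hours, monthly_spend):
    """Penalty owed for an outage, capped at what you pay the provider per month."""
    uncapped = impact_per_hour * outage_hours
    return min(uncapped, monthly_spend)


# Hypothetical figures: $2,000 of lost business per hour, $10,000 monthly spend
print(service_credit(2000, 3, 10000))  # short outage: credit tracks the impact
print(service_credit(2000, 8, 10000))  # long outage: credit hits the cap
```

Once the outage is long enough for the cap to apply, the gap between the actual impact and the credit is the risk your business still carries.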
About the author:
With a background in managing one of the largest global financial networks, Mark Szynaka brings his network monitoring, security and ITIL best practices to the cloud. He helps enterprise IT investigate and implement cloud computing architectures -- public, private and hybrid -- using Amazon Web Services, Terremark/Verizon and Rackspace technologies.
This was first published in July 2013