Love them or hate them, cloud service-level agreements (SLAs) have been the glue in the cloud provider-user relationship for years.
By specifying guidelines around service availability, performance and costs, a cloud service-level agreement helps users know what to expect from their cloud provider. In addition to providing details about the expected levels of a cloud service -- such as 99.9% uptime over the course of a month or year -- SLAs also define the compensation for users if expectations are not met.
But, like so many things in IT, the cloud computing market is evolving quickly. More and more organizations are moving their workloads to public cloud. Many other businesses, meanwhile, are racing toward hybrid cloud. And, as cloud adoption continues to grow, it's time for some cloud providers to get an SLA makeover.
SearchCloudComputing reached out to four cloud computing experts to get their take on how cloud provider SLAs should evolve, and what cloud users really want out of these lengthy legal documents. Here's what they had to say.
Cloud provider SLAs don't line up with actual public cloud usage. However, enterprises accept them, much like the way they accepted one-sided enterprise software licenses 30 years ago.
As massive amounts of workloads migrate to the cloud, SLAs should begin a rapid evolution. Security, for instance, should be a priority. Businesses need to put pressure on public cloud providers to offer a certain level of security, as well as mechanisms to remain in compliance. And those compliance features should cover existing laws, as well as laws that evolve over the years.
These days, some aspects of cloud provider SLAs are no longer relevant, and some are downright confusing. For instance, some SLAs have requirements that restrict cloud consumers from speaking to the press about issues such as outages and breaches. Other SLA terms limit cloud providers' liability to the amount that cloud consumers paid the provider in fees. These one-sided SLAs don't help cloud providers, since cloud users will choose not to use their services if the SLAs are not fair or reasonable.
Cloud users want SLAs to include assurances that the cloud service will remain operational, or the provider will have to pay penalties. This offsets the risk for cloud consumers, and makes cloud computing more attractive. Other protections should include security and compliance, and place the burden of security management on the cloud provider.
Of course, the next generation of SLAs needs to provide two-way value. Cloud consumers need to expect some parts of the SLA to be slanted toward the provider's needs. This would include areas such as billing and reporting, or the ability to levy fines against users who are not good public cloud neighbors.
The cloud provides new operational dimensions in two ways. First, compute power -- measured in throughput, while including performance and storage capacity -- is highly elastic. Second, the cloud changes the availability, disaster-tolerance and security paradigms. Cloud provider SLAs need to adjust to better reflect these two dynamics.
Let's look at compute power. The key to a dynamic cloud is its response time to change requests. After all, a cloud that takes an hour to set up an instance replica isn't very good. Rather than just focus on how long it takes to open up an empty instance or container, SLAs have to probe a bit deeper. It's much more important to understand the time it takes to have the instance imaged and operational. Teardown is a similar issue; you don't want to hold, and pay for, instances that no longer contribute to job performance.
Storage performance is a real issue in the cloud. The need for local instance storage to speed up computing is increasing. But the harsh reality is that most cloud instances are I/O-bound. SLAs that address the I/O rate of an instance on local instance storage and networked storage are critical to a good cloud experience.
Data availability in the cloud can be superb if data is replicated between geographically dispersed zones. Cutting corners on this, and being confined to a single zone, can cause data loss -- something Google recently suffered after a long outage. Google's older gear uses RAID storage controllers with battery backup, and recovery took longer than the battery life. As a result, data got lost. The cloud provider should be willing to guarantee data integrity for its various service models.
Cloud provider SLAs need to develop service tiers. For example, Amazon Web Services has the ability to offer users a variety of resources, ranging from an unreliable resource, such as the spot instance, all the way to a dedicated resource. Service tiers should rely on past performance, measured both by uptime in the number of "9s" and by security record.
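To make the "9s" shorthand concrete, here is a minimal sketch that converts an availability percentage into a count of nines. The function name `nines` is hypothetical and chosen only for illustration; the formula is the usual negative base-10 logarithm of the unavailable fraction.

```python
import math


def nines(availability_pct: float) -> float:
    """Return the number of 'nines' in an availability percentage.

    99% is two nines, 99.9% is three nines, and so on. Values in
    between give fractional results (99.5% is about 2.3 nines).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailability = 1 - availability_pct / 100
    if unavailability <= 0:
        return math.inf  # 100% availability has no finite count of nines
    return -math.log10(unavailability)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% availability is about {nines(pct):.2f} nines")
```

A tiering scheme could compare this figure, computed from measured past uptime, against the tier a provider advertises.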
I don't think cloud provider SLA terms have become irrelevant -- they just need to get a new, more modern definition. Some users are also looking for greater detail in their providers' SLAs. For example, many cloud customers would like their providers to offer a dashboard that shows a clear picture of past cloud performance, including downtime and security breaches.
As more workloads move to the cloud, it's crucial for users to assess their cloud providers' SLAs and understand the requirements for making a claim under an SLA.
SLAs typically specify a percentage of resource availability during a given period of time, such as server uptime of at least 99.9% over the course of a month. But users should read the fine print to understand if routine maintenance is included in downtime or if there are other exclusions. It's also important to determine how an outage is defined. For example, if a cloud user can ping a server, but the server cannot run a mission-critical application, is the server still considered available?
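As a rough illustration of what such a percentage means in practice, the sketch below converts an availability target into a downtime budget. The function name `allowed_downtime` and the 30-day month are assumptions made for this example; a real SLA defines its own measurement period and exclusions.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits in a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumption: a 30-day billing month
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime(target, month).total_seconds() / 60
        print(f"{target}% over 30 days allows {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% monthly target leaves room for about 43 minutes of downtime, which is why it matters whether scheduled maintenance counts against that budget.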
Storage SLAs can be slightly different. A storage service, for example, may be considered available if users can upload a file, but, for users who cannot download a particular file, the service is unavailable for all practical purposes.
Understand what is required to document a cloud outage. This could include having log data that indicates an inability to connect to a server. Set up logging prior to the loss of a service to avoid scrambling to do so during an outage.
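One way to turn such log data into evidence for a claim is to collapse periodic connectivity checks into outage windows. The sketch below assumes a hypothetical log of `(timestamp, reachable)` records collected by your own monitoring; the record format and the function names are illustrative, not something any particular provider requires.

```python
from datetime import datetime, timedelta


def outage_windows(probes):
    """Collapse (timestamp, reachable) probe records into outage windows.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage with no recovery yet is returned with
    an end of None.
    """
    windows = []
    start = None
    for ts, reachable in sorted(probes):
        if not reachable and start is None:
            start = ts  # first failure opens a window
        elif reachable and start is not None:
            windows.append((start, ts))  # first success closes it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def total_downtime(windows):
    """Sum the duration of all closed outage windows."""
    return sum((end - start for start, end in windows if end is not None),
               timedelta())


if __name__ == "__main__":
    # Hypothetical probe log: one connectivity check per minute.
    t0 = datetime(2024, 1, 1, 12, 0)
    log = [(t0 + timedelta(minutes=i), ok)
           for i, ok in enumerate([True, False, False, True, True, False, True])]
    windows = outage_windows(log)
    for start, end in windows:
        print(f"outage from {start} to {end}")
    print("total downtime:", total_downtime(windows))
```

Because the windows are derived from probes, the measured downtime is only as precise as the probe interval, so it is worth checking how the provider's SLA defines the start and end of an outage.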
Ideally, SLAs would reflect the customers' needs, such as the ability to run mission-critical applications, rather than the lower-level services that infrastructure-as-a-service (IaaS) providers can easily measure. Software-as-a-service (SaaS) providers are more likely to have function-oriented SLAs. Understand the services and events that your SaaS provider measures, and how those measures relate to your business operations.