When enterprises buy a network service, they recognize that they're vulnerable -- sometimes extremely vulnerable -- to problems with availability and performance. The solution is to negotiate a service-level agreement (SLA). However, what end users want in a cloud SLA differs from what enterprises need.
At the application level, end users expect cloud computing services to meet an availability standard and a performance or quality of experience (QoE) standard, usually measured in terms of response time. What those users say they get is very different. In a survey of cloud users, only about 10% indicated that they had secured a specific SLA for their cloud computing services. There were a host of reasons for this needs-to-guarantees gap, and most of them are difficult to address.
Cloud SLAs falter at the network
The No. 1 problem reported with cloud SLAs is that they exclude the network's performance. Most cloud services are accessed through a network connection from a company other than the cloud service provider, and cloud service providers clearly cannot guarantee performance of that network connection.
Further, most cloud services today are accessed via the Internet, which is a best-effort service that can't offer any guarantee. It's difficult to justify negotiating a cloud SLA when you can't guarantee the connection; it's also hard to prove that a cloud provider failed to meet an SLA when there's a component of the service -- the network -- between your QoE measurement point and the cloud. This particular issue also affects management connections to the cloud and the ability to write an SLA on management-level QoE.
Anyone who has ever written or monitored an SLA is familiar with the second most-reported issue: the agreement does not or cannot identify a reasonable mechanism for measuring QoE to determine compliance. This starts with the seemingly simple question, “What exactly does the cloud operator guarantee?” in measurable terms. Response time, for example, can be measured in many ways and at many points. Unless both parties agree on a measurement point and on the value to be met at that point, no realistic SLA enforcement is possible.
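To make "an agreed measurement point and value" concrete, here is a minimal sketch of a compliance check. It assumes the two parties have agreed to judge response time by a percentile of samples taken at a single measurement point; the function names, the 95th-percentile choice, the 200 ms threshold and the sample values are all hypothetical and do not come from any particular provider's SLA.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(response_times_ms, threshold_ms, pct=95):
    """True if the agreed percentile of response time is within the agreed value."""
    return percentile(response_times_ms, pct) <= threshold_ms


# Hypothetical response times (ms) collected at the agreed measurement point.
samples = [120, 135, 110, 480, 125, 130, 140, 115, 128, 132]

print(percentile(samples, 95))               # 480
print(meets_sla(samples, threshold_ms=200))  # False
```

The single slow sample decides the outcome at the 95th percentile but not at the 90th, which is why the percentile, the threshold and the measurement point all have to be written into the agreement.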
The third problem with cloud SLAs is that user-supplied software components and user-created application connections can affect application QoE. If we consider Software as a Service (SaaS) as the "highest-level" cloud service and Infrastructure as a Service (IaaS) as the "lowest," then low-level services contain more user-contributed components and have a higher risk that something contributed by the customer will create external connections with unpredictable effects on performance. The provider can't guarantee, or even predict, what effect this will have on overall application QoE.
The fourth-most-reported SLA problem is that cloud parameter settings can have a major influence on application QoE, and this means that SLAs would have to be written under very specific parametric assumptions. Unusual conditions, even if they're not completely abnormal, may cause QoE and SLA problems.
Coming to terms with your cloud SLA
The real problem here is that while there may be many reasons why a meaningful SLA is hard to get, those reasons don't lessen the need for one. What should you do to get the best SLA possible?
The answer, in short, is to avoid the problems above; the basic principle is to understand what the cloud provider can actually guarantee. Clouds work by allocating virtual resources to real applications. Presuming that the assignment works as you've expected it to, the resources will likely perform as expected. Variables you must constrain include the assignment parameters for the resources and your access to them through the network.
The most comprehensive SLA you will ever get is one from a cloud services provider who is also a network provider, so the most favorable relationships for SLA purposes will be with the cloud arm of network operators.
The second-best choice is to obtain a cloud connection from an operator that your cloud provider is prepared to suggest and guarantee. In either case, you'll want to use a "provisioned" VPN rather than Internet or Internet VPN access to get specific communications guarantees.
In the resource-assignment space, the simple rule is to ask, "What is the virtual resource here?"
In SaaS, the virtual resource encompasses everything, because the user supplies no components. Thus, the cloud provider has complete control and should be expected to write an SLA covering all the non-network components of the application.
In lower-layer services like Platform as a Service (PaaS) or IaaS, the provider can guarantee what they provide; your goal is to determine how to measure the performance of the vendor’s contribution. In IaaS, the speed with which your application is assigned to a server will be the most variable element, and the speed at which a new server is substituted in case of a failure will determine availability.
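The link between failover speed and availability can be shown with simple arithmetic. The sketch below assumes that the only downtime is the time spent waiting for a replacement server and that a month has 30 days; the function name and the example figures are hypothetical.

```python
def availability(failures_per_month, failover_minutes,
                 minutes_per_month=30 * 24 * 60):
    """Fraction of the month the service is up, if failover time is the only downtime."""
    downtime = failures_per_month * failover_minutes
    if downtime > minutes_per_month:
        raise ValueError("downtime exceeds the measurement period")
    return 1 - downtime / minutes_per_month


# Hypothetical: two server failures a month, 15 minutes to substitute each server.
print(f"{availability(2, 15):.4%}")  # 99.9306%
# Same failure rate with a 2-minute substitution time.
print(f"{availability(2, 2):.4%}")   # 99.9907%
```

With the failure rate held constant, the substitution time alone sets the availability figure, so that is the number to pin down in an IaaS agreement.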
PaaS is the most problematic in SLA terms, because you are not getting a specific hardware commitment but rather a platform that might include a number of physical hosts and software elements. Determining how much response time variability there is may require that you establish a "ping point" in the cloud where you can measure network delay. Subtracting that delay from the end-to-end application delay gives you the cloud application's processing contribution. Whatever you decide, the provider must accept the terms, and they must be explicitly noted in the contract.
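The ping-point subtraction can be sketched in a few lines. This is an illustration under stated assumptions rather than a measurement tool: it assumes each end-to-end sample is paired with a network round-trip time measured to the ping point at about the same moment, and the function name and sample values are hypothetical.

```python
from statistics import median


def processing_delay_ms(end_to_end_ms, ping_rtt_ms):
    """Estimate the cloud-side processing delay for each paired sample."""
    if len(end_to_end_ms) != len(ping_rtt_ms):
        raise ValueError("samples must be paired one-to-one")
    # Subtract network delay from end-to-end delay; clamp at zero in case
    # measurement noise makes the ping slower than the full transaction.
    return [max(e - p, 0.0) for e, p in zip(end_to_end_ms, ping_rtt_ms)]


# Hypothetical paired measurements, in milliseconds.
end_to_end = [210.0, 195.0, 260.0, 205.0]
ping_rtt = [40.0, 38.0, 95.0, 41.0]

estimates = processing_delay_ms(end_to_end, ping_rtt)
print(estimates)          # [170.0, 157.0, 165.0, 164.0]
print(median(estimates))  # 164.5
```

In this made-up data the third sample looks slow end to end, but most of the extra delay is in the network, so the cloud's own contribution stays in line with the other samples.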
Cloud SLAs probably won't satisfy the buyer, but that's been true of many SLAs in the modern age. With care, you can at least get a cloud SLA that controls your level of risk and assures that your cloud services will meet enterprise goals.
ABOUT THE AUTHOR
Tom Nolle is president of CIMI Corporation, a strategic consulting firm specializing in telecommunications and data communications since 1982.