Service-level agreements (SLAs) are common in network services, where they measure the parameter set popularly called "QoS." For cloud computing or platform services, however, it's difficult to find helpful precedents for negotiating an SLA.
At the high level, the issues are the same; you must define criteria to be met and remedies if they are not. The devil is in the details, and to get there it's essential that you begin with the parameters of the application experience at the user level. The business case for cloud computing will be based on some expected range of availability and performance, and that's what the SLA must address.
The first point to address in a cloud SLA is that not everything associated with an application experience is part of cloud computing. Cloud performance as measured at the point of application use is the sum of network performance, application performance, and cloud infrastructure performance. The cloud provider can be held accountable for the last of these but not the first two, so when writing an SLA it's important to understand what each of the other two factors contributes to overall performance.
Accessing cloud services over the Internet or other best-effort service will make it very difficult to create a meaningful cloud SLA because the network contributes a completely variable delay, loss, and failure rate. If you want to guarantee transaction/application performance as experienced by the user, you'll need to somehow limit this variable. That may be possible if you can negotiate an SLA with a specific ISP with whom your cloud provider has a direct connection. If you expect to access cloud applications randomly from multiple locations and ISPs, a tight and meaningful application performance metric will be very hard to obtain.
Getting the application's performance variables out of the equation will normally mean running the application using local server and network resources to measure performance under ideal circumstances. The measurement should also include noting how variations in memory, storage, etc. impact performance because those same factors may vary in cloud computing services. It's important to duplicate as much of the cloud's IT resources as possible to get a good measurement.
When both application and network performance factors have been handled, the resulting information can be used to set cloud computing performance limits. For example, if an application running locally generates a 1-second transaction response time and the network connection adds a half-second of delay in total across both directions (not unreasonable for the Internet or VPNs), a total delay of 1.5 seconds has accumulated. If your operating departments want a 2-second response time guarantee, you can afford to add only another half-second in cloud computing delay.
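The budget arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function name `cloud_delay_budget` and its parameters are hypothetical, and the network figure is taken as the half-second total that produces the 1.5-second sum in the example.

```python
def cloud_delay_budget(target_s: float, app_s: float, network_s: float) -> float:
    """Return the delay (in seconds) that remains for the cloud.

    target_s  -- response time the users expect (hypothetical parameter name)
    app_s     -- response time measured with the application running locally
    network_s -- total delay the network adds to a transaction
    """
    budget = target_s - (app_s + network_s)
    if budget < 0:
        # Application and network alone already exceed the target,
        # so no cloud SLA could rescue the user experience.
        raise ValueError("target is unreachable before any cloud delay is added")
    return budget


# Figures from the example: 2-second target, 1-second local response,
# half a second of network delay in total.
print(f"Cloud delay budget: {cloud_delay_budget(2.0, 1.0, 0.5):.1f} s")
```

If the function raises an error, the target has to be renegotiated or the application and network delays reduced before a cloud SLA is worth discussing.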
The next critical step is to convert application performance to a set of parameters that can be measured on your cloud provider's infrastructure. This can only be meaningful if you have a specific configuration for your cloud service, so work with your cloud provider (or provider candidates) to devise the best cloud configuration to meet your needs.
The configuration would include whether you use reserved or ad hoc cloud resources, the number of images of your application that run at a time, the geography in which they run, the database used, the system type and memory, etc. This configuration should be tested to ensure that it meets the basic performance requirements established by the application's users. The configuration exercise will also help define the features of the cloud that have a direct bearing on performance and reliability, such as failover from bad application instances or load balancing among instances.
From this configuration, you must now establish a set of resource usage, availability, and performance metrics based on the management tools/capabilities of the cloud provider. The presumption is that if these metrics are all met, the configuration is performing as you designed it to, and thus the application user objectives are being met.
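One way to picture such a metric set is as a list of named limits checked against whatever the provider's management tools report. The sketch below is illustrative only: the metric names and thresholds are hypothetical (only the half-second cloud delay comes from the earlier example), and a real SLA would use the metrics your provider actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    limit: float
    higher_is_better: bool  # True for availability, False for delay or load

    def is_met(self, observed: float) -> bool:
        """Compare an observed value with the agreed limit."""
        if self.higher_is_better:
            return observed >= self.limit
        return observed <= self.limit


def violations(metrics, observed):
    """Return the names of metrics that are unreported or outside their limit."""
    return [
        m.name
        for m in metrics
        if m.name not in observed or not m.is_met(observed[m.name])
    ]


# Hypothetical metric set for one agreed cloud configuration.
SLA_METRICS = [
    Metric("availability_pct", 99.9, higher_is_better=True),
    Metric("cloud_delay_s", 0.5, higher_is_better=False),
    Metric("cpu_utilization_pct", 80.0, higher_is_better=False),
]

report = {"availability_pct": 99.95, "cloud_delay_s": 0.7, "cpu_utilization_pct": 70.0}
print(violations(SLA_METRICS, report))
```

An empty list corresponds to the presumption in the text: every metric is met, so the configuration is taken to be performing as designed.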
Remember that while transaction or application performance is the goal of your cloud SLA, it will likely be your own responsibility to create a performance standard for user experience and then apply that standard to creating cloud computing and network performance objectives.
To apply your cloud SLA effectively, you'll need problem isolation tools that separate issues with the network or your application from those of the cloud. These should be integrated with the cloud management tools available from your provider to build a monitoring portfolio you can use to proactively monitor performance and respond to user complaints.
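At its simplest, problem isolation means comparing each component's measured delay with its share of the response-time budget. The following Python sketch assumes you can already measure the three components separately; the component names, the `isolate` function, and the sample measurements are hypothetical, and the budgets reuse the figures from the earlier example.

```python
def isolate(measured, budgets):
    """Return the components that exceed their delay budget, worst overrun first."""
    overruns = {
        component: measured[component] - limit
        for component, limit in budgets.items()
        if measured.get(component, 0.0) > limit
    }
    return sorted(overruns, key=overruns.get, reverse=True)


# Per-component budgets in seconds, taken from the 2-second example.
BUDGETS = {"application": 1.0, "network": 0.5, "cloud": 0.5}

# Hypothetical measurements gathered during a user complaint.
sample = {"application": 1.0, "network": 1.2, "cloud": 0.45}
print(isolate(sample, BUDGETS))
```

In this sample only the network exceeds its share, so the slow response is not something the cloud SLA covers.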
Tom Nolle is president of CIMI Corporation, a strategic consulting firm specializing in telecommunications and data communications since 1982. He is a member of the IEEE, ACM, Telemanagement Forum, and the IPsphere Forum, and is the publisher of Netwatcher, a journal in advanced telecommunications strategy issues.