The AWS outage on Christmas Eve had one former customer saying 'I told you so,' but a comparison of cloud availability to enterprise data center availability paints a more nuanced picture.
The outage, which lasted 23 hours, was Amazon Web Services' (AWS) fourth and longest for 2012. It came at an extremely inopportune time for high-profile customer Netflix, which suffered an outage to its streaming video service as a result of the glitch with Amazon's Elastic Load Balancing (ELB) service. Netflix is otherwise well known for the resiliency of its cloud computing application designs.
Another Web company, WhatsYourPrice.com, left AWS following two outages in June, and CEO Brandon Wade said the Christmas Eve failure was unsurprising.
"Two outages for similar reasons, and a third, during a critical time of the year, is way too many outages for any decent service," Wade said. Amazon also suffered an outage Oct. 22 in which it was forced to issue service credits to some customers of its Relational Database Service.
Other IT experts, meanwhile, assert that the average enterprise sees more than four major outages per year. One cloud computing consultant on the East Coast estimated enterprises deal with twice that many outages, or more.
"Amazon's such a large target, you just hear about outages more often than you do with large enterprises," he said.
Even so, some experts say Amazon and other service providers should be held to a higher standard for cloud availability, considering the number and size of the customers they host. Their uptime standards should match those of telecoms, said Rick Villars, vice president of data center and cloud research with Framingham, Mass.-based IDC.
"How often do you pick up the phone and not get a dial tone?" he said.
One IT pro also pointed out that service-level agreements (SLAs) inside the enterprise are usually expressed in terms of application availability, not infrastructure availability, which is what the AWS SLA covers. Thus, Infrastructure as a Service availability concerns only compound already tricky requirements for application uptime.
"Without any compensation, this is going to lower our availability percentage," said Sean Perry, CIO for Robert Half International Inc., based in San Ramon, Calif. "We need to account for this going in, in our application designs."
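To illustrate how infrastructure downtime can compound an application's own downtime, here is a minimal sketch. It assumes the application depends entirely on the infrastructure layer and that the two layers fail independently, so their availabilities multiply. The 99.95% figure is the AWS SLA number cited in this article; the 99.9% application-layer figure and the function name are hypothetical, chosen only for illustration.

```python
def serial_availability(*layers):
    """Combined availability of layers that must all be up at once.

    Assumes independent failures, which is a simplification.
    """
    combined = 1.0
    for availability in layers:
        combined *= availability
    return combined


infrastructure = 0.9995  # AWS SLA figure cited in the article
application = 0.999      # hypothetical application-layer availability

combined = serial_availability(infrastructure, application)
print(f"Combined availability: {combined:.4%}")
```

Under these assumptions the combined figure is always lower than either layer on its own, which is the effect Perry describes having to design around.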
Amazon partners say enterprise designs should always have availability in mind, including multi-region availability. One partner suggested that some AWS regions are less reliable than others.
"I do think it would be wise of AWS to encourage this more by communicating better this new-services risk in AWS East, and to change the fact that everything defaults to AWS East when you get started," said Kent Langley, vice president for Amazon Advanced Technology Consulting partner SolutionSet LLC, based in San Francisco.
Data center uptime vs. cloud availability
While Amazon's cloud availability appears higher than that of enterprise data centers, a true comparison is tough to make because there don't appear to be any statistically significant studies of private enterprise data center availability, according to IDC's Villars.
IDC reports that 84% of 500 IT pros in the U.S., Europe and South America have experienced a problem at the data center level that led to downtime or rollback for an application. That small survey was commissioned by CA, an IT management software company.
But it's the closest IDC has gotten to a specific uptime number for enterprise data center availability, Villars said.
Anecdotally, users and analysts put enterprise data center availability between 95% and 98.5%, lower than Amazon's publicly posted SLA of 99.95% availability annually.
But that 99.95% refers to regional uptime, as opposed to uptime by individual availability zone. That makes it close to impossible to calculate precise uptime for individual data centers within Amazon's cloud, or to make an apples-to-apples comparison with enterprise data centers.
However, if the hours of downtime reported in all four major AWS 'service events' in 2012 are added up -- approximately 2 hours June 14; approximately 4 hours June 29; approximately 6 hours October 22; and some 23 hours December 24 -- it comes to a total of 35 hours of downtime for the year. Out of the 8,784 hours in 2012, a leap year, that translates into roughly 99.6% availability across all services for the year.
That's not up to the regional availability SLA, but it is greater than the estimated enterprise data center availability.
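The arithmetic behind that comparison can be sketched as follows. This is a rough calculation, not an official uptime measurement: it uses the approximate outage durations reported above and, like the article's own tally, treats each event as downtime for all services. The function names are illustrative.

```python
import calendar


def hours_in_year(year):
    """Total hours in a calendar year, accounting for leap years."""
    return (366 if calendar.isleap(year) else 365) * 24


def availability_pct(downtime_hours, year):
    """Availability as a percentage for a given total downtime."""
    return 100.0 * (1 - downtime_hours / hours_in_year(year))


def allowed_downtime_hours(target_pct, year):
    """Hours of downtime a year permits at a given availability target."""
    return (1 - target_pct / 100.0) * hours_in_year(year)


# Approximate durations, in hours, of the four 2012 events cited above.
outages_2012 = {"June 14": 2, "June 29": 4, "Oct. 22": 6, "Dec. 24": 23}

total_downtime = sum(outages_2012.values())
aws_2012 = availability_pct(total_downtime, 2012)
print(f"Total downtime: {total_downtime} hours")
print(f"Estimated 2012 availability: {aws_2012:.2f}%")

# Downtime budgets implied by the SLA and the anecdotal enterprise range.
for target in (99.95, 98.5, 95.0):
    budget = allowed_downtime_hours(target, 2012)
    print(f"{target}% availability allows about {budget:.1f} hours down")
```

By this estimate, 35 hours is well over the downtime a 99.95% annual target allows, yet far below what the anecdotal 95% to 98.5% enterprise range implies.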
AWS, Azure and Google build more data centers