Mistakes will be made, and in the cloud, that's increasingly becoming OK.
Another round of cloud outages swept through some of the biggest providers last month, but these unexpected events may not have hurt public perception of the cloud. Customers have come to see downtime as unavoidable, and vendors have become more transparent about system failures, according to analysts.
"Enterprises have outages all the time," said Lydia Leong, an analyst with Stamford, Connecticut-based Gartner Inc. "No one should expect a cloud provider to be perfect -- certainly the cloud providers don't expect themselves to be perfect."
Redundancy systems, disaster recovery and failover plans are nothing new, and IT pros are starting to wake up to the fact that it's no different in the cloud, according to David Linthicum, senior vice president with Cloud Technology Partners, a Boston-based consulting firm.
"It's the same [thing] we've been doing with internal systems for years," Linthicum said. "They're just figuring that out."
Last month, Amazon Web Services (AWS) experienced regional connection issues and increased application programming interface error rates for its virtual private cloud for about an hour. Rackspace, meanwhile, had intermittent availability issues throughout May for users attempting to create large cloud block storage volumes at facilities in parts of the country.
Major downtime can cost companies millions and agitate customers, but vendors have gotten savvy about the need to communicate with clients about outages in real time.
Joyent cloud outage
San Francisco-based cloud provider Joyent Inc. may have had the most embarrassing cloud outage in May, when an administrator simultaneously rebooted all the virtual servers in one of its East Coast regions. Recoveries took anywhere from 20 minutes to more than two hours.
The failure was attributed to human error, and the company provided a post-mortem that outlined the combination of tangential issues that allowed an employee to omit two characters and take down an entire data center.
Failures happen, but it's up to the provider to own up to mistakes and correct them, said Bryan Cantrill, Joyent CTO, in an interview.
"People are remarkably understanding for human error, because we all make mistakes," Cantrill said. "But what they're not understanding for -- and shouldn't be -- is obfuscation, obscurity and silence."
To executives' surprise, much of the response was positive. There were customer complaints, but far fewer than expected, and none asked to terminate their relationship, Cantrill said.
"In a very perverse way this serves to strengthen the bond we have with our customers," Cantrill said.
Cloud providers are likely better suited to deal with outages, Linthicum said, because handling them is central to their core business model. He gave the example of an after-hours outage: for an in-house system, an IT pro would have to drive back to the office to resolve it, while a cloud provider would have continuous staffing to address the problem.
Cloud outages also have been less of an issue than many experts predicted, Linthicum said, adding that he hasn't heard of any cloud provider having a major data loss for its customers.
But the perception remains that cloud outages are endemic to the industry.
"It's very easy to look at a cloud provider outage and say, 'This is horrible,'" Leong said. "Organizations often treat cloud provider outages as emblematic of the entire industry as opposed to when they have an outage and they call it a one-off event they couldn't have done anything about."
Some of the expectations may be unfair, but the vendors have no one to blame but themselves, according to Matthew Healey, an analyst with Hampton, New Hampshire-based Technology Business Research Inc.
"They've come out and said, 'We're so reliable,' and now they've created a standard that they're not living up to," Healey said.
Public cloud traction has increased among enterprises, despite continued concerns about security and investments in existing infrastructure. And while it may not be the top reason, outages continue to be a stumbling block for some IT pros.
Healey used the analogy of flying versus driving. While air travel is statistically safer, people feel better being behind the wheel of their own cars.
"You can't do anything on a 747," Healey said. "You don't have a lot of control, and that scares people."
Prepare for the inevitable
And while downtime is inevitable, IT pros shouldn't go into the cloud blindly. Reliable backup systems, failover and contingency plans are essential, as are cost analyses of which applications must run constantly.
Traditional IT outsourcing contracts involve teams of lawyers and thousands of pages of legal language, while cloud contracts aren't nearly as robust, Healey said.
"It's an evolving situation, so I think they're getting better, but by no means do I think they've figured out all of the wrinkles," Healey said. "There's going to be more pain coming up because we still haven't figured all this out."
Cloud provider service-level agreements typically promise monthly compute availability of 99.95% or higher, and service credits for outages are typically equivalent to the duration of the downtime.
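To put a 99.95% figure in perspective, the downtime it permits can be worked out with simple arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, it assumes a 30-day month, and real agreements define the billing month and what counts as downtime in their own terms.

```python
def allowed_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime per month that still meet an availability target.

    Assumes a simple calendar month of `days_in_month` days; actual
    service-level agreements may measure the period differently.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


# Compare a few common availability targets (chosen here for illustration).
for target in (99.9, 99.95, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per month")
```

Under these assumptions, 99.95% works out to roughly 21.6 minutes a month, so a single outage lasting a couple of hours would by itself exceed that allowance.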
Cantrill didn't disclose how much the outage cost the company. He said it was significant, but he doesn't expect to see any long-term financial ramifications or mass exodus.
"If our customers feel misled or lied to, that will absolutely lead to opting out of the public cloud," Cantrill said. "If it's handled transparently and quickly, to be honest, it only serves to accelerate adoption of cloud computing because people know they don't always get that out of their own IT organization."
Trevor Jones is the news writer for SearchCloudComputing. You can reach him at firstname.lastname@example.org.