In what looks like a knock against claims that VMware and vCloud Express services are enterprise-ready, VMware partner and global hosting provider Terremark suffered an inexplicable outage on March 17. It lasted for almost seven hours and left affected users wondering what was going on.
Software vendor Apparent Networks, which runs a global monitoring service for public clouds, was caught in the outage and issued an advisory that stated, "Terremark experienced connectivity loss, which caused an outage in Terremark's vCloud Express services in their Miami data center."
"This impacted eight customers," said Terremark spokesman Xavier Gonzalez. One of those affected was probably Apparent Networks, which runs virtual machines (VMs) in different clouds and measures communication between them for its benchmarks, he said.
The trouble was restricted to a single network device that either failed, became unreliable or became overloaded, and it affected about 2% of Terremark's vCloud user base, said Gonzalez.
"What happened was, about 12 EST, we experienced a networking degradation for some customers on a network device due to core overload," he added.
Gonzalez said the outage was never a full blackout, just a period of very high latency. Terremark was able to redistribute load around the affected device once the problem had been identified.
Adam Edwards, director of systems engineering for Apparent, said the problem was a double whammy.
"Something as simple as a reboot of your VM host was lagging on more than 15 minutes, and failing," he said.
He even experienced problems using the vCloud Portal to attempt to communicate with his machines at Terremark. "The vCloud management portal is actually hosted on vCloud Express at Terremark, so you couldn't even use the management tools [required by VMware]," Edwards said.
Terremark vCloud user expresses dissatisfaction
"I felt like I was dealing with a mom-and-pop hosting company," said John Kinsella, founder of Protected Technologies and an information security and infrastructure specialist. He said Terremark doesn't have a status page for vCloud Express and made only the most cursory effort to reach out during the outage.
"For a company of that scale to be so lackluster was surprising," he said, adding that the experience left him cold.
He said it was pure luck that the outage wasn't much worse for him -- a development project running at Terremark went by the wayside. "Luckily, the VMware requirement went away and that project just went live Monday on [a client's] current hosting environment, or I would be getting tons of grief this week," he said.
Kinsella said he maintains projects in every kind of hosting environment and most of the major cloud providers, and he was concerned about the message this sent about Terremark and vCloud Express. VMware is the virtualization platform of choice for the great majority of enterprise computing, and vCloud Express has been its pitch to make conservative, suspicious enterprises willing to come out into cloud computing environments.
"When you look at the size of this issue, if it really was that small, that gets to heart of the problem," Kinsella said. He added that this is what enterprises fear most about cloud: They give up control and access to infrastructure in favor of commoditized, pay-as-you-go, one-size-fits-all cloud, and random accidents or something their anonymous neighbor does can take them out without warning or recourse.
Learning from Amazon's mistakes
Kinsella said that experienced hosting providers like Terremark should already be doing a much better job of communicating during events like this, which are inevitable, especially if the goal is to convince enterprises that cloud computing is the way to go. Besides, he said, we've been through this before.
"Haven't they learned anything from watching Amazon?" he said.
Amazon Web Services has a history of complaints about its communication during outages; it now maintains a service status page and takes pains to explain outages to the public. Kinsella said it's partly a matter of adjusting his perception; the VMware label should mean a different class of cloud, but he's learned a lesson.
"Once I recognized the quality of service I was paying for at Terremark, I had my system restored from backup to a Rackspace node within an hour," he said. He thinks the company should send a letter to affected users to explain the incident.
VMware did not respond to requests for comment on this story.
Carl Brooks is the Technology Writer at SearchCloudComputing.com. Contact him at firstname.lastname@example.org.