
IT pros call out HealthCare.gov for lack of cloud computing prowess

In the age of cloud computing, why did HealthCare.gov collapse under its traffic load? IT pros say cloud could have prevented that.

IT industry experts said the launch of HealthCare.gov this month was hobbled by a byzantine, "old-school" infrastructure.

Could cloud computing have solved the site's performance problems?

In a world where more than a billion users log on to Facebook without a second thought, the idea of a website collapsing under a load of visitors seems foreign.

But HealthCare.gov, the portal through which Americans shop for health insurance plans under the Affordable Care Act, experienced freezes, crashes and other glitches when it officially opened Oct. 1. Problems persisted throughout the ensuing week as government IT scrambled to stabilize it amid the ongoing government shutdown. What went wrong?

The user account creation portion of the website was to blame for the problems; it collapsed under heavy traffic from millions of people trying to sign up for health insurance plans, Todd Park, United States chief technology officer (CTO), told the New York Times the week after the failure.

Servers associated with the faulty application were to be moved from virtual machines (VMs) to dedicated hardware to fix the performance issues, according to public statements from spokespeople for the Department of Health and Human Services, which oversaw the website rollout.

But it didn't have to be this way, technical experts said.

"These are very manageable, typical ecommerce scaling problems," said John Engates, Rackspace Hosting CTO. Rackspace had nothing to do with the mess, but it has supported other, larger Web properties in the past, Engates said.

An application designed to scale out in response to high demand is hardly a rare species now that cloud computing has become mainstream, Engates said.

HealthCare.gov's outdated infrastructure to blame

Cloud computing services support some areas of the highly complex government healthcare site, which connects to numerous federal agencies, such as the Internal Revenue Service, to determine an applicant's eligibility for federal healthcare subsidies.

But a trace of IP addresses back to their owners using ping, Tracert, WhoIs, "view source" on the website, and other forensic analysis tools, suggested cloud computing and scale-out application code were probably not used in most of the site's infrastructure.
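The forensic mapping described above can be approximated offline once WhoIs lookups have yielded each provider's address blocks: any observed IP is then matched against those blocks. A small sketch using only the standard library follows; the CIDR ranges are made-up placeholder blocks (RFC 5737 test addresses), not the providers' real allocations:

```python
# Match observed IP addresses against known provider netblocks,
# approximating the WhoIs/traceroute analysis described above.
# The CIDR blocks here are illustrative placeholders only.
from ipaddress import ip_address, ip_network

PROVIDER_BLOCKS = {
    "Savvis (example block)": ip_network("192.0.2.0/24"),
    "Terremark (example block)": ip_network("198.51.100.0/24"),
}

def owner_of(ip: str) -> str:
    """Return the provider whose netblock contains this IP, if any."""
    addr = ip_address(ip)
    for provider, block in PROVIDER_BLOCKS.items():
        if addr in block:
            return provider
    return "unknown"
```

In practice an analyst would populate the table from WhoIs records rather than hard-coding it, but the membership test is the same.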

For example, one portion of the site is running in a data center belonging to CenturyLink's Savvis. The IP address is associated with a company that uses Savvis for colocation services, a Savvis spokesperson said.

[The architecture is] for online applications built in 2003, not 2013.

Carl Brooks, 451 Research analyst

Other reports place one federal agency that connects with the website, the Centers for Medicare and Medicaid Services, on Verizon Terremark's cloud, but company spokespeople did not respond to attempts to clarify whether that service is cloud or managed hosting.

Industry experts said these portions of the site are more likely to be based on managed hosting rather than cloud computing services.

"In general, I haven't seen any of these exchanges use cloud services," said Shlomo Swidler, CEO of consulting firm Orchestratus Inc. "Several states, including California, host their exchanges in government data centers. Others use Akamai or other content delivery networks, so it's not clear where the actual hosting lives."

The way the website is built, using managed hosting and a static application infrastructure, "is the anti-cloud solution," said Carl Brooks, an analyst at New York-based 451 Research. "Totally old-school for a big Web property these days."

Brooks sees the site as an amalgamation of the Akamai content delivery network, colocated server hardware, private Ethernet and "some other hops over backwater networks."

"Akamai is actually the strongest link in the chain, but the back end is just not there," Brooks said. "It's not the [service] providers' fault; that [architecture is] for online applications built in 2003, not 2013."

People who build sites on Amazon Web Services (AWS) or similar environments don't have these problems, Brooks said.

"Compared with the much-vaunted Obama election night systems that were built on [AWS] and worked flawlessly," Brooks asked, "why weren't those folks in charge of a big site launch like this?"

Politics trump technical best practices

Indeed, the technology exists to create scale-out, high-performance Web properties using cloud computing, so why didn't the government take a page out of the Obama campaign's playbook when it came to building HealthCare.gov?

The fact that disparate parts of the site are hosted in different data centers is a clue as to what really went wrong here: There were too many cooks in the kitchen due to the balkanized way federal IT procurement contracts work.

Unlike the Obama campaign, which was run by a single unified organization, a total of 47 different contractors worked on building HealthCare.gov, according to the Sunlight Foundation, a nonprofit, Washington, D.C.-based organization that reports on government.

The result was predictably fragmented dysfunction. For example, using a "community edition" tool called Maltego, David Campbell, CEO of cloud security startup JumpCloud, determined that one part of the site was built using Apache, jQuery and Pingdom, while another is based on NGINX, Apache and Ruby on Rails.

"It's obvious that these two pieces probably came from completely different development teams working for completely different organizations," Campbell said.

The two main contractors working on the site were CGI Group, a Canadian consulting company, and Quality Software Services, a Maryland-based health care IT company, according to the Washington Post. Neither company responded to requests for comment this week.

Beth Pariseau is senior news writer for SearchCloudComputing. Follow @PariseauTT on Twitter.


Join the conversation



This is a great piece Beth, good to see such a range of smart and informed opinions reviewing this from a technology perspective.

The poor scalability of the infrastructure was clearly a big issue - although not the only one, and perhaps not the biggest. Nevertheless, I too am floored that this was not built as a scalable cloud service. What a lost opportunity!

Other articles have pointed to fundamentally flawed (overly chatty) application code. And of course the arbitrary deadlines played into this too.

I would point out though that criticism of the "too many cooks" approach may be misplaced. This approach - building composite apps from component capabilities created by various entities, and assembled via APIs - is not *per se* a bad design. Indeed, many (myself included) would argue that this is an essential part of the future of application delivery. So while it may seem a convenient whipping boy, and may well have been poorly executed, I would not blame this alone for the problems, especially absent a criticism of the components themselves, how they were put together, etc.

Andi Mann
CA Technologies
Those involved in deploying the solution have not done President Obama any justice. They should have had experts lined up for its deployment.
According to this article in The Atlantic, it is a cloud implementation.

While the front end of HealthCare.gov is distributed over Akamai, the back end of the site will be hosted in a secure cloud.

"The servers are hosted in Terramark, a cloud computing firm that's a subsidiary of Verizon," said Booth. "When we got involved, Terramark had already been selected as the vendor. We inherited that; it was our first major cloud deployment. It's wonderful, compared to the traditional 'buy a lot of boxes and get the servers set up.' Percussion was fine as a CMS but the scalability issue was huge for us, really overarching."

Combining these two approaches finally realized some of the more aspirational rhetoric about the potential of "cloud computing" to deliver better savings and services that has bounced around Washington over the past four years.

"For us, it's a combination of Terramark as data center and Akamai as content distribution network," said Cole. "For the relaunch, no consumer traffic hits Terramark at all, with the exception of search queries. We have completely pushed the website out to Akamai, which gives us a lot of flexibility. This is by far the fastest site we've ever built. We wanted to make sure that this site is not adding any overhead, is as lightweight as it can be."
@Post467916 While it is true that the Centers for Medicare and Medicaid Services (CMS) are engaged with Verizon Terremark, that's only one small part of the underpinnings of the site -- the lion's share of the infrastructure does not appear to be cloud-based.

--Beth Pariseau
Not using cloud services is where they failed? Seriously? I HIGHLY doubt that.

They were virtualized and moved from VM to physical to handle the demand. They had a private cloud. Where they failed was improper planning, not measuring their metrics, not doing load tests, etc.

Cloud is not the answer to every problem; you need to correctly PLAN your deployment regardless of the route you go. You really need to understand your application to know what you require from the infrastructure that supports it.
Just having virtualization doesn't mean they had a private cloud. My sources tell me that particular piece, the one running at Terremark Federal, consisted of underprovisioned NetApp arrays attached to VMware hosts. Hardly a full-fledged "private cloud."

--Beth Pariseau
Public cloud or private cloud. Cloud is simply a marketing term and can mean many different things to many different people.

The key word there is "underprovisioned".
Now the news is reporting that the system was crashing even during testing and they rolled it out anyway.
The MAIN performance problem was/is Apache as explained here: