On-demand compute time is old hat for public cloud consumers accustomed to buying Amazon EC2 or Rackspace compute power as needed, but it is atypical in the world of high-performance computing (HPC), which is almost exclusively the province of large institutions and publicly funded supercomputers.
"With what we could purchase out of pocket, we'd have to bootstrap very slowly, or look for VC [venture capital] funding," said Dr. Brock Tice, the vice president of operations at Cardiosolv, a privately funded medical research firm. While Cardiosolv has its own small cluster on the premises for calculations, Tice estimates the resources he rents from Penguin would probably cost $500,000 to build, and other cloud options weren't suitable.
"We can't use [Amazon's Elastic Compute Cloud] EC2, since there's a lot of latency between the nodes," he said. The traditional HPC cluster, or grid, model was designed for massively parallel computations, where multiple server nodes have to communicate efficiently. Cloud infrastructures such as Amazon Web Services are designed for more democratic use and suffer performance penalties for some applications, according to William Fellows, a principal analyst at the 451 Group.
"The HPC community quite sensibly thinks they have a good opportunity here," said Fellows, who authored a report on the technology. He said the methodology and technology behind HPC clusters lend themselves easily to cloud computing, but growing adoption of self-service, over-the-Web purchasing like EC2 has drawn customers that might otherwise have built their own HPC solutions or gone without.
"They can see their [potential] customers disappearing over the horizon into Amazon" instead of investing in HPC hardware, he said. Fellows' report described services like Amazon's as suitable for short transactional workloads such as Web applications and database tasks versus HPC, which was designed for complex, long-running algorithms processed in parallel.
To that end, Penguin built a 1,000-core cluster in a Utah data center (run by Voonami) that it calls "Beowulf in the cloud." Built and designed in part by Donald Becker, one of the original inventors of the Beowulf cluster, the Penguin on Demand cluster runs Red Hat-based CentOS and is built for projects that run directly on the cluster operating system rather than on virtual machine images, as in a more typical public cloud service.
Penguin software engineer Josh Bernstein said that he wants the service to be available to even the smallest consumer. He cited a hypothetical art student who could "spend $15 to $20 with us" rendering animation instead of taking days on a personal computer or waiting for school resources.
Wuischpard said that prices per core per hour are still nebulous, but would be "geared to match Amazon's high-end performance" compute hours, which currently range from $0.80 to $1.20 per hour.
Another concern is that 1,000 compute cores is small beer in the HPC world: each of the computers linked together by grid computing giant TeraGrid counts its cores in the thousands, and jobs are queued up far beyond capacity.
If Penguin can't keep pace with demand, it will have to invest in more capacity, something Bernstein said he's not too worried about. After all, he said, Linux clusters are designed to scale, and "the [Utah] data center is rather large."