New ways of massaging data in the cloud are allowing businesses to radically reshape the way they use computing resources and to deliver unprecedented levels of research with minimal investment.
For instance, medical research and consulting firm Eidetics is using database software running on Amazon EC2 to mine old medical records for new research in real time. Yahoo is sponsoring the open source cloud project Hadoop and now runs a good portion of its own internal search on it. And SportsDataHub proprietor Kevin Goodfellow built his sports analytics website entirely in the cloud, and he plans to stay there.
"We don't own anything," he said. Before the 2008 football season, Goodfellow had overstocked on hosted computing power, but he scaled back his hosting even as usage went up, thanks to efficiencies he found in the QlikView software that runs his site.
QlikView holds datasets entirely in RAM rather than reading them from disk, which allows very fast responses to changing real-time queries. This lets SportsDataHub users interact with football statistics and see new results practically instantly. Goodfellow said that services like EC2 have matured enough in the last year that, despite occasional outages, cloud computing is a foregone conclusion for his business. "It's doable, maybe not as perfect as it could be, but for now, good enough," he said.
Eidetics has the same basic delivery model as SportsDataHub, an interactive web interface, but works with a very different kind of data. Research Director Pieter Sheth-Voss said the design of Vertica, a massively parallel processing (MPP) columnar database, means that performance doesn't suffer when analyzing very large, complicated sets of medical records. Eidetics runs complex relational queries with Vertica and sees the results "almost in real time" in a web portal.
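The practical payoff of a columnar layout can be sketched in a few lines of Python. This is a toy illustration, not how Vertica actually stores or compresses data, and the records and field names (`patient_id`, `age`, `diagnosis`) are hypothetical. The point is that a query touching one field, such as an average age, only needs to scan that field's column, whereas a row-oriented layout visits every full record.

```python
# Hypothetical medical records; field names are illustrative only.
ROWS = [
    {"patient_id": 1, "age": 54, "diagnosis": "asthma"},
    {"patient_id": 2, "age": 61, "diagnosis": "diabetes"},
    {"patient_id": 3, "age": 47, "diagnosis": "asthma"},
    {"patient_id": 4, "age": 70, "diagnosis": "copd"},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists (columnar layout)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def avg_age_row_store(rows):
    # Row layout: every full record is visited even though only "age" is needed.
    return sum(row["age"] for row in rows) / len(rows)


def avg_age_column_store(columns):
    # Column layout: only the "age" column is scanned.
    ages = columns["age"]
    return sum(ages) / len(ages)


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(avg_age_row_store(ROWS), avg_age_column_store(cols))
```

In a real column store the savings come from reading far less data off disk and compressing similar values together; an in-memory Python list only shows the shape of the idea.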
Sheth-Voss said the use of EC2 has made hardware costs irrelevant when planning for a new installation or a project. "The cost of the servers is dwarfed by the licensing costs of Vertica," he said, hastening to add "not that Vertica is expensive, but I don't have to think about the cost of the server" when planning an installation.
Carl Olofson, database analyst for IDC, said the advent of practically unlimited parallel processing and enormous amounts of RAM is fundamentally altering the way databases will be designed and used.
"You can use the database in ways that weren't possible before," Olofson said, thanks to what he calls the "new economics of computing," where users have access to practically unlimited amounts of processing power and RAM on demand.
Olofson sees the beginnings of a shift in design from serial applications, where programs query data step-by-step from a rigid format, to "smart database technology" that can perform multiple operations simultaneously on a set of data.
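The shift Olofson describes, from step-by-step processing to operating on a set of data all at once, can be sketched as partitioned aggregation: split the data, let several workers each reduce their own slice, then combine the partial results. This is an illustration under simple assumptions rather than any vendor's engine; the partitioning scheme and function names are hypothetical, and Python's `concurrent.futures.ThreadPoolExecutor` stands in for the many processors of an MPP system.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum_of_squares(values):
    # Serial style: a single loop walks the whole dataset one record at a time.
    total = 0
    for v in values:
        total += v * v
    return total


def partial_sum_of_squares(partition):
    # Work each worker performs independently on its own slice of the data.
    return sum(v * v for v in partition)


def parallel_sum_of_squares(values, workers=4):
    # Hypothetical partitioning: split the data into roughly equal contiguous slices.
    size = max(1, -(-len(values) // workers))  # ceiling division
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each slice is reduced by a separate worker; the partial sums are then combined.
        return sum(pool.map(partial_sum_of_squares, partitions))


if __name__ == "__main__":
    data = list(range(1, 101))
    print(serial_sum_of_squares(data), parallel_sum_of_squares(data))
```

Because of CPython's global interpreter lock, threads will not actually speed up this pure-Python arithmetic; swapping in `ProcessPoolExecutor`, or a real parallel database, is what turns the same split-and-combine structure into a speedup.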
Calling this shift the third age of database technology, after the early relational databases of the 1970s and the sophisticated models that grew up through the 1980s and 1990s, Olofson said new databases will adjust to the kind of data they process rather than forcing data into predetermined structures. He pointed to XML databases as an intermediate step to the next generation of database technology. He also said the technology is still in its infancy.
"In four to five years, you'll see this stuff massively come on the scene," he said. Olofson believes that the database giants, like Oracle and Sybase, are just starting to experiment with these new ideas, and although small projects are exciting, there will be an incubation period while new types and uses for databases emerge.
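The idea of a database that adapts to its data, in the spirit of the XML databases Olofson points to, can be illustrated with records whose fields vary from one to the next. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `<record>` documents and element names are hypothetical, and it shows schema-flexible querying in general, not any particular XML database product. Instead of declaring columns up front, the code discovers which fields exist and queries whatever is present.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: each one carries a different set of fields,
# and no table schema is declared in advance.
DOCUMENT = """
<records>
  <record><name>Smith</name><age>54</age></record>
  <record><name>Jones</name><allergy>penicillin</allergy></record>
  <record><name>Lee</name><age>47</age><allergy>latex</allergy></record>
</records>
"""


def field_names(root):
    """Discover the set of fields actually present across all records."""
    return sorted({child.tag for record in root.iter("record") for child in record})


def names_with(root, field):
    """Return the names of records that contain a given field, whatever else they hold."""
    return [r.findtext("name") for r in root.findall("record") if r.find(field) is not None]


if __name__ == "__main__":
    root = ET.fromstring(DOCUMENT.strip())
    print(field_names(root))
    print(names_with(root, "allergy"))
```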