Although many IT shops want to host applications that process ginormous amounts of data in the cloud, Hadoop, the most popular “big data” platform, requires dedicated hardware, which leads to reliability issues, among other concerns.
That could change with VMware’s Apache Software Foundation (ASF) open source project dubbed Serengeti. It will allow enterprises to deploy and manage Apache Hadoop on vSphere 5.0 in both cloud and virtual environments.
Hadoop on a virtual infrastructure could remove reliability concerns; with vSphere, a Hadoop application will be able to automatically restart if a node fails, according to company statements.
Additionally, the virtualization giant is working with members of the Hadoop community, including Cloudera Inc., Greenplum, Hortonworks, IBM and MapR, to contribute extensions to the ASF to make significant parts of Hadoop "virtualization-aware."
VMware’s Hadoop strategy: smart or misguided?
Some say VMware is wise to make adaptations for Hadoop on vSphere and become a player in the big data space.
"With big data getting bigger every day, it is clear that there is a significant virtualization opportunity for big data-crunching workloads," said Al Hilwa, program director for application development software at Framingham, Mass.-based IDC.
Big data platforms such as Hadoop and other distributed databases were the missing piece of the modern application stack in VMware’s vFabric application software, said Jeffrey Reed, director of application development for Logicalis Group, an enterprise cloud provider based in the U.K.
"If [VMware isn’t] going to provide [its] own Hadoop or Hadoop-like solution, it is critical that [it has] a strategy around Hadoop and its ecosystem of distribution vendors," Reed said.
Not everyone agrees with that analysis, however.
"VMware's approach to highly available Hadoop is misguided," said Shlomo Swidler, CEO of Orchestratus Inc., a cloud computing consultancy in West Hempstead, N.Y. "It offers high availability via infrastructure-level support, whereas software-level HA is the norm for modern applications," Swidler added.
Still, the moves constitute two halves of a strategy to cement VMware's position vis-à-vis Hadoop and HA, said one analyst.
"Most important is making Hadoop a first-class corporate citizen," said Tony Baer, principal analyst at research firm Ovum based in London. "Hadoop is not very fault tolerant and virtualization is one of the technologies [that will help accomplish that]," Baer added.
Serengeti, which is available as a free download under the Apache 2.0 license, allows admins to deploy a Hadoop cluster with a single click, in minutes.
Further, VMware is working with its Hadoop partners to contribute changes developed for the Hadoop Distributed File System and Hadoop MapReduce to the Hadoop community.
VMware also announced an initiative to support its Cloud Foundry on OpenStack cloud environments last month.
Stuart J. Johnston is Senior News Writer for SearchCloudComputing.com. Contact him at firstname.lastname@example.org.