By Maureen O'Gara
June 22, 2012 01:40 PM EDT
VMware has set up a free, downloadable open source project fittingly called Serengeti to enable enterprises to deploy, manage and scale Apache Hadoop in virtual and cloud environments, both public and private, and of course on vSphere.
It says it's working with the Apache Hadoop community to contribute extensions that will make key components virtualization-aware to support elastic scaling and improve Hadoop's performance in virtual environments.
Those contributions include changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects so data and compute jobs can be optimally distributed across a virtual infrastructure.
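The distribution point is easiest to see in miniature. Below is a toy word-count sketch of the MapReduce model that Hadoop implements; the function names and inputs are illustrative, not Hadoop APIs. In a real cluster, the map and reduce tasks, and the HDFS blocks they read, are spread across many nodes, which under Serengeti would be virtual machines.

```python
# Toy, single-process sketch of the MapReduce model.
# In real Hadoop, map and reduce tasks run on separate cluster nodes
# and read their input splits from HDFS.
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input split.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key before reducing.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big clusters", "data jobs"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'jobs': 1}
```

VMware's proposed extensions concern where those map and reduce tasks land: making the framework aware that several "nodes" may share one physical host, so compute is not scheduled as if they were independent machines.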
VMware sees Hadoop emerging as the de facto standard for Big Data processing, but argues that the resource-intensive nature of large Big Data clusters makes virtualization something Hadoop has to accommodate.
Decoupling Apache Hadoop nodes from the underlying physical infrastructure brings Hadoop the benefits of cloud infrastructure: rapid deployment, high availability, optimal resource utilization, elasticity and secure multi-tenancy.
Serengeti includes common Hadoop components like Apache Pig and Apache Hive.
The extensions are at https://issues.apache.org/jira/browse/HADOOP-8468.
VMware has also updated Spring for Apache Hadoop, the open source project launched in February to make it easier for enterprise developers to build distributed processing solutions with Apache Hadoop. The updates should let Spring developers build enterprise apps that integrate with the HBase database, the Cascading library and Hadoop security.