By Maureen O'Gara
September 26, 2011 06:00 AM EDT
EMC, or at least its Greenplum unit, has put together a thousand-node multi-tenant analytics platform to accelerate the development and testing of the disruptive but persnickety open source Apache Hadoop software.
That's 1,000-odd hardware nodes, or 10,000 nodes when you count virtual machines, along with 24PB of physical storage. In a word, that's huge.
Greenplum wants to ensure that Hadoop turns into a really serious enterprise-ready Big Data tool for sifting through mounds of unstructured data to unlock their secrets and make predictions, but it contends that nobody really knows how to deploy the stuff, in part because everybody's Hadoop is different.
So it wants to come up with a formula others can follow and overturn the long-standing Hadoop dogma that compute and storage can't be separated and have to sit on the same nodes. Greenplum says separating them increases efficiency and requires less hardware, pointing to Yahoo's 40,000 Hadoop servers, which it says run at only 10%-15% utilization. It thinks it can get 80%.
Anyway, Greenplum wants various use cases developed around analytics.
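For a rough sense of what the "less hardware" claim would mean, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the utilization figures describe the same workload and that useful work scales linearly with server count; the function name `servers_needed` is hypothetical and the result is not a Greenplum number.

```python
def servers_needed(current_servers: int, current_util: float, target_util: float) -> float:
    """Servers required to do the same useful work at a higher utilization.

    Assumption (not from the article): useful work = servers * utilization,
    and that work stays constant when utilization changes.
    """
    useful_work = current_servers * current_util
    return useful_work / target_util


if __name__ == "__main__":
    # Figures quoted in the article: 40,000 servers at 10%-15%, target 80%.
    for util in (0.10, 0.15):
        needed = servers_needed(40_000, util, 0.80)
        print(f"{util:.0%} utilization today -> about {needed:,.0f} servers at 80%")
```

Under those assumptions, the same work would fit on roughly 5,000 to 7,500 servers instead of 40,000.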
Intel, VMware, Micron, Seagate, Super Micro Computer and Mellanox Technologies are contributing to the so-called Greenplum Analytics Workbench project. Figure on continually updated versions of Hadoop, Ethernet and InfiniBand, a "kickbutt" interconnect, and Sandy Bridge racks, 60 of 'em.
Greenplum wouldn't put a price tag on the thing, but flat out it'll be the largest test-bed cluster for continuous integration tests on the Apache Hadoop trunk and its future releases.
The initial object of the exercise is scale, or should we say contributions will be validated at scale so enterprises, which are often intimidated by Hadoop, can confidently deploy them in production. The workbench is supposed to identify bugs, stabilize new releases and optimize hardware configurations.
Greenplum says all testing and results will be contributed back to the Apache Software Foundation and the open source community, and its testing will be planned in coordination with Apache's key committers.
Continuous integration testing on the Greenplum Analytics Workbench will begin in January. Use of the workbench will be free, and it will be hosted at the giant Switch Communications SuperNAP data center in Las Vegas. EMC's Mozy folks will manage the thing.