Today we are releasing a new set of capabilities in Serengeti 0.8.0, which extends the range of partner-supported Hadoop distributions and versions. In addition, we are broadening Serengeti's reach into mixed-workload configurations: this release enables provisioning of an HBase cluster.
As I’ve discussed in previous posts, most big-data environments consist of a mix of workloads. Serengeti’s mission is to bring as many members of the big-data family of workloads as possible into the same theme park, all running on a common shared platform.
Supporting mixed workloads is a key capability for big data. In my customer discussions I see a mix of MapReduce, HBase, Solr, numerical analysis (R and SAS), and, increasingly, Big SQL engines such as Impala, ParAccel, and Pivotal HAWQ.
Support for HBase in Serengeti
Because the cluster is virtualized, we can deploy a whole range of workloads on it. For example, we can deploy SAS on the same physical nodes as Hadoop, using the same resources at different times for each purpose. To deploy and configure HBase as a holistic distributed system, this release includes HBase-specific cluster configurations.
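To make the idea concrete, here is a minimal sketch of what an HBase cluster specification could look like. The node group names, role strings, and field names below are illustrative assumptions, not the exact 0.8.0 schema; consult the sample spec files shipped with Serengeti for the real layout.

```shell
# Hypothetical HBase cluster spec: one ZooKeeper quorum, one HBase master,
# and a set of region servers, expressed as JSON node groups.
cat > hbase_spec.json <<'EOF'
{
  "nodeGroups": [
    { "name": "zookeeper",    "roles": ["zookeeper"],          "instanceNum": 3 },
    { "name": "master",       "roles": ["hbase_master"],       "instanceNum": 1 },
    { "name": "regionserver", "roles": ["hbase_regionserver"], "instanceNum": 4 }
  ]
}
EOF
# From the Serengeti CLI shell, the cluster would then be created with
# something along the lines of (exact syntax may differ in 0.8.0):
#   cluster create --name hbase01 --specFile hbase_spec.json
```

Sizing each node group independently is the point of the spec-file approach: region servers can be scaled out without touching the master or ZooKeeper groups.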
Highlights of this new support include HBase-specific cluster configurations and provisioning of complete HBase clusters alongside other workloads on the shared platform.
Sub-Saharan Africa, Central America or Asia?
We continue to work with our key Hadoop partners to strengthen support for Hadoop and big-data applications in a virtual environment. In addition to Apache Hadoop 1.0, Hortonworks HDP-1.0, Cloudera CDH3, and Greenplum GPHD-1.2, we have added support for the MapR Hadoop distribution and Cloudera CDH4.
New Support for Cloudera
New Support for MapR
Special support for Temporary Data
One of the key things we’ve learned about Hadoop is that it makes significant use of ephemeral data. This data is typically generated by stages such as map output, reducer input, and sort spills. I covered this in some detail in a previous post.
In Serengeti 0.8.0 we can now provision a shared file system service specifically for this temporary data. This makes it easier to separate the compute VMs from the data nodes, leaving the compute VMs stateless: job input and output go to HDFS, MapR, or Isilon distributed file systems, while the temporary data goes to local disks.
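A spec for this separated layout might look like the following sketch. Again, the group names, role strings, and the idea of per-group storage hints are illustrative assumptions for this post, not the exact 0.8.0 schema.

```shell
# Hypothetical spec separating stateful data nodes from stateless compute
# nodes. Job I/O lands in the distributed file system served by the data
# group; temporary data from the compute group goes to local disks.
cat > separated_spec.json <<'EOF'
{
  "nodeGroups": [
    { "name": "data",    "roles": ["hadoop_datanode"],    "instanceNum": 4 },
    { "name": "compute", "roles": ["hadoop_tasktracker"], "instanceNum": 8 }
  ]
}
EOF
```

Because the compute group holds no persistent state, it can be grown or shrunk (or powered off entirely) without endangering the data stored by the data group.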
How to Learn More
We have published the new release of Serengeti on our main project site, including more detail on these key areas. Feel free to follow up with comments or questions about this new release.