Commercial Hadoop distribution, or build your own from scratch?

With open source there is always the question of whether to build your own distribution directly from the source code or to choose a commercially packaged solution that comes with a few additional components to make your job easier.

Apache Hadoop gives you the same choice. You can take Hadoop directly from the open source repository, build your own setup from scratch, and pick and choose among the components available alongside Hadoop core. Or you can choose a commercial release. While comparing the two options I found a few interesting things and decided to share them.

With a commercial Hadoop distribution you will get:

  • An integrated solution in which all the bundled components are known to work together and have been tested together
  • A stable, reliable setup to start with
  • Additional items such as a management console and an admin portal
  • A single point of contact for support
  • A surrounding ecosystem that provides additional benefits
  • In some cases, a system that needs little or no effort to fine-tune

When you pick and choose components from Apache Hadoop instead, you:

  • Start with Hadoop core, then select each module and make it work with your Hadoop core
  • Build your own Hadoop clusters and manage them with whatever tools are available
  • Need some time, depending on several factors, before a stable and fine-tuned system is ready
  • Own it, so you will have to figure out any problem yourself
  • Get the best option for developing something from scratch within Hadoop
  • Get the satisfaction that it’s your own creation

Here are a few commercial offerings you can consider:

  • Cloudera offers CDH (Cloudera’s Distribution including Apache Hadoop) and Cloudera Enterprise.
  • MapR’s distribution is very sound and provides its own filesystem and MapReduce engine. MapR also provides additional capabilities such as snapshots, mirrors, NFS access and full read-write file semantics.
  • Yahoo! and Benchmark Capital formed Hortonworks Inc., whose focus is on making Hadoop more robust and easier to install, manage and use for enterprise users. Hortonworks is working on its own offering, but so far I don’t know what is available.
  • Amazon provides Hadoop in two different ways:
    • You can run Hadoop yourself (the less efficient option) on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
    • Amazon also runs Hadoop as Elastic MapReduce (EMR), which automates provisioning the Hadoop cluster, running and terminating jobs, and handling data transfer between EC2 and S3.
  • Microsoft announced its Hadoop offering in late 2011, and the service is currently in CTP (Community Technology Preview). Microsoft’s Hadoop offering will be available on Windows Azure and Windows Server.
  • IBM offers InfoSphere BigInsights based on Hadoop in both a basic and enterprise edition.
  • Silicon Graphics International offers Hadoop-optimized solutions based on the SGI Rackable and CloudRack server lines, with implementation services.
  • EMC released EMC Greenplum Community Edition and EMC Greenplum HD Enterprise Edition.
    • The community edition, with optional for-fee technical support, consists of Hadoop, HDFS, HBase, Hive, and the ZooKeeper configuration service.
    • The enterprise edition is an offering based on the MapR product, and offers proprietary features such as snapshots and wide area replication.
  • Google added AppEngine-MapReduce to support running Hadoop 0.20 programs on Google App Engine.
  • Oracle announced the Big Data Appliance, which integrates Hadoop, Oracle Enterprise Linux, the R programming language, and a NoSQL database with the Exadata hardware.

Some of these commercial vendors also have partnerships with hardware vendors.

Don’t choose a commercial distribution just because everything works out of the box. Do your research and find your own reason for choosing one over another.
