Enterprise Hadoop solutions distributed by key Hadoop vendors

Let's start with Cloudera Enterprise Data Hub:


Here is the offering from Hortonworks:


And this is how MapR packages Enterprise Hadoop:


And finally, the Pivotal Enterprise Hadoop offering:


Keywords: Apache Hadoop, Cloudera, Hortonworks, Pivotal, MapR, Big Data


Windows Azure HDInsight – Installation Walkthrough



The CTP version of HDInsight for Windows Server and Windows clients is available to download from here

When you install HDInsight through WebPI, the following components are installed on your Windows machine:


Once installation is complete, you can launch the Hadoop console to verify the install, and check the Hadoop version with the command “hadoop version” as below:


You can also check System > Services to verify that all HDInsight-specific services are running as expected:




Primary Namenode and Secondary Namenode configuration in Apache Hadoop

The Apache Hadoop primary Namenode and secondary Namenode architecture is laid out as below:

Namenode Master:

The conf/masters file defines the master nodes of any single-node or multi-node cluster. On the master, conf/masters looks like this:
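For illustration, on a small cluster whose master host is (hypothetically) named master01, conf/masters would contain just that one hostname:

```
master01
```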




The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run. When the master box also acts as a Hadoop slave, you will see the same hostname listed in both the masters and slaves files.

On the master, conf/slaves looks like this:
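Continuing with hypothetical hostnames, a conf/slaves file where the master doubles as a slave alongside two dedicated slave boxes might look like:

```
master01
slave01
slave02
```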




If you have additional slave nodes, just add them to the conf/slaves file, one per line. Be sure that your namenode can ping the machines listed in your slaves file.

Secondary Namenode:

If you are building a test cluster, you don’t need to set up the secondary namenode on a different machine, similar to a pseudo-distributed install. However, if you’re building out a real distributed cluster, it is a very good idea to move the secondary namenode to another machine. Running the Secondary Namenode on a different machine from the primary NameNode helps in case the primary Namenode goes down.

The masters file contains the name of the machine where the secondary namenode will start. If you have modified the scripts to change your secondary namenode details, i.e. location and name, make sure that when the DFS service starts it reads the updated configuration script so it can start the secondary namenode correctly.

In a Linux-based Hadoop cluster, the secondary namenode is started by bin/start-dfs.sh on the nodes specified in the conf/masters file. Internally, bin/start-dfs.sh calls bin/hadoop-daemons.sh, passing the name of the masters/slaves file as a command-line option.

Starting the Secondary Namenode on demand or via DFS:

The location of your Hadoop conf directory is set via the $HADOOP_CONF_DIR shell variable. Different distributions, e.g. Cloudera or MapR, set it up differently, so check where your Hadoop conf folder lives.
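As a minimal sketch, you can point the variable at your conf directory yourself; the path below is an assumption (a common packaged-install location) and varies by distribution:

```shell
# Point HADOOP_CONF_DIR at your distribution's conf directory.
# /etc/hadoop/conf is a common location, but verify yours before relying on it.
export HADOOP_CONF_DIR=/etc/hadoop/conf
echo "Hadoop config directory: $HADOOP_CONF_DIR"
```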

To start the secondary namenode on any machine, use the following command:

$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR secondarynamenode

When the secondary namenode is started by DFS, it works as below:

$HADOOP_HOME/bin/start-dfs.sh starts SecondaryNameNode

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

If you have changed the secondary namenode hosts file name, say to “hadoopsecondary”, then when starting the secondary namenode you need to provide that hosts file, and make sure the change is picked up when bin/start-dfs.sh runs by default:

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts hadoopsecondary start secondarynamenode

which will start the secondary namenode on ALL hosts listed in the file “hadoopsecondary”.

How Hadoop DFS Service Starts in a Cluster:
In a Linux-based Hadoop cluster:

1. Namenode service: starts the Namenode on the same machine from which DFS is started.
2. DataNode service: reads the slaves file and starts a DataNode on all slaves using the following command:
#> $HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
3. SecondaryNameNode service: reads the masters file and starts a SecondaryNameNode on all hosts listed in it using the following command:
#> $HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode
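The fan-out described above can be sketched as a small shell script. This is only an illustration: the hostnames and the ./demo_conf directory are made up, and the real bin/start-dfs.sh ssh-es into each host to start the daemon rather than just printing the command.

```shell
#!/bin/sh
# Simplified sketch of how start-dfs.sh fans daemon starts out across hosts.
# Hostnames and the ./demo_conf directory are hypothetical.
HADOOP_CONF_DIR=./demo_conf
mkdir -p "$HADOOP_CONF_DIR"
printf 'slave01\nslave02\n' > "$HADOOP_CONF_DIR/slaves"
printf 'master01\n'         > "$HADOOP_CONF_DIR/masters"

start_daemons() {
  # $1 = hosts file, $2 = daemon name: print the command that would run per host
  while read -r host; do
    echo "[$host] hadoop-daemon.sh --config $HADOOP_CONF_DIR start $2"
  done < "$HADOOP_CONF_DIR/$1"
}

# 1. Namenode starts locally; 2. DataNodes come from the slaves file;
# 3. SecondaryNameNode comes from the masters file.
echo "[localhost] hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode"
start_daemons slaves datanode
start_daemons masters secondarynamenode
```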

Alternative to a backup Namenode: the Avatar Namenode:

The secondary namenode is often treated as a backup for the primary namenode, to keep the cluster going if the primary goes down, but it only checkpoints namenode metadata and is not a hot standby. If you want to build namenode HA, alternatives are available. One such method is to use an Avatar namenode: you migrate the namenode to an Avatar namenode, and the Avatar namenode must be built on a separate machine.

Technically, once migrated, the Avatar namenode is a hot standby for the namenode, so it is always in sync with it. If you create a new file on the master namenode, you can also read it on the standby Avatar namenode in real time.

In standby mode, the Avatar namenode is a read-only namenode. At any given time you can transition the Avatar namenode to act as the primary namenode; when needed, you can switch from standby to full active mode in just a few seconds. To do that, you must have a VIP for namenode migration and an NFS mount for namenode data replication.