Handling a Cloudera Hadoop Cluster from the Command Line

If you have installed Hadoop from the Cloudera distribution (CDH) without Cloudera Manager, you have to manage the cluster from the console, and things are not easy. Here is some important information for working with Cloudera Hadoop from the console:

 

Where the hadoop binary is located:

ubuntu@HADOOP_CLUSTER:~$ which hadoop

    • /usr/bin/hadoop

Files located under /usr/lib/hadoop/:

drwxr-xr-x 2 root root 4096 May 22 21:00 bin
drwxr-xr-x 2 root root 4096 May 23 00:25 client
drwxr-xr-x 2 root root 4096 May 23 00:25 client-0.20
drwxr-xr-x 2 root root 4096 May 22 21:00 cloudera
drwxr-xr-x 2 root root 4096 May 22 21:00 etc
-rw-r--r-- 1 root root 16678 Apr 22 17:38 hadoop-annotations-2.0.0-cdh4.2.1.jar
lrwxrwxrwx 1 root root 37 Apr 22 17:38 hadoop-annotations.jar -> hadoop-annotations-2.0.0-cdh4.2.1.jar
-rw-r--r-- 1 root root 46858 Apr 22 17:38 hadoop-auth-2.0.0-cdh4.2.1.jar
lrwxrwxrwx 1 root root 30 Apr 22 17:38 hadoop-auth.jar -> hadoop-auth-2.0.0-cdh4.2.1.jar
-rw-r--r-- 1 root root 2267883 Apr 22 17:38 hadoop-common-2.0.0-cdh4.2.1.jar
-rw-r--r-- 1 root root 1213897 Apr 22 17:38 hadoop-common-2.0.0-cdh4.2.1-tests.jar
lrwxrwxrwx 1 root root 32 Apr 22 17:38 hadoop-common.jar -> hadoop-common-2.0.0-cdh4.2.1.jar
drwxr-xr-x 3 root root 4096 May 22 21:00 lib
drwxr-xr-x 2 root root 4096 May 23 00:25 libexec
drwxr-xr-x 2 root root 4096 May 22 21:00 sbin

 

Cluster-specific Hadoop XML configuration files are stored here:

ubuntu@HADOOP_CLUSTER:~$ ls -l /usr/lib/hadoop/etc/hadoop
lrwxrwxrwx 1 root root 16 Apr 22 17:38 /usr/lib/hadoop/etc/hadoop -> /etc/hadoop/conf
ubuntu@HADOOP_CLUSTER:~$ ls -l /etc/hadoop/conf
lrwxrwxrwx 1 root root 29 May 22 21:00 /etc/hadoop/conf -> /etc/alternatives/hadoop-conf
ubuntu@HADOOP_CLUSTER:~$ ls -l /etc/alternatives/hadoop-conf
lrwxrwxrwx 1 root root 23 May 22 22:02 /etc/alternatives/hadoop-conf -> /etc/hadoop/conf.avkash
ubuntu@HADOOP_CLUSTER:~$ ls -l /etc/hadoop/conf.avkash/
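Instead of hopping through the chain with repeated ls -l calls, the whole set of symlinks can be resolved in one step with readlink (the conf.avkash target is specific to this cluster, set up via the alternatives system):

```shell
# Canonicalize the whole symlink chain in one command. readlink -m resolves
# every link component and does not require the final target to exist,
# which is handy when checking a broken alternatives chain.
readlink -m /usr/lib/hadoop/etc/hadoop
# On the cluster above this prints /etc/hadoop/conf.avkash
```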

    • core-site.xml
    • hadoop-metrics.properties
    • hadoop-metrics2.properties
    • hdfs-site.xml
    • log4j.properties
    • mapred-site.xml
    • slaves
    • ssl-client.xml.example
    • ssl-server.xml.example
    • yarn-env.sh
    • yarn-site.xml
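To pull a single property value out of these files without opening them in an editor, a small text-level helper can be used. This is my own sketch (hconf_get is not a Hadoop tool), and it assumes the simple <name>/<value> layout found in the stock CDH files:

```shell
# Print the value of a property from a Hadoop *-site.xml file.
# Usage: hconf_get <file> <property-name>
# Note: a simple text-level parser; it assumes <value> directly follows
# <name> inside each <property> block, as in the stock CDH files.
hconf_get() {
  tr -d '\n' < "$1" |
    grep -o "<name>[[:space:]]*$2</name>[^<]*<value>[^<]*</value>" |
    sed 's|.*<value>\(.*\)</value>|\1|'
}
# Example: hconf_get /etc/hadoop/conf/core-site.xml fs.defaultFS
```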

Note: Alternatively, you can search for the Hadoop configuration files directly:

    • ubuntu@ec2-54-214-67-144:~$ sudo find / -name "hdfs*.xml"

 

Hadoop cluster specific scripts are located here:

  • Hadoop
    • /usr/lib/hadoop/libexec/hadoop-config.sh
    • /usr/lib/hadoop/libexec/hadoop-layout.sh
    • /usr/lib/hadoop/sbin/hadoop-daemon.sh
    • /usr/lib/hadoop/sbin/hadoop-daemons.sh
  • MapReduce
    • /usr/lib/hadoop-0.20-mapreduce/bin/hadoop-daemon.sh
    • /usr/lib/hadoop-0.20-mapreduce/bin/hadoop-config.sh
    • /usr/lib/hadoop-0.20-mapreduce/bin/hadoop-daemons.sh

To start/stop/restart the Hadoop services, the init scripts are located here:

  • Hadoop Namenode and Job Tracker
    • /etc/init.d/hadoop-0.20-mapreduce-jobtracker
    • /etc/init.d/hadoop-hdfs-namenode
  • Hadoop Datanode and TaskTracker
    • /etc/init.d/hadoop-hdfs-datanode
    • /etc/init.d/hadoop-0.20-mapreduce-tasktracker

 

If you decide to manage the Hadoop services manually, you can do the following:

  • Stop Services:
    • sudo /etc/init.d/hadoop-hdfs-namenode stop
    • sudo /etc/init.d/hadoop-hdfs-datanode stop
    • sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker stop
    • sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker stop
  • Start Services
    • sudo /etc/init.d/hadoop-hdfs-namenode start
    • sudo /etc/init.d/hadoop-hdfs-datanode start
    • sudo /etc/init.d/hadoop-0.20-mapreduce-jobtracker start
    • sudo /etc/init.d/hadoop-0.20-mapreduce-tasktracker start

 

Running hdfs commands in the hdfs user context:

  • sudo -u hdfs hdfs dfs -mkdir /tmp
  • sudo -u hdfs hdfs dfs -chmod -R 1777 /tmp
  • sudo -u hdfs hdfs dfs -mkdir -p /var/lib/hadoop-hdfs/cache/
  • hdfs dfs -ls /
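The 1777 mode applied to /tmp above is the same world-writable-plus-sticky-bit convention used for the local /tmp directory: anyone may create files, but only the owner may delete them. The effect is easy to inspect on a local directory:

```shell
# Mode 1777 = rwxrwxrwx plus the sticky bit (the trailing "t" in ls output).
# On a sticky, world-writable directory users may only remove their own files,
# which is exactly what HDFS needs for a shared /tmp.
mkdir -p /tmp/scratch-demo
chmod 1777 /tmp/scratch-demo
ls -ld /tmp/scratch-demo    # permissions show as drwxrwxrwt
```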

 

Running Hadoop example jobs from the console:

  • ubuntu@HADOOP_CLUSTER:~$ hdfs dfs -copyFromLocal history.log /
  • ubuntu@HADOOP_CLUSTER:~$ hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount /history.log /home/ubuntu/results
  • 13/06/04 16:14:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
  • 13/06/04 16:14:35 INFO input.FileInputFormat: Total input paths to process : 1
  • 13/06/04 16:14:35 INFO mapred.JobClient: Running job: job_201306041556_0005
  • 13/06/04 16:14:36 INFO mapred.JobClient: map 0% reduce 0%

 

The following error means that HDFS is running but the JobTracker is not:

  • 13/06/04 15:48:48 INFO ipc.Client: Retrying connect to server: HADOOP_CLUSTER.us-west-2.compute.amazonaws.com/10.254.42.72:8021. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
  • 13/06/04 15:48:49 INFO ipc.Client: Retrying connect to server: HADOOP_CLUSTER.us-west-2.compute.amazonaws.com/10.254.42.72:8021. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
  • 13/06/04 15:48:50 INFO ipc.Client: Retrying connect to server: HADOOP_CLUSTER.us-west-2.compute.amazonaws.com/10.254.42.72:8021. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
  • 13/06/04 15:48:51 INFO ipc.Client: Retrying connect to server: HADOOP_CLUSTER.us-west-2.compute.amazonaws.com/10.254.42.72:8021. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
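A quick way to confirm that diagnosis is to probe the JobTracker RPC port (8021 by default in MRv1) directly. This sketch uses bash's built-in /dev/tcp pseudo-device, so it needs no extra tools; port_open is a hypothetical helper name of my own:

```shell
# Return success if something is listening on host:port.
# Uses bash's /dev/tcp redirection; "timeout" keeps a filtered port
# from hanging the check.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example:
# port_open localhost 8021 && echo "JobTracker is up" || echo "JobTracker is down"
# You can also list the running Hadoop JVMs with: sudo jps
```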

Keywords: Hadoop, MapReduce, Cloudera, Services
