Getting Hadoop Configuration Parameters directly at Hadoop Command Prompt

Hadoop users can use the “hdfs getconf -confKey KEY_NAME” command to get the value of any configuration key the cluster has configured.
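For example, to read a single value such as the configured HDFS replication factor (dfs.replication is a standard configuration key; the value printed below is illustrative):

$ hdfs getconf -confKey dfs.replication
3

The same subcommand family also accepts options such as -namenodes to list the configured namenode hosts.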

 



Finding Hadoop-specific processes running in a Hadoop Cluster

Recently I was asked to provide information on all Hadoop-specific processes running in a Hadoop cluster. I ran a few commands, shown below, to gather that information.

Hadoop 2.0.x on Linux (CentOS 6.3) – Single Node Cluster

First, list all Java processes running in the cluster:

[cloudera@localhost usr]$ ps -A | grep java
1768 ?        00:00:28 java
2197 ?        00:00:54 java
2439 ?        00:00:30 java
2507 ?        00:01:19 java
2654 ?        00:00:35 java
2784 ?        00:00:52 java
2911 ?        00:00:56 java
3028 ?        00:00:31 java
3239 ?        00:00:59 java
3344 ?        00:01:11 java
3446 ?        00:00:27 java
3551 ?        00:00:30 java
3644 ?        00:00:22 java
3878 ?        00:01:08 java
4142 ?        00:02:16 java
4201 ?        00:00:36 java
4223 ?        00:00:25 java
4259 ?        00:00:21 java
4364 ?        00:00:29 java
4497 ?        00:11:11 java
4561 ?        00:00:44 java

Next, inspect each Java process in detail to see which Hadoop-specific application is running inside it:

[cloudera@localhost usr]$ ps -aef | grep java

499       1768     1  0 08:29 ?        00:00:29 /usr/java/jdk1.6.0_31/bin/java -Dzookeeper.datadir.autocreate=false -Dzookeeper.log.dir=/var/log/zookeeper -********

yarn 2197 1 0 08:29 ? 00:00:55 /usr/java/jdk1.6.0_31/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn ********

sqoop2 2439 1 0 08:29 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/sqoop2/sqoop-server/conf/logging.properties -Dsqoop.config.dir=/etc/sqoop2/conf ****************

yarn 2507 1 0 08:29 ? 00:01:21 /usr/java/jdk1.6.0_31/bin/java -Dproc_nodemanager -Xmx1000m -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn **********

mapred 2654 1 0 08:30 ? 00:00:36 /usr/java/jdk1.6.0_31/bin/java -Dproc_historyserver -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-mapreduce -Dhadoop.log.file=yarn-mapred-historyserver-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hdfs 2784 1 0 08:30 ? 00:00:53 /usr/java/jdk1.6.0_31/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-localhost.localdomain.log ********

hdfs 2911 1 0 08:30 ? 00:00:57 /usr/java/jdk1.6.0_31/bin/java -Dproc_namenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-localhost.localdomain.log *********

hdfs 3028 1 0 08:30 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -Dproc_secondarynamenode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop ********

hbase 3239 1 0 08:31 ? 00:01:00 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log *******

hbase 3344 1 0 08:31 ? 00:01:13 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC ****

hbase 3446 1 0 08:31 ? 00:00:28 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-rest-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/*******

hbase 3551 1 0 08:31 ? 00:00:31 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-thrift-localhost.localdomain.log *******

flume 3644 1 0 08:31 ? 00:00:23 /usr/java/jdk1.6.0_31/bin/java -Xmx20m -cp /etc/flume-ng/conf:/usr/lib/flume-ng/lib/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/asm-3.2.jar *******

root 3865 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log ********

mapred 3878 3865 0 08:31 ? 00:01:09 java -Dproc_jobtracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-jobtracker-localhost.localdomain.log -Dhadoop.home.dir=/usr/lib/hadoop-0.20-mapreduce -Dhadoop.id.str=hadoop **********

root 4139 1 0 08:31 ? 00:00:00 su mapred -s /usr/java/jdk1.6.0_31/bin/java -- -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ************

mapred 4142 4139 1 08:31 ? 00:02:19 java -Dproc_tasktracker -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-0.20-mapreduce -Dhadoop.log.file=hadoop-hadoop-tasktracker-localhost.localdomain.log ***************

httpfs 4201 1 0 08:31 ? 00:00:37 /usr/java/jdk1.6.0_31/bin/java -Djava.util.logging.config.file=/usr/lib/hadoop-httpfs/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager ******

hive 4223 1 0 08:31 ? 00:00:26 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-metastore.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *********

hive 4259 1 0 08:31 ? 00:00:22 /usr/java/jdk1.6.0_31/bin/java -Xmx256m -Dhive.log.dir=/var/log/hive -Dhive.log.file=hive-server.log -Dhive.log.threshold=INFO -Dhadoop.log.dir=//usr/lib/hadoop/logs *****

hue 4364 4349 0 08:31 ? 00:00:30 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dlog4j.configuration=log4j.properties -Dhadoop.log.dir=//usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log *******

oozie 4497 1 6 08:31 ? 00:11:27 /usr/bin/java -Djava.util.logging.config.file=/usr/lib/oozie/oozie-server-0.20/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx1024m -Doozie.https.port=11443 *********

sqoop 4561 1 0 08:31 ? 00:00:45 /usr/java/jdk1.6.0_31/bin/java -Xmx1000m -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop *******

cloudera 15657 8150 0 11:26 pts/4 00:00:00 grep java

Note: the above output is trimmed, since each process prints its full classpath and other process-specific details.
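Two shortcuts are worth noting here. Each Hadoop daemon is launched with a -Dproc_<name> system property (visible in the output above), so you can extract just the daemon names from the process table; and if the JDK's jps tool is on the PATH, it prints each JVM's PID together with its main class (run it via sudo to see JVMs owned by other users). Both sketches below assume the same single-node cluster, with the remaining lines elided:

[cloudera@localhost usr]$ ps -aef | grep -o 'proc_[a-z]*'
proc_resourcemanager
proc_nodemanager
proc_historyserver
proc_datanode
proc_namenode
proc_secondarynamenode
...

[cloudera@localhost usr]$ sudo jps -l
2911 org.apache.hadoop.hdfs.server.namenode.NameNode
2784 org.apache.hadoop.hdfs.server.datanode.DataNode
...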

HDInsight On Windows – Single Node Cluster

Apache Hadoop datanode            Running   Automatic   .\hadoop
Apache Hadoop historyserver       Running   Automatic   .\hadoop
Apache Hadoop isotopejs           Running   Automatic   .\hadoop
Apache Hadoop jobtracker          Running   Automatic   .\hadoop
Apache Hadoop namenode            Running   Automatic   .\hadoop
Apache Hadoop secondarynamenode   Running   Automatic   .\hadoop
Apache Hadoop tasktracker         Running   Automatic   .\hadoop
Apache Hive Derbyserver           Running   Automatic   .\hadoop
Apache Hive hiveserver            Running   Automatic   .\hadoop
Apache Hive hwi                   Running   Automatic   .\hadoop
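Since these daemons run as Windows services, the listing above can also be produced from PowerShell by filtering on the display name (a sketch; display names can vary across HDInsight versions):

PS C:\> Get-Service -DisplayName "Apache*" | Format-Table DisplayName, Status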

Exploring the relationship between blocks and files in HDFS

I am using a 3-node Hadoop cluster running Windows Azure HDInsight for this testing.

In Hadoop we can use the fsck utility to diagnose the health of the HDFS file system: it finds missing files or blocks and checks them for integrity.
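Beyond the basic health check, fsck takes flags to drill down; in Hadoop 1.x, -files lists every file, -blocks adds each file's block IDs, and -locations or -racks show which datanodes (and racks) hold each replica. For example, to inspect only the /example subtree used below:

hadoop fsck /example -files -blocks -locations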

Let's run fsck on the root file system:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fsck /

FSCK started by avkash from /10.114.132.17 for path / at Thu Mar 07 05:27:39 GMT 2013
..........Status: HEALTHY
Total size: 552335333 B
Total dirs: 21
Total files: 10
Total blocks (validated): 12 (avg. block size 46027944 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 3
FSCK ended at Thu Mar 07 05:27:39 GMT 2013 in 8 milliseconds

The filesystem under path '/' is HEALTHY
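As a quick sanity check on the summary, the reported average block size is just total size over total blocks: 552335333 B / 12 ≈ 46027944 B, exactly the value fsck prints.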

 

Now let's list everything under the root (/) to verify the directory and file counts:

 

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fs -lsr /

drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/apps
-rw-r--r-- 3 avkash supergroup 4608 2013-03-04 21:16 /example/apps/cat.exe
-rw-r--r-- 3 avkash supergroup 5120 2013-03-04 21:16 /example/apps/wc.exe
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/data
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /example/data/gutenberg
-rw-r--r-- 3 avkash supergroup 1395667 2013-03-04 21:16 /example/data/gutenberg/davinci.txt
-rw-r--r-- 3 avkash supergroup 674762 2013-03-04 21:16 /example/data/gutenberg/outlineofscience.txt
-rw-r--r-- 3 avkash supergroup 1573044 2013-03-04 21:16 /example/data/gutenberg/ulysses.txt
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp/mapred
drwx------ - avkash supergroup 0 2013-03-04 21:15 /hdfs/tmp/mapred/system
-rw------- 3 avkash supergroup 4 2013-03-04 21:15 /hdfs/tmp/mapred/system/jobtracker.info
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive/warehouse
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /hive/warehouse/hivesampletable
-rw-r--r-- 3 avkash supergroup 5015508 2013-03-04 21:16 /hive/warehouse/hivesampletable/HiveSampleData.txt
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /tmp
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /tmp/hive-avkash
drwxrwxrwx - SYSTEM supergroup 0 2013-03-04 21:15 /uploads
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM/graph
-rw-r--r-- 3 avkash supergroup 80 2013-03-04 21:16 /user/SYSTEM/graph/catepillar_star.edge
drwxr-xr-x - avkash supergroup 0 2013-03-04 21:16 /user/SYSTEM/query
-rw-r--r-- 3 avkash supergroup 12 2013-03-04 21:16 /user/SYSTEM/query/catepillar_star_rwr.query
drwxr-xr-x - avkash supergroup 0 2013-03-05 07:37 /user/avkash
drwxr-xr-x - avkash supergroup 0 2013-03-04 23:00 /user/avkash/.Trash
-rw-r--r-- 3 avkash supergroup 543666528 2013-03-05 07:37 /user/avkash/data_w3c_large.txt

Above we can confirm 21 directories and 10 files in total (the listing shows 20 directories; fsck's count of 21 includes the root / itself). Now we can dig further to inspect the 12 HDFS blocks behind these files:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop fsck / -files -blocks -racks
FSCK started by avkash from /10.114.132.17 for path / at Thu Mar 07 05:35:44 GMT 2013
/

/example

/example/apps

/example/apps/cat.exe 4608 bytes, 1 block(s): OK

0. blk_9084981204553714951_1008 len=4608 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

/example/apps/wc.exe 5120 bytes, 1 block(s): OK
0. blk_-7951603158243426483_1009 len=5120 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/example/data

/example/data/gutenberg

/example/data/gutenberg/davinci.txt 1395667 bytes, 1 block(s): OK

0. blk_3859330889089858864_1005 len=1395667 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/example/data/gutenberg/outlineofscience.txt 674762 bytes, 1 block(s): OK

0. blk_-3790696559021810548_1006 len=674762 repl=3 [/fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010, /fd1/ud1/10.114.228.35:50010]

/example/data/gutenberg/ulysses.txt 1573044 bytes, 1 block(s): OK

0. blk_-8671592324971725227_1007 len=1573044 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/hdfs

/hdfs/tmp

/hdfs/tmp/mapred

/hdfs/tmp/mapred/system

/hdfs/tmp/mapred/system/jobtracker.info 4 bytes, 1 block(s): OK

0. blk_5997185491433558819_1003 len=4 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/hive

/hive/warehouse

/hive/warehouse/hivesampletable

/hive/warehouse/hivesampletable/HiveSampleData.txt 5015508 bytes, 1 block(s): OK

0. blk_44873054283747216_1004 len=5015508 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/tmp

/tmp/hive-avkash

/uploads

/user

/user/SYSTEM

/user/SYSTEM/graph

/user/SYSTEM/graph/catepillar_star.edge 80 bytes, 1 block(s): OK

0. blk_-6715685143024983574_1010 len=80 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud2/10.114.236.42:50010, /fd0/ud0/10.114.236.28:50010]

/user/SYSTEM/query

/user/SYSTEM/query/catepillar_star_rwr.query 12 bytes, 1 block(s): OK

0. blk_8102317020509190444_1011 len=12 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

/user/avkash

/user/avkash/.Trash

/user/avkash/data_w3c_large.txt 543666528 bytes, 3 block(s): OK

0. blk_2005027737969478969_1012 len=268435456 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010]

1. blk_1970119524179712436_1012 len=268435456 repl=3 [/fd1/ud1/10.114.228.35:50010, /fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010]

2. blk_6223000007391223944_1012 len=6795616 repl=3 [/fd0/ud0/10.114.236.28:50010, /fd0/ud2/10.114.236.42:50010, /fd1/ud1/10.114.228.35:50010]

Status: HEALTHY
Total size: 552335333 B
Total dirs: 21
Total files: 10
Total blocks (validated): 12 (avg. block size 46027944 B)
Minimally replicated blocks: 12 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 3
FSCK ended at Thu Mar 07 05:35:44 GMT 2013 in 10 milliseconds

The filesystem under path '/' is HEALTHY

Above we can verify where all 12 blocks are distributed: 9 files occupy one block each (9 blocks), and data_w3c_large.txt alone spans the remaining 3 blocks.
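The arithmetic checks out, too: with a 268435456-byte (256 MB) block size, the 543666528-byte data_w3c_large.txt splits into two full blocks plus a 6795616-byte tail (543666528 − 2 × 268435456 = 6795616), which matches the three blocks listed for that file.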

 

Keywords: FSCK, Hadoop, HDFS, Blocks, File System, Replication, HDInsight

Windows Azure HDInsight on Windows 8 (Single Node Hadoop Cluster) Walkthrough

Windows Azure HDInsight is Microsoft's response to the Big Data movement. Microsoft's end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise-class, Hadoop-based solutions on both Windows Server and Windows Azure.

In this video you will learn about HDInsight running on Windows 8 as a single-node Hadoop cluster. Examples of running C# and Java based MapReduce jobs are shown, along with an Apache Pig example.

Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com

Windows Azure HDInsight Introduction and Installation Video

Windows Azure HDInsight is Microsoft's response to the Big Data movement. Microsoft's end-to-end roadmap for Big Data embraces Apache Hadoop™ by distributing enterprise-class, Hadoop-based solutions on both Windows Server and Windows Azure.

In this video you will learn how to install HDInsight on a Windows 8 machine as a single-node Hadoop cluster.

Microsoft’s roadmap for Big Data: http://www.microsoft.com/bigdata/
Apache Hadoop on Windows Azure: http://hadooponazure.com

Contact me: @avkashchauhan

Windows Azure HDInsight – Installation Walkthrough

 

 

The CTP version of HDInsight for Windows Server and Windows clients is available to download from here.

When you install HDInsight through the Web Platform Installer (WebPI), the following components are installed on your Windows machine:

[Image: components installed by WebPI]

Once installation is complete, you can launch the Hadoop console and verify the installation by checking the Hadoop version with the command “hadoop version”, as below:

[Image: Hadoop console showing the output of “hadoop version”]
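For reference, the version check prints output of this shape; the first line follows from the install path shown above, and the build details that follow are elided here:

c:\apps\dist\hadoop-1.1.0-SNAPSHOT>hadoop version
Hadoop 1.1.0-SNAPSHOT
...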

You can also check System > Services to verify that all HDInsight-specific services are running as expected:

[Image: HDInsight services listed in System > Services]
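A quick command-prompt alternative is to list the started services and filter for the Apache entries (net start prints the display names of running services; findstr does the filtering; output abbreviated):

C:\>net start | findstr "Apache"
Apache Hadoop datanode
Apache Hadoop namenode
...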