Hadoop Map/Reduce Administration from the Command Line in a Windows Azure Cluster

After you have created your Hadoop cluster in Windows Azure, you can remote into it to start Map/Reduce administration. Most of the processing logs and HDFS data are already available over ports 50030 and 50070; however, you can also run a number of standard Hadoop commands directly from the command line.
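As a rough illustration of the two ports mentioned above, here is a minimal Python sketch that builds the web UI addresses and checks whether each port accepts a TCP connection. It assumes the stock Hadoop 1.x defaults, where 50030 is the JobTracker web UI and 50070 is the NameNode web UI; the function names and the host name are hypothetical.

```python
import socket

# Assumption: Hadoop 1.x default web UI ports (JobTracker and NameNode).
WEB_UI_PORTS = {"jobtracker": 50030, "namenode": 50070}


def web_ui_urls(host):
    """Return the base URL of each web UI for the given host name."""
    return {name: f"http://{host}:{port}/" for name, port in WEB_UI_PORTS.items()}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

On the main node itself, something like is_port_open("localhost", 50030) should tell you whether the JobTracker page is up before you open it in a browser.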

After you log in to your main node, you will see that a Hadoop Command Shell shortcut is already there. It launches the following command:

D:\Windows\System32\cmd.exe /k c:\apps\dist\bin\hadoop.cmd

Once you start the Hadoop Command Shell shortcut, it prints the list of commands you can use.

For example, you can check the name node details by running the “hadoop namenode” command.

If you want to start a data node, just run the “hadoop datanode” command.

Now let’s check whether any jobs are running, using the “hadoop job -list” command:

c:\apps\dist>hadoop job -list

0 jobs currently running

JobId   State   StartTime       UserName        Priority        SchedulingInfo
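The same check can be scripted. The following is a minimal Python sketch, not part of the cluster tooling: it runs a hadoop sub-command and reads the job count from the "N jobs currently running" line shown above. The launcher path is an assumption based on the shortcut target, and the function names are hypothetical.

```python
import re
import subprocess

# Assumption: launcher location, taken from the Hadoop Command Shell shortcut.
HADOOP_CMD = r"c:\apps\dist\bin\hadoop.cmd"


def run_hadoop(*args):
    """Run a hadoop sub-command and return its standard output as text."""
    result = subprocess.run(
        [HADOOP_CMD, *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_running_jobs(list_output):
    """Return N from the 'N jobs currently running' line, or None if it is absent."""
    match = re.search(r"^\s*(\d+) jobs currently running", list_output, re.MULTILINE)
    return int(match.group(1)) if match else None
```

On the main node, count_running_jobs(run_hadoop("job", "-list")) would then give the number of running jobs as an integer.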

Now let me start a Hadoop job, and then we will check the job list again:

c:\apps\dist>hadoop job -list

1 jobs currently running

JobId   State   StartTime       UserName        Priority        SchedulingInfo

job_201112310614_0004   4       1325469341874   avkash  NORMAL  NA
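The State and StartTime columns are not very readable as raw numbers. Here is a small Python sketch that parses the rows of this listing. It assumes the Hadoop 1.x job state codes (1 running, 2 succeeded, 3 failed, 4 prep, 5 killed) and that StartTime is milliseconds since the Unix epoch; the function name is hypothetical.

```python
from datetime import datetime, timezone

# Assumption: job state codes as defined by Hadoop 1.x.
JOB_STATES = {1: "RUNNING", 2: "SUCCEEDED", 3: "FAILED", 4: "PREP", 5: "KILLED"}


def parse_job_list(output):
    """Parse the rows under the JobId header into a list of dictionaries."""
    jobs = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a job id; the header and count lines do not.
        if len(fields) < 6 or not fields[0].startswith("job_"):
            continue
        job_id, state, start_ms, user, priority = fields[:5]
        jobs.append({
            "job_id": job_id,
            "state": JOB_STATES.get(int(state), f"UNKNOWN({state})"),
            # Convert milliseconds since the epoch to a UTC timestamp.
            "started": datetime.fromtimestamp(int(start_ms) / 1000, tz=timezone.utc),
            "user": user,
            "priority": priority,
            "scheduling_info": " ".join(fields[5:]),
        })
    return jobs
```

Under these assumptions, the row above reads as a job in the PREP state that was started on 2012-01-02 at about 01:55 UTC.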

To see the details of this job, pass its JobId to the “hadoop job -status” command:

c:\apps\dist>hadoop job -status job_201112310614_0004

Job: job_201112310614_0004

file: hdfs://

tracking URL:

map() completion: 1.0

reduce() completion: 1.0

Counters: 23

    Job Counters

        Launched reduce tasks=1

        Launched map tasks=1

        Data-local map tasks=1

    File Output Format Counters

        Bytes Written=123

    File Input Format Counters

        Bytes Read=108

    Map-Reduce Framework

        Reduce input groups=7

        Map output materialized bytes=189

        Combine output records=15

        Map input records=15

        Reduce shuffle bytes=0

        Reduce output records=15

        Spilled Records=30

        Map output bytes=153

        Combine input records=15

        Map output records=15

        Reduce input records=15
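If you want to work with these counters in a script, the status output can be split into header fields and counter groups. The sketch below is one possible way to do that in Python, assuming the layout shown above: "name: value" lines up to the Counters line, then group names followed by "name=value" counters. The function name is hypothetical.

```python
def parse_job_status(output):
    """Split `hadoop job -status` output into header fields and counter groups."""
    fields, counters = {}, {}
    group = None
    in_counters = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not in_counters:
            # Header field such as "map() completion: 1.0".
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
            # Everything after the "Counters: N" line belongs to the counters.
            in_counters = key.strip() == "Counters"
        elif "=" in line:
            # Counter such as "Launched map tasks=1", filed under the current group.
            name, _, value = line.rpartition("=")
            counters.setdefault(group, {})[name.strip()] = int(value)
        else:
            # Any other non-empty line starts a new counter group.
            group = line
    return fields, counters
```

For the output above, counters["Map-Reduce Framework"]["Map input records"] would be 15, and fields["reduce() completion"] would be the string "1.0".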

Because a new job has been started, you will also see output appearing in the data node window.

Keywords: Windows Azure, Hadoop, Apache, BigData, Cloud, MapReduce


