Hadoop Job and Task Name Classification and Convention

Hadoop MapReduce jobs and tasks follow a predefined naming convention, so during job analysis or troubleshooting you can quickly tell what to look for and where.

Here is the key information about the naming classification and conventions for Hadoop jobs and mapper/reducer tasks:

Job Name convention:

  • job_{DATE-TIME-WHEN-JOB-TRACKER-WAS-STARTED}_{JobID}
    • First part – the keyword “job”, which marks the ID as a job
    • Second part – the full date and time when the job tracker was started
    • Third part – the job counter, counting jobs submitted since the job tracker started
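
The three parts can be pulled apart with a few lines of Python. This is only a minimal sketch: the helper name `parse_job_id` is hypothetical, and it assumes the second part is always a 12-digit `yyyyMMddHHmm` timestamp, as in the example ID used later in this post.

```python
from datetime import datetime


def parse_job_id(job_id):
    """Split a job ID such as 'job_201307091604_1081' into its parts.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    prefix, started, counter = job_id.split("_")
    if prefix != "job":
        raise ValueError(f"not a job ID: {job_id!r}")
    # Assumption: the second part is a yyyyMMddHHmm timestamp.
    started_at = datetime.strptime(started, "%Y%m%d%H%M")
    return started_at, int(counter)


started_at, counter = parse_job_id("job_201307091604_1081")
print(started_at, counter)  # 2013-07-09 16:04:00 1081
```

Sorting job IDs by the returned pair orders them first by tracker start time and then by job counter.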

A task is a unit of job execution; every task is either a mapper or a reducer. The total number of mappers and reducers is fixed when a job is submitted, and the job tracker dispatches these tasks based on how many mapper and reducer slots are available in the Hadoop cluster. There are two kinds of tasks:

  1. Mapper

    1. There are 3 kinds of mappers
      1. Work mapper – These are the actual mapper tasks, each performing the same kind of work as the other mappers. The IDs for these mapper tasks start at 0 and end at Total – 1.
      2. Setup mapper – Despite its name, this task has the very last mapper task ID.
      3. Cleanup mapper – This is the task that cleans up the overall work. Its ID is one less than the setup mapper's ID, that is, second to last. (See the example below to understand it clearly)
      4. Note: Neither the setup nor the cleanup mapper is counted among the actual mappers. Depending on the task count, a job can also have more than 1 cleanup task.
  2. Reducer

    1. There is only 1 kind of reducer.

Task Name convention: 

  • For mapper
    • task_{DATE-TIME-WHEN-JOB-TRACKER-WAS-STARTED_JobID}_m_{6-Digit-Mapper-ID}_{mapper-instance}
  • For reducer
    • task_{DATE-TIME-WHEN-JOB-TRACKER-WAS-STARTED_JobID}_r_{6-Digit-Reducer-ID}_{reducer-instance}
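
When scanning logs it can help to match this pattern mechanically. The sketch below uses a regular expression that mirrors the convention described above; the names `TASK_NAME` and `parse_task_name` are hypothetical, and the pattern assumes a 12-digit timestamp and a 6-digit task ID as in the examples in this post.

```python
import re

# Assumed layout: task_<12-digit start time>_<job counter>_<m|r>_<6-digit task ID>_<instance>
TASK_NAME = re.compile(
    r"^task_(?P<started>\d{12})_(?P<job>\d+)_(?P<kind>[mr])"
    r"_(?P<task>\d{6})_(?P<instance>\d+)$"
)


def parse_task_name(name):
    """Break a task name into job ID, task kind, task ID and instance number."""
    m = TASK_NAME.match(name)
    if m is None:
        raise ValueError(f"not a task name: {name!r}")
    return {
        "job_id": f"job_{m['started']}_{m['job']}",
        "kind": "mapper" if m["kind"] == "m" else "reducer",
        "task_id": int(m["task"]),
        "instance": int(m["instance"]),
    }


print(parse_task_name("task_201307091604_1081_m_000010_2"))
```

Grouping the parsed results by `job_id`, `kind` and `task_id` shows how many instances each task needed.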

Here is an example:

  • Job ID
    • job_201307091604_1081
      • job – job
      • 201307091604 – The date and time when the job tracker was started
        • 2013/07/09 – Date
        • 16:04 (4:04 PM) – Time
      • 1081 – Job counter
  • Mappers (e.g. 20 in total -> 000000 – 000019)
    • task_201307091604_1081_m_000000_0
      • First instance of the first mapper task (ID – 000)
    • task_201307091604_1081_m_000010_0
    • task_201307091604_1081_m_000010_1
    • task_201307091604_1081_m_000010_2
      • The 3 entries above are 3 instances of the same mapper task (ID – 010)
    • task_201307091604_1081_m_000019_0
      • First instance of the last mapper task (ID – 019)
  • Reducers (Total 6)
    • task_201307091604_1081_r_000000_0
      • First instance of the first reducer task (ID – 000)
    • task_201307091604_1081_r_000005_0
      • First instance of 6th reducer task (ID – 005)
  • Besides the tasks above, 2 more mapper tasks are added to every job:
    • Setup task
      • Although it is the setup task, its task ID is the very last one
    • Cleanup task
      • This task ID will be “LAST – 1”
    • For example, if you have 20 mappers in total, the setup task ID will be 21 and the cleanup task ID will be 20.
      • 0 – 19 – total 20 mappers
      • 20 – cleanup task
      • 21 – setup task
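
The numbering in this example can be written down as a small helper. This is a sketch of the scheme exactly as described in this post, assuming a single cleanup task; the function name `classify_mapper_ids` is hypothetical.

```python
def classify_mapper_ids(work_mappers):
    """Map each mapper task ID to its role, following the numbering above.

    Assumption: exactly one cleanup task and one setup task per job.
    """
    roles = {task_id: "work" for task_id in range(work_mappers)}
    roles[work_mappers] = "cleanup"    # "LAST - 1"
    roles[work_mappers + 1] = "setup"  # the very last ID
    return roles


roles = classify_mapper_ids(20)
print(roles[19], roles[20], roles[21])  # work cleanup setup
```

With 20 work mappers this yields 22 mapper task IDs in total, 0 through 21.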