Hadoop MapReduce job failure with java.io.IOException: Task process exit with nonzero status of 137

While working with Amazon EMR, you might see an exception like the one below for a failed map or reduce task:

java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 137. 
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

The Root Cause:

The likely root cause here is that the Hadoop mapper or reducer tasks are being killed by the Linux OS because they oversubscribe memory. Keep in mind that this is the Linux OOM (out-of-memory) killer at work, not a Java OutOfMemoryError. The exit status itself is the hint: 137 = 128 + 9, meaning the process was terminated with signal 9 (SIGKILL), which is what the kernel sends when it decides to reclaim memory. In general this happens when a process is configured to use far more memory than the OS can actually provide, and under that memory pressure the OS has no option but to kill the process.

With Amazon EMR such a job failure can happen very easily if you have configured the mapred.child.java.opts setting far higher than what the specific EMR instance type is sized for. Each EMR instance type comes with preconfigured settings for map and reduce tasks, and overriding them carelessly can lead to this problem.

An Example:

For example, the EMR instance type m1.xlarge has 768 MB of memory allocated to each map or reduce task of a Hadoop job, as shown below:

m1.xlarge

Parameter                                  Value
HADOOP_JOBTRACKER_HEAPSIZE (MB)            6912
HADOOP_NAMENODE_HEAPSIZE (MB)              2304
HADOOP_TASKTRACKER_HEAPSIZE (MB)           384
HADOOP_DATANODE_HEAPSIZE (MB)              384
mapred.child.java.opts                     -Xmx768m
mapred.tasktracker.map.tasks.maximum       8
mapred.tasktracker.reduce.tasks.maximum    3

However, if in mapred-site.xml the user sets mapred.child.java.opts to a much higher value, e.g. 8 GB, as below:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx8192m</value>
</property>

The configuration above would cause the Linux OS to kill the mapper or reducer tasks due to severe memory oversubscription. Linux would kill the process even if it were configured to use 4 GB, because that is still far beyond the per-task limit the instance type is provisioned for.
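To see why, here is a rough back-of-the-envelope estimate. It assumes an m1.xlarge with roughly 15 GB of physical memory, all task slots busy at once, and task JVMs that actually grow toward their heap limits:

With the defaults:  8 map slots x 768 MB + 3 reduce slots x 768 MB = 8448 MB
                    + TaskTracker (384 MB) + DataNode (384 MB)     = 9216 MB (~9 GB)   -- fits
With -Xmx8192m:     (8 + 3) slots x 8192 MB                        = 90112 MB (~88 GB) -- far beyond physical memory

Once the resident memory of the task JVMs exceeds what the machine actually has, the kernel's OOM killer terminates the offending java processes with SIGKILL, which Hadoop reports as exit status 137.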

The Solution:

The solution to this problem is to let your job use the default MapReduce settings instead of setting them yourself during job submission. Using the default settings lets the JobTracker run each task under whatever configuration the Amazon EMR instance type already has in place, as the sketch below illustrates.
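As an illustrative sketch only (not taken from the original post), here is a minimal job driver that relies on the cluster defaults. It uses the classic org.apache.hadoop.mapred API, the same one that appears in the stack trace above, with the built-in IdentityMapper/IdentityReducer standing in for your own classes; the class name DefaultSettingsJob is a made-up placeholder. The key point is that it never sets mapred.child.java.opts, so each task JVM inherits the EMR instance-type default (e.g. -Xmx768m on m1.xlarge):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class DefaultSettingsJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(DefaultSettingsJob.class);
        conf.setJobName("default-settings-job");

        // Deliberately NOT calling conf.set("mapred.child.java.opts", ...):
        // the task JVM heap stays at whatever the EMR instance type has preconfigured.

        // The default TextInputFormat produces LongWritable keys and Text values,
        // and the identity mapper/reducer simply pass them through.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);   // submits the job and waits for completion
    }
}

If a job genuinely needs more heap per task than the defaults allow, moving to a larger instance type is safer than overriding the preconfigured value, since the per-task memory budget is tied to the instance type.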

 


4 thoughts on “Hadoop MapReduce job failure with java.io.IOException: Task process exit with nonzero status of 137”

  1. Hi,

    I am using AWS to run MR jobs, and they are failing with the error “java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)”

    When I run the code with a small amount of data it works completely fine, but with large data I hit the above error.

    How do I use the default MapReduce settings so that the JobTracker runs the task under whatever settings the Amazon EMR instance already has configured?

    Thanks,
    Arjun

