YARN Job Problem: Application application_** failed 1 times due to AM Container for XX exited with exitCode: 127

Running a sample Pi job on YARN (Hadoop 0.23.x or 2.2.x) might fail with the following error message:

[Hadoop_Home] $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.10.jar pi -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-0.23.10.jar 16 10000

Number of Maps = 16
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
14/01/31 14:58:10 INFO input.FileInputFormat: Total input paths to process : 16
14/01/31 14:58:10 INFO mapreduce.JobSubmitter: number of splits:16
14/01/31 14:58:10 INFO mapred.ResourceMgrDelegate: Submitted application application_1391206707058_0002 to ResourceManager at /0.0.0.0:8032
14/01/31 14:58:10 INFO mapreduce.Job: The url to track the job: http://Avkashs-MacBook-Pro.local:8088/proxy/application_1391206707058_0002/
14/01/31 14:58:10 INFO mapreduce.Job: Running job: job_1391206707058_0002
14/01/31 14:58:12 INFO mapreduce.Job: Job job_1391206707058_0002 running in uber mode : false
14/01/31 14:58:12 INFO mapreduce.Job: map 0% reduce 0%
14/01/31 14:58:12 INFO mapreduce.Job: Job job_1391206707058_0002 failed with state FAILED due to: Application application_1391206707058_0002 failed 1 times due to AM Container for appattempt_1391206707058_0002_000001 exited with exitCode: 127 due to:
.Failing this attempt.. Failing the application.
14/01/31 14:58:12 INFO mapreduce.Job: Counters: 0
Job Finished in 2.676 seconds
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/user/avkashchauhan/QuasiMonteCarlo_1391209089737_1265113759/out/reduce-out
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:738)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1685)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1709)
at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Root cause:

The problem occurs because YARN is using a different path for the Java executable than the one on your OS. To troubleshoot, look at the local logs for the failed task, which expose both stderr and stdout; select "stderr" to see the failure, and you will find the following message:

/bin/bash: /bin/java: No such file or directory
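
If the web UI is not handy, the same container logs can be read off the local disk. A sketch, assuming a stock layout where $HADOOP_HOME points at the install and the default log directory is used (the application ID comes from the job output above):

$ # each container's stdout/stderr land under logs/userlogs/<app id>/<container id>/
$ cat $HADOOP_HOME/logs/userlogs/application_1391206707058_0002/container_*/stderr
/bin/bash: /bin/java: No such file or directory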

YARN checks the hardcoded path /bin/java, so if your Java executable is not at /bin/java the job will fail. On OSX, for example, I have Java 1.7 running at /usr/bin/java:

$ java -version

java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
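
Note that java -version does not print the path of the executable; a quick standard-shell check (not part of the original output) confirms where it lives:

$ which java
/usr/bin/java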

Solution:

To solve this problem on OSX, I created a symbolic link at /bin/java pointing to /usr/bin/java:

$ sudo ln -s /usr/bin/java /bin/java
Password: *****
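
An alternative to the symlink (a sketch, not what was done here): the container launch command is built from JAVA_HOME, so exporting it for the Hadoop daemons should also avoid the /bin/java fallback. On OSX, /usr/libexec/java_home prints the active JDK path:

$ # append to the env script the daemons source on startup (path assumes a stock layout)
$ echo "export JAVA_HOME=$(/usr/libexec/java_home)" >> etc/hadoop/hadoop-env.sh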

Let's retry the Pi sample:

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.10.jar pi -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-0.23.10.jar 16 10000

Number of Maps = 16
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
14/01/31 15:09:55 INFO input.FileInputFormat: Total input paths to process : 16
14/01/31 15:09:55 INFO mapreduce.JobSubmitter: number of splits:16
14/01/31 15:09:56 INFO mapred.ResourceMgrDelegate: Submitted application application_1391206707058_0003 to ResourceManager at /0.0.0.0:8032
14/01/31 15:09:56 INFO mapreduce.Job: The url to track the job: http://Avkashs-MacBook-Pro.local:8088/proxy/application_1391206707058_0003/
14/01/31 15:09:56 INFO mapreduce.Job: Running job: job_1391206707058_0003
14/01/31 15:10:01 INFO mapreduce.Job: Job job_1391206707058_0003 running in uber mode : false
14/01/31 15:10:01 INFO mapreduce.Job: map 0% reduce 0%
14/01/31 15:10:07 INFO mapreduce.Job: map 37% reduce 0%
14/01/31 15:10:12 INFO mapreduce.Job: map 50% reduce 0%
14/01/31 15:10:13 INFO mapreduce.Job: map 75% reduce 0%
14/01/31 15:10:18 INFO mapreduce.Job: map 100% reduce 0%
14/01/31 15:10:18 INFO mapreduce.Job: map 100% reduce 100%
14/01/31 15:10:18 INFO mapreduce.Job: Job job_1391206707058_0003 completed successfully
14/01/31 15:10:18 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=358
FILE: Number of bytes written=1088273
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=4358
HDFS: Number of bytes written=215
HDFS: Number of read operations=67
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=16
Launched reduce tasks=1
Rack-local map tasks=16
Total time spent by all maps in occupied slots (ms)=61842
Total time spent by all reduces in occupied slots (ms)=4465
Map-Reduce Framework
Map input records=16
Map output records=32
Map output bytes=288
Map output materialized bytes=448
Input split bytes=2470
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=448
Reduce input records=32
Reduce output records=0
Spilled Records=64
Shuffled Maps =16
Failed Shuffles=0
Merged Map outputs=16
GC time elapsed (ms)=290
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=3422552064
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1888
File Output Format Counters
Bytes Written=97
Job Finished in 23.024 seconds
14/01/31 15:10:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Estimated value of Pi is 3.14127500000000000000


Troubleshooting Hadoop Problems – FSNamesystem: FSNamesystem initialization failed.

Sometimes when the Namenode starts you do not see the Namenode process running, so take a look at the logs to understand what is going on.
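
A quick way to check whether the process is actually up (jps ships with the JDK; this check is mine, not from the original post):

$ jps | grep NameNode
$ # no output here means the Namenode process is not running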

This is the message you get when you start namenode:

[exec] Starting namenodes on [localhost]
[exec] localhost: starting namenode, logging to */hadoop-0.23.10/logs/hadoop-avkashchauhan-namenode-Avkashs-MacBook-Pro.local.out

Now open the log at */hadoop-0.23.10/logs/hadoop-avkashchauhan-namenode-Avkashs-MacBook-Pro.local.log. Note the .log extension: the startup message above points to the .out file, but the detailed messages go to the .log file.
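
To jump straight to the tail of the log, where the failure usually shows up (a plain shell sketch; $HADOOP_HOME standing in for the install prefix shown as * above):

$ tail -n 50 $HADOOP_HOME/logs/hadoop-avkashchauhan-namenode-Avkashs-MacBook-Pro.local.log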

In the log you will see the error as below:

STARTUP_MSG: java = 1.7.0_45
************************************************************/
2014-01-31 11:36:16,789 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-01-31 11:36:16,848 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-01-31 11:36:16,848 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2014-01-31 11:36:17,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Missing directory /tmp/hdfs23/namenode
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.<init>(NameNodeResourceChecker.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:348)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:332)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:303)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:346)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:472)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:464)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:765)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:814)
2014-01-31 11:36:17,039 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2014-01-31 11:36:17,039 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2014-01-31 11:36:17,040 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2014-01-31 11:36:17,040 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.io.IOException: Missing directory /tmp/hdfs23/namenode
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.addDirsToCheck(NameNodeResourceChecker.java:88)
at org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker.<init>(NameNodeResourceChecker.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:348)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:332)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:303)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:346)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:472)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:464)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:765)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:814)
2014-01-31 11:36:17,041 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-01-31 11:36:17,041 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at avkashs-macbook-pro.local/10.0.0.17
************************************************************/

The root cause here is that the Namenode metadata directory (/tmp/hdfs23/namenode in this setup) is missing. Since it lives under /tmp, which many systems wipe on reboot, re-creating it by formatting the Namenode lets the Namenode start again.

Step-by-step guide

  • Jump to your hadoop_install/bin folder
  • Launch the command ./hadoop namenode -format
  • Make sure you see the following message showing the format was done correctly:

14/01/31 11:41:48 INFO namenode.NNStorage: Storage directory /tmp/hdfs23/namenode has been successfully formatted.
14/01/31 11:41:48 INFO namenode.FSImage: Saving image file /tmp/hdfs23/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
14/01/31 11:41:48 INFO namenode.FSImage: Image file of size 128 saved in 0 seconds.
14/01/31 11:41:48 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/01/31 11:41:48 INFO util.ExitUtil: Exiting with status 0
14/01/31 11:41:48 INFO namenode.NameNode: SHUTDOWN_MSG:

  • Now restart the Namenode; see the sketch below for the whole sequence.
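
Putting it together, a minimal session (assuming a stock 0.23 layout with $HADOOP_HOME, a hypothetical variable here, pointing at the install):

$ cd $HADOOP_HOME/bin
$ ./hadoop namenode -format     # re-creates /tmp/hdfs23/namenode
$ cd ../sbin && ./start-dfs.sh  # starts the namenode (and datanodes)
$ jps | grep NameNode           # a NameNode line confirms the process is up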