Handling Hadoop Error "could only be replicated to 0 nodes, instead of 1" during copying data to HDFS or with mapreduce jobs

Sometimes copy files to HDFS or running a MapReduce jobs you might receive an error as below:

During file copy to HDFS the error and call stack look like as below:

File /platfora/uploads/test.xml could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation

at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1339)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2198)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:299)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44954)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745) 
UTC Timestamp: 11/20 04:14 amVersion: 2.5.4-IQT-build.73

During MapReduce job failure the error message and call stack look like as below:

DFSClient.java (line 2873) DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File ****/xyz.jar could only be replicated to 0 nodes, instead of 1

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1569)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:698)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:573)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

There could be various problems within datanode which could exhibit this issue such as:

  • Inconsistency in your datanodes
    • Restart your Hadoop cluster and see if this solves your problem.
  • Communication between datanodes and namenode
    • Network Issues
      • For example if you have Hadoop in EC2 instances and due to any security reason nodes can not talk, this problem may occur. You can fix the security by putting all nodes inside same EC2 security group to solve this problem.
    • Make sure that you can get datanode status from HDFS page or command line using command below:
      • $hadoop dfs-admin -report
  • Disk space full on datanode
    • What you can do is verify disk space availability in your system and make sure Hadoop  logs are not warning about disk space issue.
  • Busy or unresponsive datanode
    • Sometime datanodes are busy scanning block or working on a maintenance job initiated by namenode.
  • Negative block size configuration etc..
    • Please check the value of dfs.block.size in hdfs-site.xml and correct it per your Hadoop configuration