Handling a YARN ResourceManager issue with decommissioned nodes

If you hit the following exception with your YARN resource manager:

ERROR/Exception:

17/07/31 15:06:13 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes over rm1. Not retrying because try once and fail.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl

Troubleshooting:

Run the following command; it should reproduce the exact same exception:

$ yarn node -list -all
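
If you want to script this check, here is a minimal sketch. The function name has_unknown_node_cast_error is hypothetical (it is not part of YARN); it only searches whatever text it receives on stdin for the class names from the exception above.

```shell
# Hypothetical helper (not part of YARN): reads text on stdin and
# returns 0 if it contains the UnknownNodeId cast failure shown above.
has_unknown_node_cast_error() {
  # -F treats the pattern as a fixed string, so the '$' in the
  # inner-class name is matched literally.
  grep -qF 'NodesListManager$UnknownNodeId cannot be cast to'
}

# Example usage, assuming the yarn CLI is on your PATH:
#   yarn node -list -all 2>&1 | has_unknown_node_cast_error \
#     && echo "ResourceManager is hitting the decommissioned-node issue"
```

The helper only inspects text, so you can also point it at a saved log file instead of live command output.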

Root Cause:

This problem occurs when your YARN cluster has decommissioned nodes. It can also prevent dependent applications, e.g. H2O, from starting.

Solution:

Make sure every decommissioned node is either no longer listed or has been added back as a full-service node.
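
The exact steps depend on how your cluster is managed, so treat the following as a sketch under assumptions rather than the definitive fix. It assumes decommissioning is driven by the exclude file named in yarn.resourcemanager.nodes.exclude-path; the helper name list_excluded_hosts and the file path in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the hosts named in a YARN exclude file,
# skipping blank lines and lines that start with '#'.
# Assumes one host per line.
list_excluded_hosts() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Example usage (the path is an assumption; check the value of
# yarn.resourcemanager.nodes.exclude-path in your yarn-site.xml):
#   list_excluded_hosts /etc/hadoop/conf/yarn.exclude
#
# After removing the hosts you want back in service from that file,
# ask the ResourceManager to re-read its include/exclude lists:
#   yarn rmadmin -refreshNodes
```

Once the node list is clean, rerunning yarn node -list -all is a quick way to confirm the exception is gone.
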
That’s it, enjoy!!