Handling YARN resources manager issue with decommissioned nodes

If you hit the following exception with your YARN resource manager:

ERROR/Exception:

17/07/31 15:06:13 WARN retry.RetryInvocationHandler: Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes over rm1. Not retrying because try once and fail.
java.lang.ClassCastException: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl

Troubleshooting:

Please try running the following command and you will see the exact same exception:

$ yarn node -list -all

Root Cause:

This problem happen when your YARN cluster have decommissioned nodes and it could cause issue with other dependent application i.e. H2O to not to start.

Solution:

Please make sure all the decommissioned nodes are either not listed or added back as full service nodes.
That’s it, enjoy!!
Advertisements

Saving H2O models from R/Python API in Hadoop Environment

When you are using H2O in clustered environment i.e. Hadoop the machine could be different where h2o.savemodel() is trying to write the model and thats why you see the error “No such file or directory”. If you just give the path i.e. /tmp and visit the machine ID where H2O connection is initiated from R, you will see the model stored there.
Here is a good example to understand it better:
Step [1] Starting Hadoop driver in EC2 environment as below:
[ec2-user@ip-10-0-104-179 ~]$ hadoop jar h2o-3.10.4.8-hdp2.6/h2odriver.jar -nodes 2 -mapperXmx 2g -output /usr/ec2-user/005
....
....
....
Open H2O Flow in your web browser: http://10.0.65.248:54323  <=== H2O is started.
Note: Above you could see that hadoop command is ran on ip address 10.0.104.179 however the node where H2O server is shown as 10.0.65.248.
Step [2] Connect R client with H2O
> h2o.init(ip = "10.0.65.248", port = 54323, strict_version_check = FALSE)
Note: I have used the ip address as shown above to connect with existing H2O cluster. However the machine where I am running R client is different as its IP address is 34.208.200.16.
Step [3]: Saving H2O model:
h2o.saveModel(my.glm, path = "/tmp", force = TRUE)
So when I am saving the mode it is saved at 10.0.65.248 machine even when the R client was running at 34.208.200.16.
ec2-user@ip-10-0-65-248 ~]$ ll /tmp/GLM*
-rw-r--r-- 1 yarn hadoop 90391 Jun 2 20:02 /tmp/GLM_model_R_1496447892009_1
So you need to make sure you have access to a folder where H2O service is running or you can save model at HDFS something similar to as below:
h2o.saveModel(my.glm, path = "hdfs://ip-10-0-104-179.us-west-2.compute.internal/user/achauhan", force = TRUE)

Thats it, enjoy!!

Setting various logs levels for H2O

Setting log levels in different H2O deployment scenarios.

Standalone H2O mode (H2O on VMs, laptops…)

You can specify options -log_level and/or -log_dir:

-log_level <TRACE,DEBUG,INFO,WARN,ERRR,FATAL>
Write messages at this logging level, or above. Default is INFO.
-log_dir <fileSystemPath>

The directory where H2O writes logs to disk. (This usually has a good default that you need not change.)

$ java -jar h2o.jar -log_level DEBUG

H2O on Hadoop

The log level option is not directly exposed. You can still set the log level by adding an extra java argument using the -J option of the Hadoop h2o driver: “-J -log_level -J DEBUG”. Here is an example:

$ hadoop jar h2odriver.jar -J -log_level -J DEBUG -nodes 1 -mapperXmx 1g -output t/$RANDOM

Sparkling Water:

Log levels can be adjusted using Spark conf properties: spark.ext.h2o.node.log.level and spark.ext.h2o.client.log.level, these are two separate options for the compute node and the h2o client running in Sparkling Water’s driver program (H2O client).

$ bin/sparkling-shell --conf spark.ext.h2o.node.log.level=DEBUG --conf spark.ext.h2o.client.log.level=DEBUG

Accessing Remote Hadoop Server using Hadoop API or Tools from local machine (Example: Hortonworks HDP Sandbox VM)

Sometimes you may need to access Hadoop runtime from a machine where Hadoop services are not running. In this process you will create password-less SSH access to Hadoop machine from your local machine and once ready you can use Hadoop API to access Hadoop cluster or you can directly use Hadoop commands from local machine by passing proper Hadoop configuration.

Starting Hortonworks HDP 1.3 and/or 2.1 VM

You can use these instructions on any VM running Hadoop or you can download HDP 1.3 or 2.1 Images from the link below:

http://hortonworks.com/products/hortonworks-sandbox/#install

Now start your VM and make sure your Hadoop cluster is up and running. Once you VM is up and running you will get IP address and hostname on the VM screen which is mostly 192.168.21.xxx as shown below:

Screen Shot 2014-06-05 at 1.21.53 PM

Accessing Hortonworks HDP 1.3 and/or 2.1 from browser:

Using the IP address provided you can check the Hadoop server status on port 8000 as below

HDP 1.3 – http://192.168.21.187:8000/about/

HDP 2.1 – http://192.168.21.186:8000/about/

The UI for both HDP1.3 and HDP 2.1 looks as below:

hdp13-21

 

 

 

 

 

 

 

 

 

 

 

Now from your host machine you can also try to ssh to any of the machine using user name root and password hadoop as below:

$ssh root@192.168.21.187

The authenticity of host ‘192.168.21.187 (192.168.21.187)’ can’t be established.
RSA key fingerprint is b2:c0:9a:4b:10:b4:0f:c0:a0:da:7c:47:60:84:f5:dc.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added ‘192.168.21.187’ (RSA) to the list of known hosts.
root@192.168.21.187’s password: hadoop
Last login: Thu Jun 5 03:55:17 2014

Now we will add password less SSH access to these VM and there could be two option:

Option 1: You already have SSH key created for yourself earlier and want to reuse here:

In this option, first we will make sure we have RSA based key for SSH session in our local machine and then we will use it for password less SSH access:

  1. In your home folder (/Users/<yourname>) visit to folder name .ssh
  2. Identify a file name id_rsa.pub  (/Users/avkashchauhan/.ssh/id_rsa.pub) and you will see a long string key there
  3. Now also identify another file name  authorized_keys there (i.e. /Users/avkashchauhan/.ssh/authorized_keys) and you will see one or more long string keys there.
  4. Check the content of id_rsa.pub and make sure that this key is also available into authorized_keys files along with other keys (if there)
  5. Now copy the key string from id_rsa.pub file in memory.
  6. SSH to your HDP machine as in previous step using username and password
  7. visit to /root/.ssh folder
  8. You will find authorized_keys file there so open this file in editor and append the key here which you have copied in previous step #5.
  9. Save authorized_keys files
  10. Now in the same VM you will find id_rsa.pub file and please copy its content in memory.
  11. Exit the HDP VM
  12. In your host machine you have already checked authorized_keys in step #3, append the key from HDP VM into authorized_keys file and save it.
  13. Now try logging HDP VM as below:

ssh root@192.168.21.187

Last login: Thu Jun 5 06:35:31 2014 from 192.168.21.1

Note: You will see  that password is not needed this time as Password less SSH is working.

Option 2: You haven’t created SSH key in your local machine and will do everything from scratch:

In this option first we will create a SSH based key first and then use it exactly with Option #1.

  • Log into your host machine and open terminal
  • For example your home folder will be /Users/<username>
  • Create a folder name .ssh inside your working folder
  • now go inside .ssh folder and run the following command

$ ssh-keygen -C ‘SSH Access Key’ -t rsa

Enter file in which to save the key (/home/avkashchauhan/.ssh/id_rsa): ENTER

Enter passphrase (empty for no passphrase): ENTER

Enter same passphrase again: ENTER

  • You will see id_rsa and id_rsa.pub files are created. Now we will append the contents of id_rsa.pub into authorized_keys files and it is not there then we will create and add. For both the command is as below:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  • In the above step you will see the contents of id_rsa.pub are included into authorized_keys.
  • Now we will set proper permissions for keys and folders as below:

$ chmod 700 $HOME && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*

  • Finally we can follow Option #1 now to add both id_rsa.pub keys in both machines authorized_keys files to have password less ssh working.

Adding correct Java Home path to java

Migrating Hadoop configuration from Remote Machine to local Machine:

To get this working we will have to get Hadoop configuration files from HDP server to local machine and to do this you just need to copy Hadoop configuration files from HDP servers as below:

HDP 1.3:

Create a folder name hdp13 in your working folder and now use SCP command to copy configuration files as below over password less SSH:

$ scp -r root@192.168.21.187:/etc/hadoop/conf.empty/ ~/hdp13

HDP 2.1:

Create a folder name hdp21 in your working folder and now use SCP command to copy configuration files as below over password less SSH:

$ scp -r root@192.168.21.186:/etc/hadoop/conf/ ~/hdp21

Adding correct JAVA_HOME to imported Hadoop configuration hadoop-env.sh

Now visit to your hdp13 or hdp21 folder and edit hadoop-env.sh file with correct JAVA_HOME as below:

# The java implementation to use. Required.
# export JAVA_HOME=/usr/jdk/jdk1.6.0_31  
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`

Adding correct HDP Hostname into local machine hosts entries:

Now you would need to add Hortonworks HDP hostnames into your local machines hosts file. On Mac OSX you would need to edit /private/etc/hosts file to add the following:

#HDP 2.1
192.168.21.186 sandbox.hortonworks.com
#HDP 1.3
192.168.21.187 sandbox

 

Once added make sure you can ping the hosts by name as below:

$ ping sandbox

PING sandbox (192.168.21.187): 56 data bytes
64 bytes from 192.168.21.187: icmp_seq=0 ttl=64 time=0.461 ms

And for HDP 2.1

$ ping sandbox.hortonworks.com
PING sandbox.hortonworks.com (192.168.21.186): 56 data bytes
64 bytes from 192.168.21.186: icmp_seq=0 ttl=64 time=0.420 ms

Access Hadoop Runtime on Remote Machine from Hadoop commands (or API) at Local Machine:

Now using local machine Hadoop runtime you can connect to Hadoop at HDP VM as below:

HDP 1.3

$ ./hadoop –config /Users/avkashchauhan/hdp13/conf.empty fs -ls /
Found 4 items
drwxr-xr-x – hdfs hdfs 0 2013-05-30 10:34 /apps
drwx—— – mapred hdfs 0 2014-06-05 03:54 /mapred
drwxrwxrwx – hdfs hdfs 0 2014-06-05 06:19 /tmp
drwxr-xr-x – hdfs hdfs 0 2013-06-10 14:39 /user

HDP 2.1

$ ./hadoop –config /Users/avkashchauhan/hdp21/conf fs -ls /
Found 6 items
drwxrwxrwx – yarn hadoop 0 2014-04-21 07:21 /app-logs
drwxr-xr-x – hdfs hdfs 0 2014-04-21 07:23 /apps
drwxr-xr-x – mapred hdfs 0 2014-04-21 07:16 /mapred
drwxr-xr-x – hdfs hdfs 0 2014-04-21 07:16 /mr-history
drwxrwxrwx – hdfs hdfs 0 2014-05-23 11:35 /tmp
drwxr-xr-x – hdfs hdfs 0 2014-05-23 11:35 /user

If you are using Hadoop API then you can pass the CONF file path to API and have access to Hadoop runtime.

 

Apache Ambari 1.6.0 support with Blueprints is released

What is Ambari Blueprint?

Ambari Blueprint allows an operator to instantiate a Hadoop cluster quickly—and reuse the blueprint to replicate cluster instances elsewhere, for example, as development and test clusters, staging clusters, performance testing clusters, or co-located clusters.

Release URL: http://www.apache.org/dyn/closer.cgi/ambari/ambari-1.6.0

blueprints

 

 

 

 

 

 

Ambari Blueprint supports PostgreSQL:

Ambari now extends database support for Ambari DB, Hive and Oozie to include PostgreSQL. This means that Ambari now provides support for the key databases used in enterprises today: PostgreSQL, MySQL and Oracle. The PostgreSQL configuration choice is reflected in this database support matrix.

More Links:

 

Content Source: http://hortonworks.com/blog/apache-ambari-1-6-0-released-blueprints

Hadoop 2.4.0 release (helpful links)

Kudos to Hadoop community as Hadoop 2.4.0 release is available for everyone to consume. A small list of improvements in HDFS, MapReduce along with overall framework are as below but not limited to:

Hadoop 2.4.0 Highlights:

  • HDFS:
    • Full HTTPS support
    • ACL Supported HDFS, allows easier access to Apache Sentry-managed data by components using it
    • Native supported Rolling upgrades in HDFS
    • HDFS FSImage using protocol-buffers for smoother operational upgrades
  • YARN:
    • ResourceManager HA Automatic Failover
    •  YARN Timeline Server PREVIEW for storing and serving generic application history

Hadoop 2.4.0 Release Notes:

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/releasenotes.html

Hadoop 2.4.0 Source download:

http://apache.mirrors.tds.net/hadoop/common/hadoop-2.4.0/hadoop-2.4.0-src.tar.gz

Hadoop 2.4.0 Binary download:

http://apache.mirrors.tds.net/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz

Setting up Pivotal Hadoop (PivotalHD 1.1 Community Edition) Cluster in CentOS 6.5

Download Pivotal HD Package

http://bitcast-a.v1.o1.sjc1.bitgravity.com/greenplum/pivotal-sw/pivotalhd_community_1.1.tar.gz

The package consist of 3 tarball package:

  • PHD-1.1.0.0-76.tar.gz
  • PCC-2.1.0-460.x86_64.tar.gz
  • PHDTools-1.1.0.0-97.tar.gz

Untar above package and start with PCC (Pivotal Command Center)

Install Pivotal Command Center:

$tar -zxvf PCC-2.1.0-460.x86_64.tar.gz
$PHDCE1.1/PCC-2.1.0-460/install

Log in using  newly created user gpadmin:
$  su – gpadmin
$  sudo cp /root/.bashrc .
$  sudo cp /root/.bash_profile .
$  sudo cp /root/.bash_logout .
$  sudo cp /root/.cshrc .
$  sudo cp /root/.tcshrc .

Logout and re-login:
$ exit
$ su – gpadmin

Make sure you have alias set for your localhost:
$  vi /etc/hosts
xx.xx.xx.xx pivotal-master.hadoopbox.com  pivotal-master
$ service network restart
$ ping pivotal-master
$ ping pivotal-master.hadoopbox.com
Now we will use Pivotal HD Package, so lets untar it into PHD-1.1.0.0-76 folder.
Expand PHD* package and then import it:
$  icm_client import -s PHD-1.1.0.0-76/

Get cluster specific configuration:
$ icm_client fetch-template -o ~/ClusterConfigDir

Edit cluster configuration based on your domain details:
$  vi ~/ClusterConfigDir/clusterConfig.xml
Replace all host.yourdomain.com to your_Domainname. Somehow having .  {dot} in domain name is not accepted.
Also select the services you would want to install. you must need base 3 services hdfs, YARN, and Zookeeper in PivotalHD:

<services>hdfs,yarn,zookeeper</services> <!– hbase,hive,hawq,gpxf,pig,mahout</services> –>

Create password-less SSH configuration:

$ ssh-keygen -t rsa
$  cd .ssh
$  cat id_rsa.pub >> authorized_keys
$  cat authorized_keys
$  chmod 700 $HOME && chmod 700 ~/.ssh && chmod 600 ~/.ssh/*

[gpadmin@pivotal-master ~]$ icm_client deploy -c ClusterConfigDir
Please enter the root password for the cluster nodes:
PCC creates a gpadmin user on the newly added cluster nodes (if any). Please enter a non-empty password to be used for the gpadmin user:
Verifying input
Starting install
Running scan hosts
[RESULT] The following hosts do not meet PHD prerequisites: [ pivotal-master.hadoopbox.com ] Details…

Host: pivotal-master.hadoopbox.com
Status: [FAILED]
[ERROR] Please verify supported OS type and version. Supported OS: RHEL6.1, RHEL6.2, RHEL6.3, RHEL6.4, CentOS6.1, CentOS6.2, CentOS6.3, CentOS6.4
[OK] SELinux is disabled
[OK] sshpass installed
[OK] gpadmin user exists
[OK] gpadmin user has sudo privilege
[OK] .ssh directory and authorized_keys have proper permission
[OK] Puppet version 2.7.20 installed
[OK] Ruby version 1.9.3 installed
[OK] Facter rpm version 1.6.17 installed
[OK] Admin node is reachable from host using FQDN and admin hostname.
[OK] umask is set to 0002.
[OK] nc and postgresql-devel packages are installed or available in the yum repo
[OK] iptables: Firewall is not running.
[OK] Time difference between clocks within acceptable threshold
[OK] Host FQDN is configured correctly
[OK] Host has proper java version.
ERROR: Fetching status of the cluster failed
HTTP Error 500: Server Error
Cluster ID: 4

Because I have Cent OS 6.5 so lets edit /etc/centos-release file to let Pivotal installation know CentOS 6.4.
[gpadmin@pivotal-master ~]$ cat /etc/centos-release
CentOS release 6.5 (Final)
[gpadmin@pivotal-master ~]$ sudo mv /etc/centos-release /etc/centos-release-orig
[gpadmin@pivotal-master ~]$ sudo cp /etc/centos-release-orig /etc/centos-release
[gpadmin@pivotal-master ~]$ sudo vi /etc/centos-release

CentOS release 6.4 (Final)  <— Edit to look like I am using CentOS 6.4 even when I have CentOS 6.5

[gpadmin@pivotal-master ~]$ icm_client deploy -c ClusterConfigDir
Please enter the root password for the cluster nodes:
PCC creates a gpadmin user on the newly added cluster nodes (if any). Please enter a non-empty password to be used for the gpadmin user:
Verifying input
Starting install
[====================================================================================================] 100%
Results:
pivotal-master… [Success]
Details at /var/log/gphd/gphdmgr/
Cluster ID: 5

$ cat /var/log/gphd/gphdmgr/GPHDClusterInstaller_1392419546.log
Updating Option : TimeOut
Current Value   : 60
TimeOut=”180″
pivotal-master : Push Succeeded
pivotal-master : Push Succeeded
pivotal-master : Push Succeeded
pivotal-master : Push Succeeded
pivotal-master : Push Succeeded
pivotal-master : Push Succeeded
[INFO] Deployment ID: 1392419546
[INFO] Private key path : /var/lib/puppet/ssl-icm/private_keys/ssl-icm-1392419546.pem
[INFO] Signed cert path : /var/lib/puppet/ssl-icm/ca/signed/ssl-icm-1392419546.pem
[INFO] CA cert path : /var/lib/puppet/ssl-icm/certs/ca.pem
hostlist: pivotal-master
running: massh /tmp/tmp.jaDiwkIFMH bombed uname -n
sync cmd sudo python ~gpadmin/GPHDNodeInstaller.py –server=pivotal-master.hadoopbox.com –certname=ssl-icm-1392419546 –logfile=/tmp/GPHDNodeInstaller_1392419546.log –sync –username=gpadmin
[INFO] Deploying batch with hosts [‘pivotal-master’]
writing host list to file /tmp/tmp.43okqQH7Ji
[INFO] All hosts succeeded.

$ icm_client list
Fetching installed clusters
Installed Clusters:
Cluster ID: 5     Cluster Name: pivotal-master     PHD Version: 2.0     Status: installed

$ icm_client start -l pivotal-master
Starting services
Starting cluster
[====================================================================================================] 100%
Results:
pivotal-master… [Success]
Details at /var/log/gphd/gphdmgr/

Check HDFS:
$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   – mapred hadoop          0 2014-02-14 15:19 /mapred
drwxrwxrwx   – hdfs   hadoop          0 2014-02-14 15:19 /tmp
drwxrwxrwx   – hdfs   hadoop          0 2014-02-14 15:20 /user
drwxr-xr-x   – hdfs   hadoop          0 2014-02-14 15:20 /yarn

Now open Browser @ https://your_domain_name:5443/
Username/Password – gpadmin/gpadmin

 

Pivotal Command Center Service Status:
$ service commander status
commander (pid  2238) is running…