Installing or upgrading python3.6 in Ubuntu 16.04

Download Python 3.6.1 and install it as below:

tar xvf Python-3.6.1.tgz
cd Python-3.6.1
./configure --enable-optimizations
make -j8
# If you want to keep the previous version, use altinstall
sudo make altinstall
# If you want to replace the previous version, use install
# sudo make install

Testing python3.6

$ python3.6

Once it is working, check where it was installed:

$ which python3.6

Now you just need to point the python3 link at the new binary, as below:

$ sudo ln -s /usr/local/bin/python3.6 /usr/local/bin/python3

Finally, test python3:

$ python3
Python 3.6.1 (default, Jun 8 2017, 16:11:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
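
The same check can be scripted instead of read off the banner. The snippet below is a minimal sketch; the helper name meets_minimum is made up for illustration and is not part of Python itself.

```python
import sys

def meets_minimum(version_info=sys.version_info, minimum=(3, 6)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

# Show which binary is actually running and whether it is new enough
print(sys.executable)
print(meets_minimum())
```

Running it with python3 should print the path of the interpreter that the link resolves to.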

That’s it, enjoy!!

Saving H2O models from R/Python API in Hadoop Environment

When you use H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() tries to write the model can be different from the machine where your client is running, and that is why you see the error “No such file or directory”. If you just give a path such as /tmp and look on the machine hosting the H2O node that the R client connected to, you will see the model stored there.
Here is a good example to understand it better:
Step [1] Starting Hadoop driver in EC2 environment as below:
[ec2-user@ip-10-0-104-179 ~]$ hadoop jar h2o- -nodes 2 -mapperXmx 2g -output /usr/ec2-user/005
Open H2O Flow in your web browser:  <=== H2O is started.
Note: Above you can see that the hadoop command was run on ip address however the node where the H2O server is running is shown as
Step [2] Connect R client with H2O
> h2o.init(ip = "", port = 54323, strict_version_check = FALSE)
Note: I have used the IP address shown above to connect to the existing H2O cluster. However, the machine where I am running the R client is different, as its IP address is
Step [3]: Saving H2O model:
h2o.saveModel(my.glm, path = "/tmp", force = TRUE)
So when I save the model, it is saved on machine even though the R client was running at
[ec2-user@ip-10-0-65-248 ~]$ ll /tmp/GLM*
-rw-r--r-- 1 yarn hadoop 90391 Jun 2 20:02 /tmp/GLM_model_R_1496447892009_1
So you need to make sure you have access to a folder on the machine where the H2O service is running, or you can save the model to HDFS, similar to the example below:
h2o.saveModel(my.glm, path = "hdfs://", force = TRUE)

That's it, enjoy!!

Using RESTful API to get POJO and MOJO models in H2O


CURL API for Listing Models:


CURL API for Listing specific POJO Model:


List Specific MOJO Model:


Here is an example:

curl -X GET "http://localhost:54323/3/Models"
curl -X GET "http://localhost:54323/3/Models/deeplearning_model" >> NAME_IT

curl -X GET "http://localhost:54323/3/Models/deeplearning_model" >>
curl -X GET "http://localhost:54323/3/Models/glm_model/mojo" >
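
The same requests can be made from Python with only the standard library. This is a rough sketch, not an official client: the helper names are made up, the host and port are copied from the curl examples above, and the /3/Models.java/ path for POJO source is an assumption based on the H2O REST API that you should confirm against your H2O version.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:54323"  # host and port from the curl examples

def models_url(base=BASE_URL):
    """URL that lists all models."""
    return "{}/3/Models".format(base)

def model_url(model_id, base=BASE_URL):
    """URL that returns the details of one model."""
    return "{}/3/Models/{}".format(base, quote(model_id, safe=""))

def mojo_url(model_id, base=BASE_URL):
    """URL that returns the MOJO for one model."""
    return model_url(model_id, base) + "/mojo"

def pojo_url(model_id, base=BASE_URL):
    """URL for POJO source (assumed endpoint, verify for your version)."""
    return "{}/3/Models.java/{}".format(base, quote(model_id, safe=""))

def download(url, dest_path):
    """Fetch a URL and write the raw response to dest_path."""
    with urlopen(url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())

print(mojo_url("glm_model"))
```

Calling download(mojo_url("glm_model"), some_path) needs a running H2O instance, so nothing is fetched until you call it yourself.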

That's it, enjoy!!

Installing IPython 5.0 (lower than 6.0) compatible with Python 2.6/2.7

It is possible that you may need to install some Python library or component in your Python 2.6 or 2.7 environment. If those components need IPython, then you need an IPython version lower than 6.0, because IPython 6.0 and above do not support Python 2.

For example, with Python 2.7.x when you try to install jupyter as below:

$ pip install jupyter --user

You will get the error as below:

Using cached ipython-6.0.0.tar.gz
 Complete output from command python egg_info:

IPython 6.0+ does not support Python 2.6, 2.7, 3.0, 3.1, or 3.2.
 When using Python 2.7, please install IPython 5.x LTS Long Term Support version.
 Beginning with IPython 6.0, Python 3.3 and above is required.

See IPython `README.rst` file for more information:

Python sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0) detected.

To solve this problem you just need to install IPython 5.x (instead of 6.0, which is pulled by default when installing jupyter or IPython on its own).

Here is the way you can install IPython 5.x version:

$ pip install IPython==5.0 --user
$ pip install jupyter --user
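
If the same setup script has to run on both old and new interpreters, the pin can be chosen from the running Python version. The sketch below is only an illustration: ipython_requirement is a made-up helper, and the 3.3 cutoff comes from the error message quoted above.

```python
import sys

def ipython_requirement(version_info=sys.version_info):
    """Return a pip requirement string for IPython that suits this interpreter."""
    # IPython 6.0+ requires Python 3.3 or newer, so older interpreters
    # are pinned to the 5.x line.
    if tuple(version_info[:2]) < (3, 3):
        return "ipython<6"
    return "ipython"

# Print the pip command to run for the current interpreter
print("pip install '{}' --user".format(ipython_requirement()))
```

The requirement string "ipython<6" lets pip pick the newest 5.x release rather than exactly 5.0.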

That's it, enjoy!!




Creating Partial Dependency Plot (PDP) in H2O

Starting from H2O H2O added a partial dependency plot API, which uses the Java backend to do the multi-scoring of the dataset with the model. This makes creating a PDP much faster.

To get a PDP in H2O you need the model and the original data set used to build the model. Here are a few ways to create a PDP:

If you want to generate PDP on a single column:

response = h2o.predict(model, data.pdp[, column_name])
To generate PDP on the original data set:
response = h2o.predict(model, data.pdp)
If you want to build a PDP directly from the model and dataset without using the PDP API, you can use the following code:
model = prostate.gbm
column_name = "AGE"
data.pdp = data.hex
bins = unique(h2o.quantile(data.hex[, column_name], probs = seq(0.05,1,0.05)) )
mean_responses = c()

for (bin in bins) {
  # Set the column to one value for every row, then score the full frame
  data.pdp[, column_name] = bin
  response = h2o.predict(model, data.pdp)
  mean_response = mean(response[, ncol(response)])
  mean_responses = c(mean_responses, mean_response)
}

pdp_manual = data.frame(AGE = bins, mean_response = mean_responses)
plot(pdp_manual, type = "l")
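
The manual loop above can be sketched without H2O to show what it computes. The code below is a toy illustration in plain Python: the function partial_dependence, the stand-in model toy_predict and the two sample rows are all made up, and no H2O call is involved.

```python
def partial_dependence(predict, rows, column, grid):
    """For each grid value, overwrite `column` in every row, score, and average."""
    means = []
    for value in grid:
        # Copy each row with the chosen column forced to the grid value
        modified = [dict(row, **{column: value}) for row in rows]
        preds = [predict(r) for r in modified]
        means.append(sum(preds) / len(preds))
    return means

# Hypothetical stand-in for a trained model
def toy_predict(row):
    return 0.5 * row["AGE"] + 2.0 * row["PSA"]

rows = [{"AGE": 60, "PSA": 1.0}, {"AGE": 70, "PSA": 3.0}]
grid = [50, 60]
pdp = partial_dependence(toy_predict, rows, "AGE", grid)
print(list(zip(grid, pdp)))  # [(50, 29.0), (60, 34.0)]
```

All other columns keep their original values, so each point on the curve is the average prediction with only the chosen column changed.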
That's it, enjoy!!

Filtering H2O data frame on multiple fields of date and int type

Let's create an H2O frame using the h2o.create_frame API:

df = h2o.create_frame(time_fraction = .1,rows=10, cols = 10)

The above creates a frame of 10 rows and 10 columns; with a time_fraction value of 0.1, 1 out of the 10 columns will be a date/time column. The data frame looks as below:

[Screenshot of the generated data frame]

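Here are a few example filtering scripts:

import datetime
df1 = df[ (df['C4'] > 0) & (df['C7'] < 10)]
# Both numeric conditions and the date condition must hold
df2 = df[ (df['C4'] > 0) & (df['C7'] < 10)   & (df['C9'] > datetime.datetime(2000,1,1))  ]
# Either numeric condition may hold, and the date condition must hold
df3 = df[ ((df['C4'] > 0) | (df['C7'] < 10)) & (df['C9'] > datetime.datetime(2000,1,1)) ]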

and the screenshot:

[Screenshot of the filtering results]

That's it, enjoy!!

Building high order polynomials with GLM for higher accuracy

Sometimes when building GLM models, you would like to configure GLM to search for higher order polynomials of the features.

The reason to do this is that you may have strong predictors for a model, and adding higher order polynomials of those predictors can give you higher accuracy.

With H2O, you can create higher order polynomials as below:

  • Look for the ‘interactions’ parameter in the GLM model.
  • In the interactions parameter, add the list of predictor columns to interact.
When the model is built, all pairwise combinations will be computed for this list. The following is a working sample:
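import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()  # start or connect to an H2O cluster
boston = h2o.import_file("")
# Use every column except the last one as a predictor
predictors = boston.columns[:-1]
response = "medv"
# All pairwise combinations of these columns are added as interaction terms
interactions_list = ['crim', 'dis']
boston_glm = H2OGeneralizedLinearEstimator(interactions = interactions_list)
boston_glm.train(x = predictors, y = response, training_frame = boston)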
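
To make "all pairwise combinations" concrete, here is a small plain-Python sketch for numeric columns. It is not H2O's implementation: the helper add_pairwise_interactions, the sample values and the product-based naming are assumptions made for illustration only.

```python
from itertools import combinations

def add_pairwise_interactions(rows, columns):
    """Return copies of rows with one product column per pair of `columns`."""
    expanded = []
    for row in rows:
        new_row = dict(row)
        # combinations() yields each unordered pair exactly once
        for a, b in combinations(columns, 2):
            new_row["{}_{}".format(a, b)] = row[a] * row[b]
        expanded.append(new_row)
    return expanded

# Made-up sample row using three Boston housing column names
rows = [{"crim": 2.0, "dis": 3.0, "rm": 5.0}]
expanded = add_pairwise_interactions(rows, ["crim", "dis", "rm"])
print(sorted(expanded[0].items()))
```

A list of three columns gives three pairs, so the number of added terms grows quickly as the interactions list gets longer.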
To explore interactions among categorical variables please do the following:
That's all, enjoy!!