Setting up a Jupyter notebook server as a service in Ubuntu 16.04

Step 1: Verify the jupyter notebook location:

$ ll /home/avkash/.local/bin/jupyter-notebook
-rwxrwxr-x 1 avkash avkash 222 Jun 4 10:00 /home/avkash/.local/bin/jupyter-notebook*

Step 2: Configure your Jupyter notebook with a password and IP address as needed, and note where the configuration lives. We will use this configuration for Jupyter running as a service.

jupyter config: /home/avkash/.jupyter/
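If you have not created a configuration yet, running `jupyter notebook --generate-config` writes a default `jupyter_notebook_config.py` into this folder. A minimal sketch of the settings mentioned above (the IP, port, and hashed password values here are placeholders, not taken from a real setup):

```python
# /home/avkash/.jupyter/jupyter_notebook_config.py  (sketch)
c = get_config()  # provided by Jupyter when it loads this file

c.NotebookApp.ip = '0.0.0.0'        # listen on all interfaces; restrict as needed
c.NotebookApp.port = 8888           # default notebook port
c.NotebookApp.open_browser = False  # a service has no display to open a browser on
# Hashed password produced by `jupyter notebook password` (placeholder value):
c.NotebookApp.password = 'sha1:...'
```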

Step 3: Create a file named jupyter.service as below and save it into the /usr/lib/systemd/system/ folder.

$ cat /usr/lib/systemd/system/jupyter.service
[Unit]
Description=Jupyter Notebook

[Service]
# Step 1 and Step 2 details are here..
# ------------------------------------
User=avkash
ExecStart=/home/avkash/.local/bin/jupyter-notebook --config=/home/avkash/.jupyter/jupyter_notebook_config.py

[Install]
WantedBy=multi-user.target


Step 4: Now enable the service as below:

$ sudo systemctl enable jupyter.service

Step 5: Now reload the systemd configuration as below:

$ sudo systemctl daemon-reload

Step 6: Now restart the service as below:

$ sudo systemctl restart jupyter.service

The service is started now. You can verify it as below:

$ systemctl -a | grep jupyter
 jupyter.service      loaded active running Jupyter Notebook

That's it, enjoy!!




Using the RESTful API to get POJO and MOJO models in H2O


CURL API for Listing Models:

curl -X GET "http://localhost:54323/3/Models"

CURL API for Listing a specific Model:

curl -X GET "http://localhost:54323/3/Models/deeplearning_model" >> NAME_IT

List Specific MOJO Model:

curl -X GET "http://localhost:54323/3/Models/glm_model/mojo" > NAME_IT

(NAME_IT is a placeholder for the output file name of your choice.)
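The endpoint pattern behind these calls can be sketched in a few lines of Python. The host, port, and model names are the same assumptions as in the curl commands above; this only builds the URLs, it does not contact a live H2O server:

```python
# Sketch: composing the H2O REST endpoints used above (host/port assumed)
BASE = "http://localhost:54323"

def models_url(model_id=None, mojo=False):
    """URL for listing all models, one model's details, or its MOJO artifact."""
    if model_id is None:
        return BASE + "/3/Models"
    url = BASE + "/3/Models/" + model_id
    return url + "/mojo" if mojo else url

print(models_url())                        # list all models
print(models_url("deeplearning_model"))    # one model's details
print(models_url("glm_model", mojo=True))  # MOJO download URL
```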

That's it, enjoy!!

Installing IPython 5.x (lower than 6.0), compatible with Python 2.6/2.7

It is possible that you may need to install some Python library or component in your Python 2.6 or 2.7 environment. If those components need IPython, you will run into a version conflict, because IPython 6.0 and newer no longer support Python 2.

For example, with python 2.7.x when you try to install jupyter as below:

$ pip install jupyter --user

You will get an error like the one below:

Using cached ipython-6.0.0.tar.gz
 Complete output from command python egg_info:

IPython 6.0+ does not support Python 2.6, 2.7, 3.0, 3.1, or 3.2.
 When using Python 2.7, please install IPython 5.x LTS Long Term Support version.
 Beginning with IPython 6.0, Python 3.3 and above is required.

See IPython `README.rst` file for more information:

Python sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0) detected.

To solve this problem you just need to install IPython 5.x first (instead of 6.0, which is pulled in by default when installing jupyter, or ipython independently).

Here is the way you can install IPython 5.x version:

$ pip install IPython==5.0 --user
$ pip install jupyter --user
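The version check IPython performs can also be mirrored when scripting your own environment setup. This sketch (the helper name is made up for illustration) picks a pip requirement string based on the interpreter version:

```python
import sys

# IPython 6.0 dropped Python < 3.3 (the error shown above), so
# Python 2.6/2.7 environments must stay on the 5.x LTS line.
def ipython_requirement(version_info=sys.version_info):
    if version_info < (3, 3):
        return "IPython>=5,<6"
    return "IPython"

print(ipython_requirement((2, 7)))  # IPython>=5,<6
print(ipython_requirement((3, 6)))  # IPython
```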

That's it, enjoy!!




Installing R on Redhat 7 (EC2 RHEL 7)

Check your machine version:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Now let's update the RPM repo details:

$ sudo su -c 'rpm -Uvh'
$ sudo yum update

Make sure all dependencies are installed individually:

$ wget
$ sudo yum localinstall blas-devel-3.4.2-5.el7.x86_64.rpm

$ wget
$ sudo yum localinstall blas-3.4.2-5.el7.x86_64.rpm

$ wget
$ sudo yum localinstall lapack-devel-3.4.2-5.el7.x86_64.rpm

$ wget
$ sudo yum install texinfo-tex-5.1-4.el7.x86_64.rpm

$ wget
$ sudo yum install texlive-epsf-svn21461.2.7.4-38.el7.noarch.rpm

Finally install R now:

$ sudo yum install R

That's it.

Binomial classification example in Scala and GLM with H2O

Here is a sample for a binomial classification problem using the H2O GLM algorithm on the Credit Card data set, written in Scala. This sample was created using Spark 2.1.0 with Sparkling Water 2.1.4.

import java.io.File
import org.apache.spark.h2o._
import org.apache.spark.SparkFiles
// The support helpers live in the water.support package in Sparkling Water 2.x
import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}
import water.fvec.Frame
import water.Key
import _root_.hex.glm.GLMModel
import _root_.hex.ModelMetricsBinomial

val hc = H2OContext.getOrCreate(sc)
import hc._
import hc.implicits._

SparkContextSupport.addFiles(sc, "/Users/avkashchauhan/learn/deepwater/credit_card_clients.csv")
val creditCardData = new H2OFrame(new File(SparkFiles.get("credit_card_clients.csv")))

val ratios = Array[Double](0.8)
val keys = Array[String]("train.hex", "valid.hex")
val frs = H2OFrameSupport.split(creditCardData, keys, ratios)
val (train, valid) = (frs(0), frs(1))

def buildGLMModel(train: Frame, valid: Frame, response: String)
                 (implicit h2oContext: H2OContext): GLMModel = {
  import _root_.hex.glm.GLMModel.GLMParameters.Family
  import _root_.hex.glm.GLM
  import _root_.hex.glm.GLMModel.GLMParameters
  val glmParams = new GLMParameters(Family.binomial)
  glmParams._train = train
  glmParams._valid = valid
  glmParams._response_column = response
  glmParams._alpha = Array[Double](0.5)
  val glm = new GLM(glmParams, Key.make("glmModel.hex"))
  // Train the model and block until it finishes
  glm.trainModel().get()
}

val glmModel = buildGLMModel(train, valid, 'default_payment_next_month)(hc)

// Collect model metrics and evaluate model quality
val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train)
val validMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, valid)

// Prediction
SparkContextSupport.addFiles(sc, "/Users/avkashchauhan/learn/deepwater/credit_card_predict.csv")
val creditPredictData = new H2OFrame(new File(SparkFiles.get("credit_card_predict.csv")))

val predictionFrame = glmModel.score(creditPredictData)
val predictionResults = asRDD[DoubleHolder](predictionFrame)

That's it, enjoy!!

Using H2O with Microsoft R Open on Linux Machine


Microsoft R Open Page:

Ubuntu Download link:

$ wget
$ tar -xvf microsoft-r-open-3.3.3.tar.gz
$ cd microsoft-r-open
$ sudo bash

Installation will be done into the following folder:

$ ll /usr/lib64/microsoft-r/3.3/lib64/R/

drwxr-xr-x 11 root root 4096 Apr 20 15:28 ./
drwxr-xr-x 4 root root 4096 Apr 20 15:28 ../
drwxr-xr-x 3 root root 4096 Apr 20 15:28 backup/
drwxr-xr-x 3 root root 4096 Apr 20 15:28 bin/
-rw-r--r-- 1 root root 18011 Mar 28 13:35 COPYING
drwxr-xr-x 4 root root 4096 Apr 20 15:28 doc/
drwxr-xr-x 2 root root 4096 Apr 20 15:28 etc/
drwxr-xr-x 3 root root 4096 Apr 20 15:28 include/
drwxr-xr-x 2 root root 4096 Apr 20 15:28 lib/
drwxr-xr-x 47 root root 4096 Apr 20 15:28 library/
drwxr-xr-x 2 root root 4096 Apr 20 15:28 modules/
drwxr-xr-x 13 root root 4096 Apr 20 15:28 share/
-rw-r--r-- 1 root root 46 Mar 28 13:35 SVN-REVISION

Note: If you already have R installed on the machine, the Microsoft R link may not be created and the previous R will still be available at /usr/bin/R. If that is the case, you may need to create a symbolic link as below.

Creating symbolic link:

$ sudo ln -s /usr/lib64/microsoft-r/3.3/lib64/R/bin/R /usr/bin/MSR

Launching R:

You just need to do the following:

$ R

If you have created the symbolic link then use the following:

$ MSR

Installing RCurl, which is a must-have for H2O:

> install.packages("RCurl")

Now install the latest H2O from the H2O Download link:

> install.packages("h2o", type = "source", repos = c("..."))

Once H2O is installed you can use it. Here is the full execution log:



R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Microsoft R Open 3.3.3
The enhanced R distribution from Microsoft
Microsoft packages Copyright (C) 2017 Microsoft Corporation
Using the Intel MKL for parallel mathematical computing (using 16 cores).
Default CRAN mirror snapshot taken on 2017-03-15.

> library(h2o)
Your next step is to start H2O:
    > h2o.init()
For H2O package documentation, ask for help:
    > ??h2o
After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit
Attaching package: ‘h2o’
The following objects are masked from ‘package:stats’:
    cor, sd, var
The following objects are masked from ‘package:base’:
    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric, log,
    log10, log1p, log2, round, signif, trunc

> h2o.init()
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

Starting H2O JVM and connecting: .. Connection successful!

R is connected to the H2O cluster:
    H2O cluster uptime: 2 seconds 536 milliseconds
    H2O cluster version: 3.10.4.
    H2O cluster version age: 22 hours and 35 minutes
    H2O cluster name: H2O_started_from_R_avkash_tco537
    H2O cluster total nodes: 1
    H2O cluster total memory: 26.67 GB
    H2O cluster total cores: 32
    H2O cluster allowed cores: 2
    H2O cluster healthy: TRUE
    H2O Connection ip: localhost
    H2O Connection port: 54321
    H2O Connection proxy: NA
    H2O Internal Security: FALSE
    R Version: R version 3.3.3 (2017-03-06)

Note: As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)

> h2o.clusterStatus()

Cluster name: H2O_started_from_R_avkash_tco537
Cluster size: 1
Cluster is locked

  h2o        healthy last_ping    num_cpus sys_load
1 localhost/ TRUE    1.492729e+12 32       0.88
  mem_value_size free_mem    pojo_mem swap_mem free_disk   max_disk     pid
1 0              28537698304 93668352 0        47189065728 235825790976 25530
  num_keys tcps_active open_fds rpcs_active

Plotting the scoring history from an H2O model in Python

Once you build a model with H2O, the scoring history can be seen in the model details or model metrics table. If validation is enabled, the validation scoring history is also visible. You can see these metrics in the Flow UI; however, if you are using the Python shell, you may want to plot training and/or validation metrics yourself, and this is what we will do next.

To get the scoring history from the model in python you can just try the following:

import pandas as pd
sh = mymodel.score_history()
sh = pd.DataFrame(sh)


The results are as below:

Index([u'', u'timestamp', u'duration', u'number_of_trees', u'training_rmse',
       u'training_logloss', u'training_auc', u'training_lift',

The model's scoring history table looks like the one below:

[Screenshot: the model's scoring history table]

Next we can plot training_logloss and training_auc against the number of trees as below:

import matplotlib.pyplot as plt
%matplotlib inline 
# plot training logloss and auc
sh.plot(x='number_of_trees', y = ['training_auc', 'training_logloss'])

The results are as below:

[Screenshot: plot of training_auc and training_logloss versus number_of_trees]
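If you want to try the plotting call without a live H2O cluster, the same steps can be run against a small synthetic scoring-history table. The numbers and the output file name below are made up for illustration; with a real model you would start from the model's scoring history as shown above:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook using %matplotlib inline
import matplotlib.pyplot as plt

# Synthetic stand-in for the model's scoring history table (values invented)
sh = pd.DataFrame({
    "number_of_trees":  [1, 5, 10, 20, 50],
    "training_auc":     [0.62, 0.71, 0.76, 0.80, 0.83],
    "training_logloss": [0.69, 0.58, 0.52, 0.47, 0.44],
})

# Same plotting call as above: both metrics against the number of trees
ax = sh.plot(x="number_of_trees", y=["training_auc", "training_logloss"])
ax.set_xlabel("number of trees")
plt.savefig("scoring_history.png")
```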

That's it, enjoy!!