Two never-miss very informative tutorials on Driverless AI

1. Automatic Feature Engineering with Driverless AI:

Dmitry Larko, Kaggle Grandmaster and Senior Data Scientist at H2O.ai, will showcase what he is doing with feature engineering, how he is doing it, and why it is important in the machine learning realm. He will delve into the workings of H2O.ai’s new product, Driverless AI, whose automatic feature engineering increases the accuracy of models and frees up approximately 80% of the data practitioners’ time – thus enabling them to draw actionable insights from the models built by Driverless AI.  You will see:

  • Overview of feature engineering
  • Real-time demonstration of feature engineering examples
  • Interpretation and reason codes of final models

2. Machine Learning Interpretability with Driverless AI:

In this video, Patrick showcases several approaches beyond the error measures and assessment plots typically used to interpret deep learning and machine learning models and results. Wherever possible, interpretability approaches are deconstructed into more basic components suitable for human storytelling: complexity, scope, understanding, and trust. You will see:

  • Data visualization techniques for representing high-degree interactions and nuanced data structures.
  • Contemporary linear model variants that incorporate machine learning and are appropriate for use in regulated industry.
  • Cutting edge approaches for explaining extremely complex deep learning and machine learning models.

Thats it, enjoy!!

 

Advertisements

Getting individual metrics from H2O model in Python

You can get some of the individual model metrics for your model based on training and/or validation data. Here is the code snippet:

Note: I am creating a test data frame to run H2O Deep Learning algorithm and then showing how to collect individual model metrics based on training and/or validation data below.

import h2o
h2o.init(strict_version_check= False , port = 54345)
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
model = H2ODeepLearningEstimator()
rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1], [0,1,2,34,1], [2,3,4,1,0]] * 50
fr = h2o.H2OFrame(rows)
X = fr.col_names[0:4]

## Classification Model
fr[4] = fr[4].asfactor()
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('logloss', model.logloss(valid = False))
print('Accuracy', model.accuracy(valid = False))
print('AUC', model.auc(valid = False))
print('R2', model.r2(valid = False))
print('RMSE', model.rmse(valid = False))
print('Error', model.error(valid = False))
print('MCC', model.mcc(valid = False))

## Regression Model
fr = h2o.H2OFrame(rows)
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('R2', model.r2(valid = False))
print('RMSE', model.rmse(valid = False))
 

Note: As I did not pass validation frame thats why I set valid = False to get training metrics. If you have passed validation metrics then you can set valid = True to get validation metrics as well.

If you want to see what is inside model object you can look at the json object as below:

model.get_params()

Thats it, enjoy!!

 

Generating ROC curve in SCALA from H2O binary classification models

You can use the following blog to built a binomial classification  GLM model:
To collect model metrics  for training use the following:
val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train)
Now you can access model AUC (_auc object) as below:
Note: _auc object has array of thresholds, and then for each threshold it has fps and tps
(use tab completion to list them all)
scala> trainMetrics._auc.
_auc   _gini      _n       _p     _tps      buildCM   defaultCM    defaultThreshold   forCriterion   frozenType   pr_auc   readExternal   reloadFromBytes   tn             tp      writeExternal   
_fps   _max_idx   _nBins   _ths   asBytes   clone     defaultErr   fn                 fp             maxF1        read     readJSON       threshold         toJsonString   write   writeJSON
In the above AUC object:
_fps  =  false positives
_tps  =  true positives
_ths  =  threshold values
_p    =  actual trues
_n    =  actual false
Now you can use individual ROC specific values as below to recreate ROC:
trainMetrics._auc._fps
trainMetrics._auc._tps
trainMetrics._auc._ths
To print the whole array in the terminal for inspection, you just need the following:
val dd = trainMetrics._auc._fps
println(dd.mkString(" "))
You can access true positives and true negatives as below where actual trues and actual false are defined as below:
_p    =  actual trues

_n    =  actual false
scala> trainMetrics._auc._n
res42: Double = 2979.0

scala> trainMetrics._auc._p
res43: Double = 1711.0
Thats it, enjoy!!

Multinomial classification example in Scala and Deep Learning with H2O

Here is a sample for multinomial classification problem using H2O Deep Learning algorithm and iris data set in Scala language.

The following sample is for multinomial classification problem. This sample is created using Spark 2.1.0 with Sparkling Water 2.1.4.

import org.apache.spark.h2o._
import water.support.SparkContextSupport.addFiles
import org.apache.spark.SparkFiles
import java.io.File
import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}
import water.Key
import _root_.hex.deeplearning.DeepLearningModel
import _root_.hex.ModelMetricsMultinomial


val hc = H2OContext.getOrCreate(sc)
import hc._
import hc.implicits._

addFiles(sc, "/Users/avkashchauhan/smalldata/iris/iris.csv")
val irisData = new H2OFrame(new File(SparkFiles.get("iris.csv")))

val ratios = Array[Double](0.8)
val keys = Array[String]("train.hex", "valid.hex")
val frs = H2OFrameSupport.split(irisData, keys, ratios)
val (train, valid) = (frs(0), frs(1))

def buildDLModel(train: Frame, valid: Frame, response: String,
 epochs: Int = 10, l1: Double = 0.001, l2: Double = 0.0,
 hidden: Array[Int] = Array[Int](200, 200))
 (implicit h2oContext: H2OContext): DeepLearningModel = {
 import h2oContext.implicits._
 // Build a model
 import _root_.hex.deeplearning.DeepLearning
 import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters
 val dlParams = new DeepLearningParameters()
 dlParams._train = train
 dlParams._valid = valid
 dlParams._response_column = response
 dlParams._epochs = epochs
 dlParams._l1 = l1
 dlParams._hidden = hidden
 // Create a job
 val dl = new DeepLearning(dlParams, Key.make("dlModel.hex"))
 dl.trainModel.get
}


// Note: The response column name is C5 here so passing:
val dlModel = buildDLModel(train, valid, 'C5)(hc)

// Collect model metrics and evaluate model quality
val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsMultinomial](dlModel, train)
val validMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsMultinomial](dlModel, valid)
println(trainMetrics.rmse)
println(validMetrics.rmse)
println(trainMetrics.mse)
println(validMetrics.mse)
println(trainMetrics.r2)
println(validMetrics.r2)

Thats it, enjoy!!

Just upgraded Tensorflow 1.0.1 and Keras 2.0.1

$ pip install –upgrade keras –user

Collecting keras
 Downloading Keras-2.0.1.tar.gz (192kB)
 100% |████████████████████████████████| 194kB 2.9MB/s
Requirement already up-to-date: theano in ./.local/lib/python2.7/site-packages (from keras)
Requirement already up-to-date: pyyaml in ./.local/lib/python2.7/site-packages (from keras)
Requirement already up-to-date: six in ./.local/lib/python2.7/site-packages (from keras)
Requirement already up-to-date: numpy>=1.7.1 in /usr/local/lib/python2.7/dist-packages (from theano->keras)
Collecting scipy>=0.11 (from theano->keras)
 Downloading scipy-0.19.0-cp27-cp27mu-manylinux1_x86_64.whl (45.0MB)
 100% |████████████████████████████████| 45.0MB 34kB/s
Building wheels for collected packages: keras
 Running setup.py bdist_wheel for keras ... done
 Stored in directory: /home/avkash/.cache/pip/wheels/fa/15/f9/57473734e407749529bf55e6b5038640dc7279d5718b2c368a
Successfully built keras
Installing collected packages: keras, scipy
 Found existing installation: Keras 1.2.2
 Uninstalling Keras-1.2.2:
 Successfully uninstalled Keras-1.2.2
Successfully installed keras-2.0.1 scipy-0.19.0

$ pip install –upgrade tensorflow-gpu –user

Collecting tensorflow-gpu
 Downloading tensorflow_gpu-1.0.1-cp27-cp27mu-manylinux1_x86_64.whl (94.8MB)
 100% |████████████████████████████████| 94.8MB 16kB/s
Requirement already up-to-date: mock>=2.0.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow-gpu)
Requirement already up-to-date: numpy>=1.11.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow-gpu)
Requirement already up-to-date: protobuf>=3.1.0 in /usr/local/lib/python2.7/dist-packages (from tensorflow-gpu)
Requirement already up-to-date: wheel in /usr/lib/python2.7/dist-packages (from tensorflow-gpu)
Requirement already up-to-date: six>=1.10.0 in ./.local/lib/python2.7/site-packages (from tensorflow-gpu)
Requirement already up-to-date: funcsigs>=1; python_version < "3.3" in /usr/local/lib/python2.7/dist-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already up-to-date: pbr>=0.11 in ./.local/lib/python2.7/site-packages (from mock>=2.0.0->tensorflow-gpu)
Requirement already up-to-date: setuptools in ./.local/lib/python2.7/site-packages (from protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: appdirs>=1.4.0 in ./.local/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: packaging>=16.8 in ./.local/lib/python2.7/site-packages (from setuptools->protobuf>=3.1.0->tensorflow-gpu)
Requirement already up-to-date: pyparsing in ./.local/lib/python2.7/site-packages (from packaging>=16.8->setuptools->protobuf>=3.1.0->tensorflow-gpu)
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-1.0.1

$ python -c ‘import keras as tf;print tf.version

Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
2.0.1

$ python -c ‘import tensorflow as tf;print tf.version

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
1.0.1

Deep Learning session from Google Next 2017

TensorFlow and Deep Learning without a PhD:

With TensorFlow, deep machine learning transitions from an area of research to mainstream software engineering. In this video, Martin Gorner demonstrates how to construct and train a neural network that recognizes handwritten digits. Along the way, he’ll describe some “tricks of the trade” used in neural network design, and finally, he’ll bring the recognition accuracy of his model above 99%.

Part 1:

Part 2:

 

Version mismatch error during H2O initialization

When you initialize  H2O in R or python as below:

> h2o.init()

You may see the following error:

Error in h2o.init(nthreads = -1) : Version mismatch! H2O is running version 3.11.0.99999 but h2o-R package is version 3.10.4.1.
This is a developer build, please contact your developer

You are seeing this error is because R is trying to connect an existing running H2O instance instead of the starting a new instance based on your R package.

If you look for H2O running in your environment you will find one. You can also try to following command to see running h2o process

$ ps -ef | grep h2o

Now if you turn off running H2O instance, and re-run the above command you will see a new instance of H2O is started.

The other option is to disable the strict version check of a running H2O instance and just connect with it as below (You can disable version checking on both R and python):

> h2o.init(strict_version_check = FALSE)