Conda Python 3.5 and OpenCV 3 with Matplotlib and QT5 backend

As the title suggests, let's get to work:

Create the Conda Environment with Python 3.5

$ conda create -n python35 python=3.5
$ conda activate python35

Inside the conda environment we need to install the pyqt5, pyside, pyobjc-core, and pyobjc-framework-cocoa packages:

Installing QT5 required packages inside Conda:

$ conda install -c dsdale24 pyqt5
$ conda install -c conda-forge pyside
## Note: I couldn't find these with conda on conda-forge, so I used pip
$ pip install pyobjc-core
$ pip install pyobjc-framework-cocoa

Verifying Python 3.5:

$ python

Python 3.5.4 |Anaconda, Inc.| (default, Feb 19 2018, 11:51:41)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

Checking the backend used by matplotlib:

import matplotlib
matplotlib.get_backend()

If you see 'MacOSX', it means matplotlib is using the MacOSX backend and we need to change it to Qt as below:

Changing the matplotlib backend to use QT5 (call matplotlib.use before importing matplotlib.pyplot):

import matplotlib
matplotlib.use('Qt5Agg')

This results in the qt5agg backend being used with cv2.

Sample code to show an image using OpenCV3

Trying a sample OpenCV3 snippet to read and show an image:

import cv2
import matplotlib.pyplot as plt

image = cv2.imread("/work/src/github/aiprojects/avkash_cv/matrix.png")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
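A side note on channel order: OpenCV loads images as BGR while matplotlib expects RGB, which is why a conversion such as cv2.cvtColor(img, cv2.COLOR_BGR2RGB) is needed before plotting. The swap itself is just a reversal of the last array axis; here is a minimal NumPy sketch using a synthetic 2x2 "image" (no file or cv2 needed):

```python
import numpy as np

# Synthetic 2x2 "BGR" image: a pure-blue pixel, a pure-red pixel,
# a pure-green pixel, and a white pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]],
                [[0, 255, 0], [255, 255, 255]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB.
rgb = bgr[..., ::-1]
print(rgb[0, 0].tolist())  # → [0, 0, 255]: the blue pixel, now in RGB order
```

If the colors in your plot look inverted (blue faces, orange skies), a missing BGR-to-RGB conversion is the usual cause.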

This is how the image is rendered with the QT5 backend:


That’s it, enjoy!!








Compile OpenCV3 with Python3.5 Conda environment on OSX Sierra

As the title suggests, let's get it going…

Create the Conda Environment with Python 3.5

$ conda create -n python35 python=3.5
$ conda activate python35

Verify the Conda environment is using Python 3.5:

$ python 
Python 2.7.14 |Anaconda custom (64-bit)| (default, Dec 7 2017, 11:07:58)

Note: if you see Python 2.7.x as above, the python35 environment is not active; run conda activate python35 first so that python reports 3.5.4.

Now we will install the latest TensorFlow, which pulls in a lot of the dependencies I needed:

$ conda install -c conda-forge tensorflow

Python runtime environment and folders

Now we will confirm the python path:

$ which python

Now we need to find out where the Python.h header file is; it will be used as the value for PYTHON3_INCLUDE_DIR later:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/include/python3.5m/Python.h

Now we need to find out where the libpython3.5m.dylib library file is; it will be used as the value for PYTHON3_LIBRARY later:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/lib/libpython3.5m.dylib
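If these paths differ on your machine, the interpreter can report them itself; a quick standard-library lookup (run it with the conda env's python to get that env's paths):

```python
import sysconfig

# Directory containing Python.h for the interpreter running this script
include_dir = sysconfig.get_paths()["include"]

# Directory where the libpython shared library lives (may be None on some platforms)
lib_dir = sysconfig.get_config_var("LIBDIR")

print(include_dir)
print(lib_dir)
```

These two values map directly to PYTHON3_INCLUDE_DIR and the directory portion of PYTHON3_LIBRARY used below.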

Let's clone the OpenCV master repo and opencv_contrib into the same base folder as below:

$ git clone https://github.com/opencv/opencv.git
$ git clone https://github.com/opencv/opencv_contrib.git

Let's create the build folder:

$ cd opencv
$ mkdir build
$ cd build

Now let's configure the build first (run from the build folder; CMAKE_INSTALL_PREFIX is set to /usr/local to match the install step later):

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
 -D CMAKE_INSTALL_PREFIX=/usr/local \
 -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
 -D PYTHON3_LIBRARY=/Users/avkashchauhan/anaconda3/envs/python35/lib/libpython3.5m.dylib \
 -D PYTHON3_INCLUDE_DIR=/Users/avkashchauhan/anaconda3/envs/python35/include/python3.5m/ \
 -D PYTHON3_EXECUTABLE=/Users/avkashchauhan/anaconda3/envs/python35/bin/python \
 -D BUILD_opencv_python2=OFF \
 -D BUILD_opencv_python3=ON \
 ..

The configuration shows the following key settings:

-- Found PythonInterp: /Users/avkashchauhan/anaconda3/bin/python2.7 (found suitable version "2.7.14", minimum required is "2.7")
-- Could NOT find PythonLibs: Found unsuitable version "2.7.10", but required is exact version "2.7.14" (found /usr/lib/libpython2.7.dylib)
-- Found PythonInterp: /Users/avkashchauhan/anaconda3/envs/python35/bin/python (found suitable version "3.5.4", minimum required is "3.4")
-- Found PythonLibs: YYY (Required is exact version "3.5.4")
-- Python 3:
-- Interpreter: /Users/avkashchauhan/anaconda3/envs/python35/bin/python (ver 3.5.4)
-- Libraries: YYY
-- numpy: /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/numpy/core/include (ver 1.12.1)
-- packages path: lib/python3.5/site-packages
-- Python (for build): /Users/avkashchauhan/anaconda3/bin/python2.7
-- Pylint: /Users/avkashchauhan/anaconda3/bin/pylint (ver: 1.8.2, checks: 116)
General configuration for OpenCV 3.4.1-dev =====================================
-- Version control: 3.4.1-26-g667f5b655

Building the OpenCV code:

Now let's build the code:

$ make -j4
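The -j4 flag runs four compile jobs in parallel; if your machine has more cores, a higher number speeds up the build. A tiny helper to print a suitable value (standard library only):

```python
import os

# Pick a parallel-job count for make -j based on the machine's core count.
jobs = os.cpu_count() or 1
print("make -j{}".format(jobs))
```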

A successful build ends with the following console log:

Scanning dependencies of target example_face_facemark_demo_aam
[ 99%] Building CXX object modules/face/CMakeFiles/example_face_facemark_demo_aam.dir/samples/facemark_demo_aam.cpp.o
[ 99%] Linking CXX executable ../../bin/example_face_facemark_lbf_fitting
[ 99%] Built target example_face_facemark_lbf_fitting
[ 99%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_facemark_lbf.cpp.o
[ 99%] Linking CXX executable ../../bin/example_face_facerec_save_load
[ 99%] Built target example_face_facerec_save_load
[ 99%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_loadsave.cpp.o
[100%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_main.cpp.o
[100%] Linking CXX executable ../../bin/example_face_facemark_demo_aam
[100%] Built target example_face_facemark_demo_aam
[100%] Linking CXX executable ../../bin/opencv_test_face
[100%] Built target opencv_test_face

Let's install it locally:

To install the final library, run the following:

$ sudo make install

Once the install is complete, you can confirm the build output as below:

$ ll /usr/local/lib/python3.5/site-packages/

Copying the final OpenCV library to the Python 3.5 site-packages folder:

As we know, the Python 3.5 Conda environment site-packages folder is here:

/Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/

So we will copy the built cv2 module into that folder as below (the .so file name may differ slightly on your build):

$ cp /usr/local/lib/python3.5/site-packages/cv2.cpython-35m-darwin.so /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/

Confirm it:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/

Verifying OpenCV with Python 3.5:

Now verify OpenCV with Python 3.5 in the Conda environment:

$ python 
Python 3.5.4 |Anaconda, Inc.| (default, Feb 19 2018, 11:51:41)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
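Besides checking the version, you can confirm which file a module was loaded from, to be sure the interpreter picked up the copy in the Conda site-packages. The stdlib can report this without importing the module (shown here with "json" as a stand-in; in your env, query "cv2" the same way):

```python
import importlib.util

# find_spec reports where a module would be loaded from, without importing it.
spec = importlib.util.find_spec("json")
print(spec.origin)  # path to the module file that would be loaded
```

If the reported path is not under your env's site-packages, the copy step above did not land where the interpreter is looking.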

Now let's run an OpenCV example:

import numpy as np
import cv2

# Load a color image in grayscale
img = cv2.imread('/work/src/github/aiprojects/avkash_cv/test_image.png', 0)

# Display the resulting frame
cv2.imshow("preview", img)
cv2.waitKey(0)

# When everything is done, close the window
cv2.destroyAllWindows()

That's it, enjoy!


Machine Learning adoption for any organization

At this point there is no doubt that any organization can take advantage of machine learning by applying it to their business processes. How significant the impact is depends on how machine learning is applied and what kind of problem you, as an organization, are trying to solve with it. The results also depend on the experience of your data scientists and software engineers, along with the technology you adopt.

In this article we will learn what the machine learning development life cycle really looks like, and how any organization can build a team to solve their business problems with machine learning. Let's get started with the following image in mind:

Screen Shot 2018-02-18 at 1.15.52 PM

As you can see above, machine learning is a continuous process of extracting data from a variety of sources and feeding it into machine learning engines, which generate models. These models are plugged into business processes to produce results, and those results are fed back into the process to solve business problems. The models can also produce results independently, for example at the edge, depending on their usage.

At this point the critical question is to understand what a machine learning development life cycle really looks like. What kind of talent is required to pull it off? What do these teams really do while building and applying machine learning?

We will get the answers to the above questions as we progress further. If we look at the machine learning development life cycle image below, we will see the following stages:

  1. Collecting data from various sources
  2. After collecting the data, making it machine learning ready
  3. Feeding the machine-learning-ready data into the "building machine learning" process, where a data-science-heavy team works on the data to produce results

Screen Shot 2018-02-18 at 1.16.01 PM

As you can see above, building machine learning is very data-science-heavy work, whereas applying machine learning is mainly a software engineering process. You can use this understanding to figure out the technical resources needed to implement an end-to-end machine learning pipeline for your organization.

The next question that comes to mind is the separation between building machine learning and applying machine learning. How are these two processes different? What is the end result of the machine learning process, and how can software engineering apply its output?

Looking at the image below, we can see that the product of the "building machine learning" process is the final or leader model, which an enterprise or business can use as the final product. This model is ready to produce results as needed.

Screen Shot 2018-02-18 at 1.16.12 PM

The model can be applied to various consumer, enterprise and industrial use cases to provide edge level intelligence, or in process intelligence where model results are fed into another process. Sometimes the model is fed into another machine learning process to generate further results.

Once we have understood the significance of the key individuals in the end-to-end machine learning process, the next question is what these key individuals do in the day-to-day process. How do they engage in the process of building machine learning? What kind of tools and technology do they adopt or create to solve the organization's business problems?

To understand the kind of work data scientists do while building machine learning, note that their main focus is to apply as many machine learning engines and algorithms as needed to solve a specific problem. Sometimes they create something brand new because nothing available solves the problem at hand, and sometimes they just need to improve an existing solution.

Screen Shot 2018-02-18 at 1.16.21 PM

The above image puts together the conceptual idea of the various engines that could be used by a team of data scientists in any organization to accomplish their task.

The role of software engineering is critical in the overall machine learning pipeline. Software engineers help speed up and refine the data science process to generate faster results by applying software engineering methods on top of data science.

The image below explains how software engineers can expedite the work of data scientists by creating a fully automated machine learning system that performs the data scientists' repetitive tasks automatically. At that point data scientists are free to spend their time solving newer problems, and just keep an eye on the automated system to make sure it is working as expected.

Screen Shot 2018-02-18 at 1.16.31 PM


Various organizations, e.g. Google (Cloud ML) and H2O (AutoML), have created automated machine learning software that can be utilized by any organization. Open-source packages are also available, e.g. Auto-SKLearn and TPOT.

Any organization can follow the above details to adopt machine learning into their organization and generate expected results.


Thank you, all the very best!




Two must-watch, very informative tutorials on Driverless AI

1. Automatic Feature Engineering with Driverless AI:

Dmitry Larko, Kaggle Grandmaster and Senior Data Scientist at H2O.ai, will showcase what he is doing with feature engineering, how he is doing it, and why it is important in the machine learning realm. He will delve into the workings of H2O.ai's new product, Driverless AI, whose automatic feature engineering increases the accuracy of models and frees up approximately 80% of the data practitioners' time, thus enabling them to draw actionable insights from the models built by Driverless AI. You will see:

  • Overview of feature engineering
  • Real-time demonstration of feature engineering examples
  • Interpretation and reason codes of final models

2. Machine Learning Interpretability with Driverless AI:

In this video, Patrick showcases several approaches beyond the error measures and assessment plots typically used to interpret deep learning and machine learning models and results. Wherever possible, interpretability approaches are deconstructed into more basic components suitable for human storytelling: complexity, scope, understanding, and trust. You will see:

  • Data visualization techniques for representing high-degree interactions and nuanced data structures.
  • Contemporary linear model variants that incorporate machine learning and are appropriate for use in regulated industry.
  • Cutting edge approaches for explaining extremely complex deep learning and machine learning models.

That's it, enjoy!


Getting individual metrics from H2O model in Python

You can get some of the individual model metrics for your model based on training and/or validation data. Here is the code snippet:

Note: I am creating a test data frame to run the H2O Deep Learning algorithm, and then showing how to collect individual model metrics based on training and/or validation data below.

import h2o
h2o.init(strict_version_check=False, port=54345)
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

rows = [[1,2,3,4,0], [2,1,2,4,1], [2,1,4,2,1], [0,1,2,34,1], [2,3,4,1,0]] * 50
fr = h2o.H2OFrame(rows)
X = fr.col_names[0:4]

## Classification Model (response column converted to a factor)
fr[4] = fr[4].asfactor()
model = H2ODeepLearningEstimator()
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('logloss', model.logloss(valid=False))
print('Accuracy', model.accuracy(valid=False))
print('AUC', model.auc(valid=False))
print('R2', model.r2(valid=False))
print('RMSE', model.rmse(valid=False))
print('Error', model.error(valid=False))
print('MCC', model.mcc(valid=False))

## Regression Model (numeric response, so a fresh frame and estimator)
fr = h2o.H2OFrame(rows)
model = H2ODeepLearningEstimator()
model.train(x=X, y="C5", training_frame=fr)
print('Model Type:', model.type)
print('R2', model.r2(valid=False))
print('RMSE', model.rmse(valid=False))

Note: As I did not pass a validation frame, I set valid = False to get training metrics. If you have passed a validation frame, you can set valid = True to get validation metrics as well.
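For intuition about what a couple of these metrics measure, here is the same arithmetic done by hand in plain Python; the labels and predicted probabilities below are made up for illustration, not H2O output:

```python
import math

# Hypothetical true labels and predicted probabilities (not H2O output).
y_true = [0, 1, 1, 0, 1]
p_pred = [0.2, 0.8, 0.6, 0.3, 0.9]

# Log loss: mean negative log-likelihood of the true class.
logloss = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_true, p_pred)) / len(y_true)

# Accuracy at a 0.5 threshold.
accuracy = sum(int((p >= 0.5) == bool(y))
               for y, p in zip(y_true, p_pred)) / len(y_true)

print(round(logloss, 4), accuracy)  # smaller logloss is better
```

Note that logloss rewards confident correct predictions (0.9 for a true 1) more than hesitant ones (0.6), even though both count the same toward accuracy.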

If you want to see what is inside the model object, you can look at its JSON representation as below:

model._model_json
That's it, enjoy!


Generating ROC curve in SCALA from H2O binary classification models

You can use the following blog post to build a binomial classification GLM model:

To collect model metrics for training, use the following:

val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsBinomial](glmModel, train)

Now you can access the model AUC (the _auc object) as below.

Note: the _auc object has an array of thresholds, and for each threshold it has fps and tps (use tab completion to list them all):

scala> trainMetrics._auc.
_auc   _gini      _n       _p     _tps      buildCM   defaultCM    defaultThreshold   forCriterion   frozenType   pr_auc   readExternal   reloadFromBytes   tn             tp      writeExternal   
_fps   _max_idx   _nBins   _ths   asBytes   clone     defaultErr   fn                 fp             maxF1        read     readJSON       threshold         toJsonString   write   writeJSON

In the above AUC object:

_fps  =  false positives
_tps  =  true positives
_ths  =  threshold values
_p    =  actual positives (trues)
_n    =  actual negatives (falses)

Now you can use the individual ROC-specific values as below to recreate the ROC. To print a whole array in the terminal for inspection, you just need the following:

val dd = trainMetrics._auc._fps
println(dd.mkString(" "))

You can access the counts of actual positives and actual negatives as below:

scala> trainMetrics._auc._n
res42: Double = 2979.0

scala> trainMetrics._auc._p
res43: Double = 1711.0
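Given the per-threshold _tps and _fps arrays plus the _p and _n totals, each ROC point is simply (fps/_n, tps/_p). A short Python sketch of the reconstruction, using made-up counts (not the model's actual values):

```python
# Hypothetical per-threshold counts, mimicking _tps, _fps, _p, _n above.
tps = [0, 50, 120, 160, 171]
fps = [0, 5, 40, 150, 297]
p, n = 171.0, 297.0

# Each ROC point is (false positive rate, true positive rate).
roc = [(fp / n, tp / p) for fp, tp in zip(fps, tps)]
print(roc[0], roc[-1])  # → (0.0, 0.0) (1.0, 1.0)
```

The strictest threshold yields (0, 0), the loosest yields (1, 1), and the area traced between them is the AUC itself.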
That's it, enjoy!

Multinomial classification example in Scala and Deep Learning with H2O

Here is a sample multinomial classification problem solved with the H2O Deep Learning algorithm and the iris dataset, in Scala.

The following sample was created using Spark 2.1.0 with Sparkling Water 2.1.4.

import org.apache.spark.h2o._
import org.apache.spark.SparkFiles
import water.support.{H2OFrameSupport, SparkContextSupport, ModelMetricsSupport}
import water.Key
import _root_.hex.deeplearning.DeepLearningModel
import _root_.hex.ModelMetricsMultinomial

val hc = H2OContext.getOrCreate(sc)
import hc._
import hc.implicits._

addFiles(sc, "/Users/avkashchauhan/smalldata/iris/iris.csv")
val irisData = new H2OFrame(new File(SparkFiles.get("iris.csv")))

val ratios = Array[Double](0.8)
val keys = Array[String]("train.hex", "valid.hex")
val frs = H2OFrameSupport.split(irisData, keys, ratios)
val (train, valid) = (frs(0), frs(1))

def buildDLModel(train: Frame, valid: Frame, response: String,
 epochs: Int = 10, l1: Double = 0.001, l2: Double = 0.0,
 hidden: Array[Int] = Array[Int](200, 200))
 (implicit h2oContext: H2OContext): DeepLearningModel = {
 import h2oContext.implicits._
 // Build a model
 import _root_.hex.deeplearning.DeepLearning
 import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters
 val dlParams = new DeepLearningParameters()
 dlParams._train = train
 dlParams._valid = valid
 dlParams._response_column = response
 dlParams._epochs = epochs
 dlParams._l1 = l1
 dlParams._l2 = l2
 dlParams._hidden = hidden
 // Create the job, run it, and return the trained model
 val dl = new DeepLearning(dlParams, Key.make("dlModel.hex"))
 dl.trainModel.get
}

// Note: The response column name is C5 here, so we pass it:
val dlModel = buildDLModel(train, valid, 'C5)(hc)

// Collect model metrics and evaluate model quality
val trainMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsMultinomial](dlModel, train)
val validMetrics = ModelMetricsSupport.modelMetrics[ModelMetricsMultinomial](dlModel, valid)
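As a side note, the 80/20 split performed by H2OFrameSupport.split above can be sketched in a few lines of plain Python; this is just an illustration of the idea (assign each row to train with probability 0.8), not H2O's implementation:

```python
import random

random.seed(42)
rows = list(range(150))  # e.g. 150 iris rows

# Assign each row to train with probability 0.8, the rest to valid.
train = [r for r in rows if random.random() < 0.8]
valid = [r for r in rows if r not in train]
print(len(train), len(valid))
```

Because the assignment is per-row and random, the actual split is only approximately 80/20, which matches H2O's ratio-based splitting behavior.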

That's it, enjoy!