Superset and Jupyter Notebooks on AWS as a Service

Jupyter Notebook (In EC2 Instance):

The following steps run Jupyter Notebook as a server inside an AWS EC2 instance, which you can then access from your desktop/laptop as long as the instance is reachable from your machine:

  • $ conda activate python37
  • $ jupyter notebook --generate-config
    • This will create the jupyter_notebook_config.py configuration file inside your working folder, i.e. /home/<username>/.jupyter/
  • $ jupyter notebook password
    • You can set the password here
  • $ vi /home/centos/.jupyter/jupyter_notebook_config.py
    • Edit the following 2 lines (see the snippet after this list)
    •   c.NotebookApp.ip = '0.0.0.0'
    •   c.NotebookApp.port = 8888
  • $ jupyter notebook
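
For reference, here is how that edited section of jupyter_notebook_config.py looks; the open_browser line is an optional extra I would suggest for a headless server (only the ip and port changes come from the steps above):

# /home/<username>/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '0.0.0.0'        # listen on all interfaces, not just localhost
c.NotebookApp.port = 8888           # remember to open this port in the EC2 security group
c.NotebookApp.open_browser = False  # optional: don't try to launch a browser on the server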

Apache Superset (In EC2 Instance):

Superset can be run as a service from the same EC2 instance in a similar way; the detailed installation steps are covered in the CentOS 7 walkthrough below.

That’s all.

@avkashchauhan


Installing Apache Superset on CentOS 7 with Python 3.7

Following are the starter commands to install superset:

  • $ python --version
    • Python 3.7.5
  • $ pip install superset

Possible Errors:

You might hit any or all of the following errors:

Running setup.py install for python-geohash ... error
ERROR: Command errored out with exit status 1:

building '_geohash' extension
......
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1

gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

Look for:

  • $ gcc --version <= You must have gcc installed
  • $ locate cc1plus <= You must have cc1plus installed

Install the required libraries and tools:

If any of the above components are missing, you need to install a few required libraries:

  • $ sudo yum install mlocate <= For the locate command
  • $ sudo updatedb <= Update the mlocate database
  • $ sudo yum install gcc <= For gcc if you don't have it
  • $ sudo yum install gcc-c++ <= For cc1plus if you don't have it

Verify the following again:

  • $ gcc --version
  • $ locate cc1plus
    • /usr/libexec/gcc/x86_64-redhat-linux/4.8.2/cc1plus

Note:

  • If you can locate cc1plus properly but are still getting the error, try the following:
    • sudo ln -s /usr/libexec/gcc/x86_64-redhat-linux/4.8.2/cc1plus /usr/local/bin/
  • Then try installing again

Final Installation:

Now you can install superset as below:

  • $ pip install superset
    • Python 3.7.5
      Flask 1.1.1
      Werkzeug 0.16.0
  • $ superset db upgrade
  • $ export FLASK_APP=superset
  • $ flask fab create-admin
    • Recognized Database Authentications.
      Admin User admin created.
  • $ superset init
  • $ superset run -p 8080 --with-threads --reload --debugger

 

That’s all.

@avkashchauhan

Adding a MapBox token to Superset

To visualize geographic data with Superset you will need to get a MapBox token first, and then add that token to the Superset configuration.

Please visit https://www.mapbox.com/ to request the MapBox token as needed.

Update your shell configuration to support Superset:

What you need:

  • Superset Home
    • If you have installed from pip/pip3, use the site-packages location
    • If you have installed from a GitHub clone, use the clone's home folder
  • Superset Config file
    • Create a file named superset_config.py and place it into your $HOME/.superset/ folder
  • A PYTHONPATH that includes both your python binary location and the superset config location

Add the following to your .bash_profile or .zshrc:

export SUPERSET_HOME=/Users/avkashchauhan/anaconda3/lib/python3.7/site-packages/superset
export SUPERSET_CONFIG_PATH=$HOME/.superset/superset_config.py
export PYTHONPATH=/Users/avkashchauhan/anaconda3/bin/python:/Users/avkashchauhan/.superset:$PYTHONPATH

Minimal superset_config.py configuration:

#---------------------------------------------------------
# Superset specific config
#---------------------------------------------------------
ROW_LIMIT = 50000

SQLALCHEMY_DATABASE_URI = 'sqlite:////Users/avkashchauhan/.superset/superset.db'

MAPBOX_API_KEY = 'YOUR_TOKEN_HERE'
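
As an optional sanity check (my own suggestion, not an official Superset step), with the PYTHONPATH above in effect the config module should be importable from a plain Python session:

# If this import fails, Superset will also fail to load your configuration
import superset_config
print(superset_config.ROW_LIMIT)  # expect 50000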

Start your superset instance:

$ superset run -p 8080 --with-threads --reload --debugger

Please verify the logs to make sure superset_config.py was loaded and read without any error. A successful load logs a line like this:

Loaded your LOCAL configuration at [/Users/avkashchauhan/.superset/superset_config.py]

If there is a problem, you will get one or more errors just after the above line, similar to this:

ERROR:root:Failed to import config for SUPERSET_CONFIG_PATH=/Users/avkashchauhan/.superset/superset_config.py

If your SQLite instance is not configured correctly, you will get an error like this:

2019-11-06 14:25:51,074:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (sqlite3.OperationalError) unable to open database file
(Background on this error at: http://sqlalche.me/e/e3q8)

A successful superset_config.py load starts up with no errors, as below:

Loaded your LOCAL configuration at [/Users/avkashchauhan/.superset/superset_config.py]
2019-11-06 17:33:16,588:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: off
2019-11-06 17:33:17,294:INFO:werkzeug: * Running on http://127.0.0.1:8080/ (Press CTRL+C to quit)
2019-11-06 17:33:17,306:INFO:werkzeug: * Restarting with fsevents reloader
Loaded your LOCAL configuration at [/Users/avkashchauhan/.superset/superset_config.py]
2019-11-06 17:33:18,644:INFO:root:Configured event logger of type <class 'superset.utils.log.DBEventLogger'>
2019-11-06 17:33:19,345:WARNING:werkzeug: * Debugger is active!
2019-11-06 17:33:19,353:INFO:werkzeug: * Debugger PIN: 134-113-136

Now if you visualize any dataset with geographic columns, i.e. longitude and latitude, Superset will be able to show the data properly, as in the screenshot below:

[Screenshot: Superset rendering a dataset with longitude/latitude columns on a MapBox map]
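
If you need a quick dataset to exercise this, any CSV with latitude/longitude columns will do. Here is a hypothetical generator (the file name and city coordinates are made up for illustration):

import csv

# A tiny sample dataset with the geographic columns Superset expects
rows = [
    ("San Francisco", 37.7749, -122.4194),
    ("New York", 40.7128, -74.0060),
    ("Chicago", 41.8781, -87.6298),
]
with open("geo_sample.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["city", "latitude", "longitude"])
    w.writerows(rows)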

That’s all for now.

@avkashchauhan

Compile OpenCV3 with Python3.5 Conda environment on OSX Sierra

As the title suggests, let's get it going…

Create the Conda Environment with Python 3.5

$ conda create -n python35 python=3.5
$ conda activate python35

Verify the Conda Environment with python 3.5

$ python
Python 2.7.14 |Anaconda custom (64-bit)| (default, Dec 7 2017, 11:07:58)

Note that plain python here still reports the base Anaconda 2.7 interpreter; the python35 environment's binaries live under ~/anaconda3/envs/python35/bin, which is why the cmake configuration below points at those full paths.

Now we will install the latest TensorFlow, which pulls in a lot of the dependencies we will need:

$ conda install -c conda-forge tensorflow

Python runtime environment and folders

Now we will confirm the python path:

$ which python
/Users/avkashchauhan/anaconda3/bin/python

Now we need to find out where the Python.h header file is; this will be used as the value for PYTHON3_INCLUDE_DIR later:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/include/python3.5m/Python.h

Now we need to find out where the libpython3.5m.dylib library file is; this will be used as the value for PYTHON3_LIBRARY later:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/lib/libpython3.5m.dylib

Let's clone the OpenCV master repo and opencv_contrib side by side under the same base folder:

$ git clone https://github.com/opencv/opencv
$ git clone https://github.com/opencv/opencv_contrib

Let's create the build environment:

$ cd opencv
$ mkdir build
$ cd build

Now let's configure the build environment first:

$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
 -D CMAKE_INSTALL_PREFIX=/usr/local \
 -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
 -D PYTHON3_LIBRARY=/Users/avkashchauhan/anaconda3/envs/python35/lib/libpython3.5m.dylib \
 -D PYTHON3_INCLUDE_DIR=/Users/avkashchauhan/anaconda3/envs/python35/include/python3.5m/ \
 -D PYTHON3_EXECUTABLE=/Users/avkashchauhan/anaconda3/envs/python35/bin/python \
 -D BUILD_opencv_python2=OFF \
 -D BUILD_opencv_python3=ON \
 -D INSTALL_PYTHON_EXAMPLES=ON \
 -D INSTALL_C_EXAMPLES=OFF \
 -D BUILD_EXAMPLES=ON ..

The configuration shows the following key settings:

......
......
-- Found PythonInterp: /Users/avkashchauhan/anaconda3/bin/python2.7 (found suitable version "2.7.14", minimum required is "2.7")
-- Could NOT find PythonLibs: Found unsuitable version "2.7.10", but required is exact version "2.7.14" (found /usr/lib/libpython2.7.dylib)
-- Found PythonInterp: /Users/avkashchauhan/anaconda3/envs/python35/bin/python (found suitable version "3.5.4", minimum required is "3.4")
-- Found PythonLibs: YYY (Required is exact version "3.5.4")
....
-- Python 3:
-- Interpreter: /Users/avkashchauhan/anaconda3/envs/python35/bin/python (ver 3.5.4)
-- Libraries: YYY
-- numpy: /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/numpy/core/include (ver 1.12.1)
-- packages path: lib/python3.5/site-packages
--
-- Python (for build): /Users/avkashchauhan/anaconda3/bin/python2.7
-- Pylint: /Users/avkashchauhan/anaconda3/bin/pylint (ver: 1.8.2, checks: 116)
--
General configuration for OpenCV 3.4.1-dev =====================================
-- Version control: 3.4.1-26-g667f5b655

Building the OpenCV code:

Now let's build the code:

$ make -j4

The successful build output end with the following console log:

Scanning dependencies of target example_face_facemark_demo_aam
[ 99%] Building CXX object modules/face/CMakeFiles/example_face_facemark_demo_aam.dir/samples/facemark_demo_aam.cpp.o
[ 99%] Linking CXX executable ../../bin/example_face_facemark_lbf_fitting
[ 99%] Built target example_face_facemark_lbf_fitting
[ 99%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_facemark_lbf.cpp.o
[ 99%] Linking CXX executable ../../bin/example_face_facerec_save_load
[ 99%] Built target example_face_facerec_save_load
[ 99%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_loadsave.cpp.o
[100%] Building CXX object modules/face/CMakeFiles/opencv_test_face.dir/test/test_main.cpp.o
[100%] Linking CXX executable ../../bin/example_face_facemark_demo_aam
[100%] Built target example_face_facemark_demo_aam
[100%] Linking CXX executable ../../bin/opencv_test_face
[100%] Built target opencv_test_face

Let's install it locally:

To install the final library try the following:

$ sudo make install

Once the install is completed, confirm the build output as below:

$ ll /usr/local/lib/python3.5/site-packages/cv2.cpython-35m-darwin.so

Copying the final OpenCV library to the Python 3.5 site-packages:

As we know, the Python 3.5 Conda environment site-packages folder is here:

/Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages

So we will copy the final cv2.cpython-35m-darwin.so into the Python 3.5 Conda environment site-packages folder as cv2.so, as below:

$ cp /usr/local/lib/python3.5/site-packages/cv2.cpython-35m-darwin.so \
     /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/cv2.so

Confirm it:

$ ll /Users/avkashchauhan/anaconda3/envs/python35/lib/python3.5/site-packages/cv2.so

Verifying OpenCV with Python 3.5:

Now verify OpenCV with Python 3.5 in the Conda environment:

$ python 
Python 3.5.4 |Anaconda, Inc.| (default, Feb 19 2018, 11:51:41)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'3.4.1-dev'
>>>

Now let's run a small OpenCV example:

import cv2

# Load an image in grayscale (flag 0)
img = cv2.imread('/work/src/github/aiprojects/avkash_cv/test_image.png', 0)

# Create the preview window once, outside the loop
cv2.startWindowThread()
cv2.namedWindow("preview")

while True:
    # Display the frame until 'q' is pressed
    cv2.imshow("preview", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, close the window
cv2.destroyAllWindows()

That's it, enjoy!!

@avkashchauhan

Saving H2O model object as text locally

Sometimes you may want to store an H2O model object as text on the local file system. In this example I will show you how to save an H2O model object to local disk as simple text content. You can get the full working Jupyter notebook for this example from my GitHub.

Based on my experience, the following example works fine with Python 2.7.12 and Python 3.4. I also found that the H2O model object tables were not saved to the text file from a Jupyter notebook; however, when I ran the same code from the command line in a Python shell, all the content was written perfectly.

Let's build an H2O GBM model using the public PROSTATE dataset (the following is a full working script which will generate a binomial GBM model):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

local_url = "https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate.csv"
df = h2o.import_file(local_url)

y = "CAPSULE"
feature_names = df.col_names
feature_names.remove(y)
df[y] = df[y].asfactor()

df_train, df_valid = df.split_frame(ratios=[0.9])
print(df_train.shape)
print(df_valid.shape)

prostate_gbm = H2OGradientBoostingEstimator(model_id="prostate_gbm",
                                            ntrees=1000,
                                            learn_rate=0.5,
                                            max_depth=20,
                                            stopping_tolerance=0.001,
                                            stopping_rounds=2,
                                            score_each_iteration=True)

prostate_gbm.train(x=feature_names, y=y, training_frame=df_train, validation_frame=df_valid)
prostate_gbm

Now we will redirect the standard output to a local file, so the model details can be saved there as text:

import sys

old_target = sys.stdout
f = open('/Users/avkashchauhan/Downloads/model_output.txt', 'w')
sys.stdout = f

Let's look at the content of the local file we just created in the above step (it is empty at this point):

!cat /Users/avkashchauhan/Downloads/model_output.txt

Now we will run the following commands, which will fill the standard output buffer with the model details as text:

print("Model summary>>> model_object.show()")
prostate_gbm.show()

Now we will restore the standard output and close the file, which flushes the buffered model details to disk:

sys.stdout = old_target
f.close()

Now we will check the local file contents again, and this time you will see that the output of the above commands has been written into the file:

!cat /Users/avkashchauhan/Downloads/model_output.txt

You will see the command output stored in the local text file, as below:

Model summary>>> model_object.show()
Model Details
=============
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  prostate_gbm


ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.036289343297
RMSE: 0.190497620187
LogLoss: 0.170007804527
Mean Per-Class Error: 0.0160045361428
AUC: 0.998865964296
Gini: 0.997731928592
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.487417363665: 
Maximum Metrics: Maximum metrics at their respective thresholds

Gains/Lift Table: Avg response rate: 40.36 %



ModelMetricsBinomial: gbm
** Reported on validation data. **

MSE: 0.161786079676
RMSE: 0.402226403505
LogLoss: 0.483923658542
Mean Per-Class Error: 0.174208144796
AUC: 0.871040723982
Gini: 0.742081447964
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.205076283533: 
Maximum Metrics: Maximum metrics at their respective thresholds

Gains/Lift Table: Avg response rate: 39.53 %


Scoring History: 
Variable Importances:

Note: if you are wondering what the "!" sign does here, it runs a Linux shell command (in this case "cat") inside a Jupyter cell.
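
As an aside, on Python 3.4+ you can avoid juggling sys.stdout by hand: contextlib.redirect_stdout does the same redirection and restores the stream automatically. A minimal sketch, reusing the model and file path from above:

from contextlib import redirect_stdout

# Capture the model summary into the text file, then restore stdout automatically
with open('/Users/avkashchauhan/Downloads/model_output.txt', 'w') as f:
    with redirect_stdout(f):
        print("Model summary>>> model_object.show()")
        prostate_gbm.show()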

That's it, enjoy!!

 

Installing or upgrading python3.6 on Ubuntu 16.04

Download Python 3.6.1 and install it as below:

wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz
tar xvf Python-3.6.1.tgz
cd Python-3.6.1
./configure --enable-optimizations
make -j8
# If you want to keep the previous version, use altinstall
sudo make altinstall
# If you want to replace the previous version, use install
# sudo make install

Testing python3.6

$ python3.6

Once it is working, check its launch path:

$ which python3.6
/usr/local/bin/python3.6

Now you just need to add a python3 symlink pointing at the new binary (the link should live in /usr/local/bin so that it is on your PATH):

$ sudo ln -s /usr/local/bin/python3.6 /usr/local/bin/python3

Now test python3 one final time:

$ python3
Python 3.6.1 (default, Jun 8 2017, 16:11:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

That’s it, enjoy!!

Installing IPython 5.0 (lower than 6.0) compatible with Python 2.6/2.7

It is possible that you may need to install some Python library or component in your Python 2.6 or 2.7 environment. If those components need IPython, the installation will fail, because newer IPython releases no longer support Python 2.

For example, with Python 2.7.x, when you try to install jupyter as below:

$ pip install jupyter --user

You will get an error like the one below:

Using cached ipython-6.0.0.tar.gz
 Complete output from command python setup.py egg_info:

IPython 6.0+ does not support Python 2.6, 2.7, 3.0, 3.1, or 3.2.
 When using Python 2.7, please install IPython 5.x LTS Long Term Support version.
 Beginning with IPython 6.0, Python 3.3 and above is required.

See IPython `README.rst` file for more information:

https://github.com/ipython/ipython/blob/master/README.rst

Python sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0) detected.

To solve this problem you just need to install IPython 5.x first (instead of 6.0, which is pulled in by default when installing jupyter, or ipython independently).

Here is how you can install the IPython 5.x version:

$ pip install IPython==5.0 --user
$ pip install jupyter --user

That's it, enjoy!!


Splitting an H2O data frame based on date-time values

Sometimes we may need to split a data frame based on date-time values, i.e. one split is before a certain date and the other is on or after that date.

Here is an example of Python code showing how to split it:

import datetime
import h2o

h2o.init()
timedata = h2o.import_file("/Users/avkashchauhan/Downloads/date-data.csv")
timedata.shape
date_before_data = timedata[timedata['date'] < datetime.datetime(2015, 10, 1, 0, 0, 0), :]
date_after_data = timedata[timedata['date'] >= datetime.datetime(2015, 10, 1, 0, 0, 0), :]
date_before_data.shape
date_after_data.shape

If you then want to split one of the pieces further and append one of those splits back onto the other data frame, you can do the following:

part1, part2 = date_after_data.split_frame(ratios=[0.5])
final_data = date_before_data.rbind(part2)

Note the CSV file contents are as below:

id date
1 9/1/2015
2 9/2/2015
3 9/3/2015
4 9/4/2015
5 9/5/2015
6 9/6/2015
7 9/7/2015
8 9/8/2015
9 9/9/2015
10 9/10/2015
11 12/1/2015
12 12/2/2015
13 12/3/2015
14 12/4/2015
15 12/5/2015
16 12/6/2015
17 12/7/2015
18 12/8/2015
19 12/9/2015
20 12/10/2015
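
If you want to reproduce this file, a small script like the following will generate it (the output path matches the import_file call above; the formatting helper is my own):

import csv
from datetime import date

# ids 1-10 fall in September 2015, ids 11-20 in December 2015
dates = [date(2015, 9, d) for d in range(1, 11)] + \
        [date(2015, 12, d) for d in range(1, 11)]
with open("/Users/avkashchauhan/Downloads/date-data.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "date"])
    for i, d in enumerate(dates, start=1):
        w.writerow([i, "{}/{}/{}".format(d.month, d.day, d.year)])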

That's it, enjoy!!

Union of two different H2O data frames in Python and R

We have the first data frame as below:

C1 C2 C3 C4
10 20 30 40
3 4 5 6
5 7 8 9
12 3 55 10

And then we have the second data frame as below:

C1 C2 C3 C4 C10 C20
10 20 30 40 33 44
3 4 5 6 11 22
5 7 8 9 90 100
12 3 55 10 33 44
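
To follow along, the two frames can be built directly in Python; this is a sketch, assuming a running h2o session (H2OFrame takes a list of rows plus column_names):

import h2o
h2o.init()

# First frame: 4 rows, 4 columns
df1 = h2o.H2OFrame([[10, 20, 30, 40], [3, 4, 5, 6], [5, 7, 8, 9], [12, 3, 55, 10]],
                   column_names=["C1", "C2", "C3", "C4"])
# Second frame: same 4 rows plus two extra columns
df2 = h2o.H2OFrame([[10, 20, 30, 40, 33, 44], [3, 4, 5, 6, 11, 22],
                    [5, 7, 8, 9, 90, 100], [12, 3, 55, 10, 33, 44]],
                   column_names=["C1", "C2", "C3", "C4", "C10", "C20"])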

If we just try to row-bind these two data frames blindly, as below:

final = df2.rbind(df1)

We will get the following error:

H2OValueError: Cannot row-bind a dataframe with 6 columns to a data frame with 4 columns: the columns must match

So, to merge two data sets with different columns, we need to prepare our datasets to meet the rbind requirement. First we will add the remaining columns from "df2" to "df1", filled with 0, as below:

df1['C10'] = 0
df1['C20'] = 0

The updated data frame looks like this:

C1 C2 C3 C4 C10 C20
10 20 30 40 0 0
3 4 5 6 0 0
5 7 8 9 0 0
12 3 55 10 0 0

Now we will rbind "df2" onto "df1" as below:

df1 = df1.rbind(df2)

Now "df1" looks like this:

C1 C2 C3 C4 C10 C20
10 20 30 40 0 0
3 4 5 6 0 0
5 7 8 9 0 0
12 3 55 10 0 0
10 20 30 40 33 44
3 4 5 6 11 22
5 7 8 9 90 100
12 3 55 10 33 44

If you are using R, you just need to do the following to add the new columns to your first data frame:

df1$C10 = 0
df1$C20 = 0

You must make sure the number of columns matches before doing rbind, and the number of rows matches before doing cbind.
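
For completeness, here is a hedged sketch of the cbind side of that rule, using a hypothetical single-column frame sized to match df1 (which has 8 rows after the rbind above):

# cbind requires matching row counts, just as rbind requires matching columns
df3 = h2o.H2OFrame([[i] for i in range(8)], column_names=["C30"])
df1 = df1.cbind(df3)  # 8 rows on both sides, so this succeeds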

That's it, enjoy!!