Categorical Encoding, One-Hot Encoding, and why use it?

What is categorical encoding?

In data science, categorical values are encoded as enumerated types so that algorithms can use them numerically when processing the data and learning relationships with the other features.

Name     Age   Zip Code   Salary
Jim      43    94404      45000
Jon      37    94407      80000
Merry    36    94404      65000
Tim      42    94403      75000
Hailey   29    94407      60000

In the above example, Zip Code is not really a numeric value; each number represents a certain area. Using Zip Code as a plain number will not create a meaningful relationship with other features such as Age or Salary, but if we encode it as a categorical, the relationships among the features are defined properly. So we treat the Zip Code feature as categorical (an enum) when we feed it to a machine learning algorithm.

String or character features should be set to categorical (enum) as well, so that relationships among features generalize. If we add another feature named "Sex" to the dataset as below, treating it as categorical will likewise improve the relationships among the other features.

Name     Age   Zip Code   Sex   Salary
Jim      43    94404      M     45000
Jon      37    94407      M     80000
Merry    36    94404      F     65000
Tim      42    94403      M     75000
Hailey   29    94407      F     60000

After encoding the Zip Code and Sex features as enums, both features will look like this:

Name     Age   Zip Code   Sex   Salary
Jim      43    1          1     45000
Jon      37    2          1     80000
Merry    36    1          0     65000
Tim      42    3          1     75000
Hailey   29    2          0     60000

Since the Name feature will not help us relate Age, Zip Code, and Sex in any way, we can drop it and stick with Age, Zip Code, and Sex to first understand Salary and then predict Salary for new records. The input dataset will look like this:

Age   Zip Code   Sex
43    1          1
37    2          1
36    1          0
42    3          1
29    2          0

Above you can see that all the data is now in numeric format, ready to be processed by an algorithm that first learns the relationships within it and then predicts.
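As a quick illustration of this enum-style encoding, here is a minimal sketch using pandas (pandas.factorize is one common way to do it; note it assigns 0-based codes, while the table above used 1-based labels):

    import pandas as pd

    # Zip codes from the table above, kept as strings (they are labels, not numbers)
    zip_codes = pd.Series(["94404", "94407", "94404", "94403", "94407"])

    # factorize maps each distinct value to an integer code
    codes, uniques = pd.factorize(zip_codes)
    print(codes)    # [0 1 0 2 1] -- one integer per distinct zip code
    print(uniques)  # the distinct zip codes, in order of first appearance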

What is One Hot Encoding?

In the above example, the values Male and Female are folded into the single feature named "Sex", so their interaction with the other features is not very rich or deep. What if Male and Female were features in their own right, like Age or Zip Code? In that case the relationship between being Male or Female and the rest of the data would be much stronger. Using one-hot encoding for a specific feature gives each distinct value of that feature its own proper representation, which helps improve learning.

One Hot Encoding does exactly that. It takes each distinct value of a feature and turns it into a feature of its own, improving its relationship with the overall data. If we apply One Hot Encoding to the "Sex" feature, the dataset will look like this:

Age   Zip Code   M   F   Salary
43    1          1   0   45000
37    2          1   0   80000
36    1          0   1   65000
42    3          1   0   75000
29    2          0   1   60000

If we decide to apply One Hot Encoding to Zip Code as well, our dataset will look like this:

Age   94404   94407   94403   M   F   Salary
43    1       0       0       1   0   45000
37    0       1       0       1   0   80000
36    1       0       0       0   1   65000
42    0       0       1       1   0   75000
29    0       1       0       0   1   60000

So above you can see that each value has a significant representation and a deep relationship with the other values. One-hot encoding is also called the one-of-K scheme.
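As a quick sketch, pandas.get_dummies produces exactly this kind of expansion (a dense version of the tables above):

    import pandas as pd

    # The sample dataset from the tables above
    df = pd.DataFrame({
        "Age":     [43, 37, 36, 42, 29],
        "ZipCode": ["94404", "94407", "94404", "94403", "94407"],
        "Sex":     ["M", "M", "F", "M", "F"],
        "Salary":  [45000, 80000, 65000, 75000, 60000],
    })

    # Each distinct ZipCode and Sex value becomes its own 0/1 column
    print(pd.get_dummies(df, columns=["ZipCode", "Sex"]))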

One Hot Encoding can use either a dense or a sparse implementation when it creates the new features from the encoded values.

Why Use it?

There are several good reasons to use One Hot Encoding on your data.

As you can see, One Hot Encoding introduces sparsity into the original dataset, which is more memory friendly and can improve learning time, provided the algorithm is designed to handle sparse data properly.
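For example, a minimal sketch with scikit-learn's OneHotEncoder (linked below; in the scikit-learn versions current at the time of writing it takes integer-coded input and returns a scipy sparse matrix by default):

    from sklearn.preprocessing import OneHotEncoder

    # Zip Code and Sex already enum-encoded as integers, per the earlier table
    X = [[1, 1], [2, 1], [1, 0], [3, 1], [2, 0]]

    enc = OneHotEncoder()            # sparse output is the default
    X_sparse = enc.fit_transform(X)  # scipy sparse matrix: only non-zeros are stored
    print(X_sparse.toarray())        # densify just for display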

Other Resources:

Please visit the following link to see the One-Hot-Encoding implementation in scikit-learn:

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

For in-depth feature engineering, please see the following slides from HJ Van Veen:

A great way to probe personal traits through simple questions

Source: https://medium.com/the-coffeelicious/questions-that-truly-reveal-someones-personality-and-capabilities-ecf9f37fc4e2


I really like these questions; they can open a window into anyone's personality if asked properly…

I want to give full credit to the author Tiffany Sun for composing the list below.

Enjoy!

  1. If you could have superpowers, would you use them for good or for evil?
  2. How old would you be if you didn’t know how old you are?
  3. Would you accept the gift of reading other people’s minds if it meant you could never turn it off?
  4. If the average human life span was 40 years, how would you live your life differently?
  5. Do you think crying is a sign of weakness or strength?
  6. Would you rather be able to eat as much as you want with no weight gain, or require only 3 hours of sleep a day?
  7. If you had to choose to live without one of your 5 senses, which one would you give up?
  8. In what ways are you the same as your childhood self?
  9. If you had your own TV network, what would it be about?
  10. If you’re in a bad mood, do you prefer to be left alone or have someone cheer you up?
  11. Would you rather know without a doubt the purpose and direction of your life or never have to worry about money for the rest of your life?
  12. If you could master one skill you don’t have right now, what would it be?
  13. What song typifies the last 24 hours of your life?
  14. What words would you pass to your childhood self?
  15. If you had to do it over again, what would you study in school?
  16. If you could have any accent, which one would it be?
  17. Would you rather be married in an arranged marriage or spend the rest of your life single?
  18. If you could be someone of the opposite sex for a day, what would be the first thing you do?
  19. Would you rather have an extra hour every day or have $40 given to you free and clear every day?
  20. If you were to be stranded on a deserted island with one other person, who would it be?
  21. What would you do differently if you knew nobody would judge you?
  22. Would you rather spend 48 straight hours in a public restroom or spend the next 2 months taking only public transportation?
  23. What did you learn in school that has proven to be the least useful?
  24. If you had an extra hour every day, what would you do with it?
  25. Would you rather lose your sense of taste and smell or lose all of your hair?
  26. If you could invent something, what would it be and why?
  27. Would you rather have more than 5 friends or fewer than 5 friends?
  28. What stands between you and happiness?
  29. If today were to be your last day in your country, what would you want to do?
  30. Would you rather lose all of your old memories, or never be able to make new ones?
  31. What was the last thing you got for free?
  32. Would you rather be extremely attractive or be married to someone who is extremely attractive?
  33. What do you want to be remembered for?
  34. Would you rather have $50,000 free and clear or $1,000,000 that is illegal?
  35. If you could trade lives with one of your friends, who would it be?
  36. Would you rather discover something great and share it? Or discover something evil and prevent it?
  37. What movie deserves a sequel?
  38. If you could see 24 hours into the future, what would you be doing?

 

 


Handling various errors when installing Tensorflow

The problem happens if protobuf for Python is older than 3.1.0 and TF is old as well. I hit exactly the problem below:

$ python

Python 2.7.10 (default, Jul 30 2016, 19:40:32)
>>> from tensorflow.tools.tfprof import tfprof_log_pb2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named tfprof
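One quick way to confirm which protobuf version Python is actually picking up (a minimal check of my own, not from the original session):

    # Print the protobuf version visible to this interpreter;
    # TF 0.12.x needs protobuf >= 3.1.0
    import google.protobuf
    print(google.protobuf.__version__)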

This is how I solved it:

Updated setuptools to the latest 32.x or above version (I did this because my TF install was failing to update setuptools to version 32.x):

pip install --upgrade --user setuptools

After that I installed TF as below:

$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl
$ sudo pip install --upgrade $TF_BINARY_URL

You will note that TF 0.12.1 installs the following:

Collecting tensorflow==0.12.1 from https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.12.1-py2-none-any.whl
Collecting numpy>=1.11.0 (from tensorflow==0.12.1)
Collecting protobuf>=3.1.0 (from tensorflow==0.12.1)
Collecting setuptools (from protobuf>=3.1.0->tensorflow==0.12.1)

After successful TF Install:

Successfully installed protobuf-3.1.0.post1 tensorflow-0.12.1

I tried the very first command as below:

$ python

Python 2.7.10 (default, Jul 30 2016, 19:40:32)
>>> from tensorflow.tools.tfprof import tfprof_log_pb2
>>> tfprof_log_pb2
<module 'tensorflow.tools.tfprof.tfprof_log_pb2' from '/Library/Python/2.7/site-packages/tensorflow/tools/tfprof/tfprof_log_pb2.pyc'>

 

Artificial Intelligence courses in US Universities – Your 2017 resolution


Source: http://ai.berkeley.edu/more_courses_other_schools.html

Tensorflow with CUDA/cuDNN on Ubuntu 16.04


Environment:

  • OS: Ubuntu 16.04
  • Python 2.7
  • CUDA 8.0.27
  • CuDNN v5.1
  • Note: for TensorFlow with GPU support, both NVIDIA's CUDA Toolkit (>= 7.0) and cuDNN (>= v3) need to be installed.

GPU verification:

$ nvidia-smi
Tue Nov 22 04:28:59 2016
+------------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                     |
|-------------------------------+----------------------+-----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC  |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M.  |
|===============================+======================+=======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A  |
| N/A   43C    P0     1W / 125W |      0MiB /  4036MiB |      0%       Default |
+-------------------------------+----------------------+-----------------------+

+------------------------------------------------------------------------------+
| Processes:                                                        GPU Memory |
|  GPU       PID  Type  Process name                                Usage      |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+

CUDA Toolkit verification:

$ cat /usr/local/cuda/version.txt
CUDA Version 8.0.27

CuDNN Verification:

Download cudnn-8.0-linux-x64-v5.1.tgz from the Nvidia developer site.

$ tar -xvzf cudnn-8.0-linux-x64-v5.1.tgz
## NOTE: the archive extracts into a local "cuda" folder:
cuda/
    include/
        cudnn.h
    lib64/
        libcudnn.so -> libcudnn.so.5*
        libcudnn.so.5 -> libcudnn.so.5.1.5*
        libcudnn.so.5.1.5*

You just need to merge the cuDNN cudnn.h and lib64 files into the CUDA toolkit at /usr/local/cuda as below:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Setting the CUDA binaries and libraries into your path (LD_LIBRARY_PATH is needed so TensorFlow can find the CUDA libraries at runtime):

export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

Tensorflow Install:

$ export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
$ sudo pip install --upgrade $TF_BINARY_URL

Tensorflow Verification:

>>> import tensorflow as tf

I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>>
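To go a step beyond the import test, here is a minimal sketch (my own addition, not from the original session) that asks TensorFlow to log device placement; on a working CUDA setup the log should show ops assigned to /gpu:0:

    import tensorflow as tf

    # log_device_placement=True prints which device each op runs on
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        a = tf.constant([1.0, 2.0, 3.0], name="a")
        b = tf.constant([4.0, 5.0, 6.0], name="b")
        print(sess.run(a + b))  # the log should show placement on /gpu:0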

Have fun !!

Installing the Nvidia driver and toolkit in Ubuntu 16.04 with toolkit samples

Pre-requisite:

  • Make sure you have gcc and g++ on your machine
  • A CUDA-capable graphics card

Installation:

Get Cuda 8.0 from Nvidia:

 

Download:

  • Download the CUDA 8.0 runfile installer from the Nvidia developer site
  • You will get the file as cuda_8.0.44_linux-run

Execution:

Run $ bash ./cuda_8.0.44_linux-run


Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: y
Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y
Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]: y
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]: /usr/local/cuda
Cannot install toolkit in /usr/local/cuda.
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ubuntu ]: /mnt/avkash/cuda-samples
Installing the NVIDIA display driver…
Error: unsupported compiler: 5.4.1. Use --override to override this check.
Installing the CUDA Samples in /mnt/avkash/cuda-samples …
sh: 1: /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh: not found
chown: failed to get attributes of ‘/mnt/avkash/cuda-samples’: No such file or directory
===========
= Summary =
===========
Driver: Installed
Toolkit: Installation Failed. Using unsupported Compiler.
Samples: Installed in /mnt/avkash/cuda-samples
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /tmp/cuda_install_6246.log

 

Problem:

  • The problem is that the installation did not work due to a compiler issue.
  • Looking at the log you will see the gcc/g++ compilers are 5.4.x; versions above 5.3 are not supported.

The solution is to downgrade gcc/g++ to 4.9:

Solution: https://aichamp.wordpress.com/2016/11/10/downgrading-gcc-from-5-4-to-4-9-in-ubuntu-16-04/

Try again:

Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/ubuntu ]: /mnt/avkash/cuda-samples
Installing the CUDA Toolkit in /usr/local/cuda-8.0 …
Installing the CUDA Samples in /mnt/avkash/cuda-samples …
Copying samples to /mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples now…
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /mnt/avkash/cuda-samples
Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver

 

Cuda Samples:

Compilation:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ make

Listing:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ll
    total 640
    drwxr-xr-x 2 root root   4096 Nov 21 01:24 ./
    drwxr-xr-x 7 root root   4096 Nov 21 01:17 ../
    -rwxr-xr-x 1 root root 581960 Nov 21 01:24 deviceQuery*
    -rw-r--r-- 1 root root  12174 Nov 21 01:17 deviceQuery.cpp
    -rw-r--r-- 1 root root  21264 Nov 21 01:24 deviceQuery.o
    -rw-r--r-- 1 root root   9077 Nov 21 01:17 Makefile
    -rw-r--r-- 1 root root   1737 Nov 21 01:17 NsightEclipse.xml
    -rw-r--r-- 1 root root    168 Nov 21 01:17 readme.txt

Execution:

  • $/mnt/avkash/cuda-samples/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
    ./deviceQuery Starting…
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 1 CUDA Capable device(s)
    Device 0: “GRID K520”
    CUDA Driver Version / Runtime Version 8.0 / 8.0
    CUDA Capability Major/Minor version number: 3.0
    Total amount of global memory: 4036 MBytes (4232052736 bytes)
    ( 8) Multiprocessors, (192) CUDA Cores/MP: 1536 CUDA Cores
    GPU Max Clock rate: 797 MHz (0.80 GHz)
    Memory Clock rate: 2500 Mhz
    Memory Bus Width: 256-bit
    L2 Cache Size: 524288 bytes
    Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
    Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
    Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
    Total amount of constant memory: 65536 bytes
    Total amount of shared memory per block: 49152 bytes
    Total number of registers available per block: 65536
    Warp size: 32
    Maximum number of threads per multiprocessor: 2048
    Maximum number of threads per block: 1024
    Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
    Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
    Maximum memory pitch: 2147483647 bytes
    Texture alignment: 512 bytes
    Concurrent copy and kernel execution: Yes with 2 copy engine(s)
    Run time limit on kernels: No
    Integrated GPU sharing Host Memory: No
    Support host page-locked memory mapping: Yes
    Alignment requirement for Surfaces: Yes
    Device has ECC support: Disabled
    Device supports Unified Addressing (UVA): Yes
    Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 3
    Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GRID K520
    Result = PASS

 

Training a SegNet model for multi-class pixel-wise classification

 

Source: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

This implementation of SegNet [1] is built on top of the Caffe deep learning library. The first step is to download the SegNet source code, which can be found on our GitHub repository here. Our code to support SegNet is licensed for non-commercial use (license summary). To install SegNet, please follow the Caffe installation instructions here. Make sure you also compile Caffe’s python wrapper.


Get the SegNet Tutorial: https://github.com/alexgkendall/SegNet-Tutorial

Your file structure should now look like this:

/SegNet/
    CamVid/
        test/
        testannot/
        train/
        trainannot/
        test.txt
        train.txt
    Models/
        # SegNet and SegNet-Basic model files for training and testing
    Scripts/
        compute_bn_statistics.py
        test_segmentation_camvid.py
    caffe-segnet/
        # caffe implementation

Testing SegNet Live: http://mi.eng.cam.ac.uk/projects/segnet/demo.php#demo

Bayesian SegNet:

This is a tutorial on Bayesian SegNet [4], a probabilistic extension to SegNet. By the end of this tutorial you will be able to train a model which can take an image like the one on the left, and produce a segmentation (center) and a measure of model uncertainty (right).

Source: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html#bayes_segnet


 

Happy machine learning, have fun!!

 

Visualize Convolutional neural network (CNN) in real time

If you are interested in visualizing a CNN (convolutional neural network), the quiver open source application lets you visualize convnet features in real time and interactively when the network is built with Keras.

Visit Source Code: https://github.com/jakebian/quiver

Quickstart

Installation

    pip install quiver_engine

Usage

Take your Keras model; launching Quiver is a one-liner.

    from quiver_engine import server
    server.launch(model)

This will launch the visualization at localhost:5000

 

Options

    server.launch(
        model, # a Keras Model

        # where to store temporary files generated by quiver (e.g. image files of layers)
        temp_folder='./tmp',

        # a folder where input images are stored
        input_folder='./',

        # the localhost port the dashboard is to be served on
        port=5000
    )
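Putting the options together, here is a minimal end-to-end sketch (the VGG16 model from keras.applications and the ./imgs folder are illustrative assumptions; any compiled Keras model works):

    from keras.applications.vgg16 import VGG16
    from quiver_engine import server

    # Load a pretrained Keras model and serve the Quiver dashboard for it
    model = VGG16(weights='imagenet')

    # input_folder should contain a few sample images to probe the network with
    server.launch(model, temp_folder='./tmp', input_folder='./imgs', port=5000)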

Deep Learning Models in Keras:

https://github.com/fchollet/deep-learning-models

Visit Keras to learn more:

Styling an image based on another image using the Artistic Style neural algorithm in TensorFlow

Have you ever thought about modifying an image based on another image's features? I tried exactly that, to understand how a CNN works in terms of understanding an image.

Neural Algorithm of Artistic Style (Gatys et al., http://arxiv.org/abs/1508.06576)

Final result: the source image (avkash512) rendered in the style of the style image (samurai).

 

Coding Source: https://github.com/janivanecky/Artistic-Style

Above is a very simple implementation of A Neural Algorithm of Artistic Style (Gatys et al., http://arxiv.org/abs/1508.06576) in TensorFlow. I used the VGG implementation from Chris and modified it slightly, stripping away unnecessary layers.

In the examples below I used the content image as the initialization; it seems to produce a more consistent image, but in the code you can easily switch to noise initialization on line 109 of style.py. I used Adam as the optimizer and let it run for 500 iterations.

To run it, you’re going to need:

Usage

python style.py content_image_path style_image_path [output_image] 
[{top,center,bottom,left,right}] [content_scale] [style_weight]

Pre-requisite:

Running Command:

$ python style.py avkash.jpg samurai.jpg result.jpg center 1 1

Iteration 0: 101205584.0
Iteration 5: 28933696.0
Iteration 10: 12709578.0
Iteration 15: 6670791.0
……….
……….
Iteration 480: 104311.460938
Iteration 485: 97180.828125
Iteration 490: 91784.2890625
Iteration 495: 88249.4609375

Configuration:

  • Total of 500 iterations – line 144 in style.py
  • It uses TensorFlow's Adam optimizer – line 137 in style.py – tf.train.AdamOptimizer
  • The network has 4 CNN layers, as defined in vgg.py

Helpful Tips

  • Make sure vgg.py and vgg19.npy are in the same folder as style.py
  • Try changing the optimizer and see the results; a toy sketch follows below:
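For example, here is a toy sketch of swapping in a different optimizer (tf.train.GradientDescentOptimizer in place of the repo's tf.train.AdamOptimizer; the toy loss below is just for illustration, not the style loss from style.py):

    import tensorflow as tf

    # Toy objective to compare optimizers on: minimize (x - 3)^2
    x = tf.Variable(0.0)
    loss = tf.square(x - 3.0)

    # style.py uses tf.train.AdamOptimizer (line 137, per the notes above);
    # plain gradient descent is one drop-in alternative
    train_step = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        for _ in range(200):
            sess.run(train_step)
        print(sess.run(x))  # approaches 3.0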

Appendix:

ubuntu@XXXXX:~$ pip show Pillow
Name: Pillow
Version: 3.4.2
Summary: Python Imaging Library (Fork)
Home-page: http://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: aclark@aclark.net
License: Standard PIL License
Location: /usr/local/lib/python2.7/dist-packages
Requires:
ubuntu@XXXXX:~$ pip show tensorflow
Name: tensorflow
Version: 0.11.0rc2
Summary: TensorFlow helps the tensors flow
Home-page: http://tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /home/ubuntu/.local/lib/python2.7/site-packages
Requires: mock, protobuf, numpy, wheel, six