Installing or upgrading python3.6 in Ubuntu 16.04

Download python 3.6.1 and install as below:

wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz
tar xvf Python-3.6.1.tgz
cd Python-3.6.1
./configure --enable-optimizations
make -j8
# If you want to keep previous version user altinstall
sudo make altinstall
# if you want to replace previous version use install
# sudo make install

Testing python3.6

$ python3.6

Once it is working check its launching path:

$ which python3.6
/usr/local/bin/python3.6

Now you just need to change the links for python3 binary as below:

$ sudo ln -s /usr/local/bin/python3.6 /usr/local/python3

Now test python3 for the final:

$ python3
Python 3.6.1 (default, Jun 8 2017, 16:11:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

That’s it, enjoy!!

Installing ipython 5.0 (lower then 6.0) compatible with python 2.6/2.7

It is possible that you may need to install some python library or component with your python 2.6 or 2.7 environment. If those components need IPython then you

For example, with python 2.7.x when you try to install jupyter as below:

$ pip install jupyter --user

You will get the error as below:

Using cached ipython-6.0.0.tar.gz
 Complete output from command python setup.py egg_info:

IPython 6.0+ does not support Python 2.6, 2.7, 3.0, 3.1, or 3.2.
 When using Python 2.7, please install IPython 5.x LTS Long Term Support version.
 Beginning with IPython 6.0, Python 3.3 and above is required.

See IPython `README.rst` file for more information:

https://github.com/ipython/ipython/blob/master/README.rst

Python sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0) detected.

To solve this problem you just need to install IPython 5.x (instead of 6.0 which is pulled as default when installing jupyter or independently ipython.

Here is the way you can install IPython 5.x version:

$ pip install IPython==5.0 --user
$ pip install jupyter --user

Thats it, enjoy!!

Thats it, enjoy!!

 

 

Splitting h2o data frame based on date time value

Sometime we may need to split the data frame based on date time values i.e. one split is above certain date and another split is after certain date.

Here is an example of the python code on how to split it:

import datetime
timedata = h2o.import_file("/Users/avkashchauhan/Downloads/date-data.csv")
timedata.shape
date_before_data = timedata[timedata['date'] < datetime.datetime(2015, 10, 1, 0, 0, 0),:]
date_after_data = timedata[timedata['date'] >= datetime.datetime(2015, 10, 1, 0, 0, 0),:]
date_before_data.shape
date_after_data.shape

If you decide to split one piece of data frame and then add one of the split to previous data frame you can do the following:

part1, part2 = date_after_data.split_frame(ratios=[0.5])
final_data = date_before_data.rbind(part2)

Note the CSV file contents are as below:

id date
1 9/1/2015
2 9/2/2015
3 9/3/2015
4 9/4/2015
5 9/5/2015
6 9/6/2015
7 9/7/2015
8 9/8/2015
9 9/9/2015
10 9/10/2015
11 12/1/2015
12 12/2/2015
13 12/3/2015
14 12/4/2015
15 12/5/2015
16 12/6/2015
17 12/7/2015
18 12/8/2015
19 12/9/2015
20 12/10/2015

Thats it, enjoy!!

Union of two different H2O data frames in python and R

We have first data frame as below:

C1 C2 C3 C4
10 20 30 40
3 4 5 6
5 7 8 9
12 3 55 10

And then we have second data frame as below:

C1 C2 C3 C4 C10 C20
10 20 30 40 33 44
3 4 5 6 11 22
5 7 8 9 90 100
12 3 55 10 33 44

If we just try to add these two data frame blindly as below:

final = df2.rbind(df1)

We will get the following error:

H2OValueError: Cannot row-bind a dataframe with 6 columns to a data frame with 4 columns: the columns must match

So we need to merge two data sets of different columns we need to instrument our datasets to meet the rbind need.  First we will add remaining columns from “df2” to “df1” as below:

df1['C10'] = 0
df1['C20'] = 0

The updated data frame looks like as below:

C1 C2 C3 C4 C10 C20
10 20 30 40 0 0
3 4 5 6 0 0
5 7 8 9 0 0
12 3 55 10 0 0

Now we will do rbind with “df2” to “df1” as below:

df1 = df1.rbind(df2)

Now “df1” looks like as below:

C1 C2 C3 C4 C10 C20
10 20 30 40 0 0
3 4 5 6 0 0
5 7 8 9 0 0
12 3 55 10 0 0
10 20 30 40 33 44
3 4 5 6 11 22
5 7 8 9 90 100
12 3 55 10 33 44

If you are using R you just need to do the following to add new columns into your first data frame:

df1$C10 = 0
df1$C20 = 0

You must make sure the number of columns match before doing rbind and number of rows match before doing cbind.

Thats it, enjoy!!

Renaming data frame column names in H2O (python)

Sometimes you may need to change the all the column names or a specific column due to certain need, and you can do as below:

>>> df = h2o.import_file("/Users/avkashchauhan/src/github.com/h2oai/h2o-3/smalldata/iris/iris.csv")
Parse progress: |█████████████████████████████████████████████████████████████████████████████| 100%
>>> df
 C1 C2 C3 C4 C5
---- ---- ---- ---- -----------
 5.1 3.5 1.4 0.2 Iris-setosa
 4.9 3 1.4 0.2 Iris-setosa
 4.7 3.2 1.3 0.2 Iris-setosa
 4.6 3.1 1.5 0.2 Iris-setosa
 5 3.6 1.4 0.2 Iris-setosa
 5.4 3.9 1.7 0.4 Iris-setosa
 4.6 3.4 1.4 0.3 Iris-setosa
 5 3.4 1.5 0.2 Iris-setosa
 4.4 2.9 1.4 0.2 Iris-setosa
 4.9 3.1 1.5 0.1 Iris-setosa

[150 rows x 5 columns]

>>> df.names
[u'C1', u'C2', u'C3', u'C4', u'C5']

>>> df.set_names(['A1','A2','A3','A4','A5'])
 A1 A2 A3 A4 A5
---- ---- ---- ---- ------
 5.1 3.5 1.4 0.2 Iris_A
 4.9 3 1.4 0.2 Iris_A
 4.7 3.2 1.3 0.2 Iris_A
 4.6 3.1 1.5 0.2 Iris_A
 5 3.6 1.4 0.2 Iris_A
 5.4 3.9 1.7 0.4 Iris_A
 4.6 3.4 1.4 0.3 Iris_A
 5 3.4 1.5 0.2 Iris_A
 4.4 2.9 1.4 0.2 Iris_A
 4.9 3.1 1.5 0.1 Iris_A

[150 rows x 5 columns]

If you want to change only few column names then you still need to copy the original name in the same index and just add the changed name into where applicable. For example in above data frame, we just want to change A5 to Levels and we will do as below:

>>> df.set_names(['A1','A2','A3','A4','Levels'])
 A1 A2 A3 A4 Levels
---- ---- ---- ---- --------
 5.1 3.5 1.4 0.2 Iris_A
 4.9 3 1.4 0.2 Iris_A
 4.7 3.2 1.3 0.2 Iris_A
 4.6 3.1 1.5 0.2 Iris_A
 5 3.6 1.4 0.2 Iris_A
 5.4 3.9 1.7 0.4 Iris_A
 4.6 3.4 1.4 0.3 Iris_A
 5 3.4 1.5 0.2 Iris_A
 4.4 2.9 1.4 0.2 Iris_A
 4.9 3.1 1.5 0.1 Iris_A

[150 rows x 5 columns]

The set_names function must have all names values in the array, either same name or changes names otherwise it will generate an error.

For example the following will not work and will throw an error:

>>> df.set_names(['A1'])
>>> df.set_names(['A1','A2','A3','A4','A5','A6'])

Thats it, enjoy!!

Using Sparkling water and PySpark to log console output

Here is the command Option #1:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

Here is the command Option #2:

./pyspark --deploy-mode client --conf spark.dynamicAllocation.enabled=false --packages com.databricks:spark-csv_2.11:1.4.0,ai.h2o:sparkling-water-core_2.10:1.6.7 --py-files ../../sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

We must make sure that both h2o backend and python version are calling same Version of API.

This parameter is using H2O API backend version 1.6.7
ai.h2o:sparkling-water-core_2.10:1.6.7

This parameter is using 1.6.7 version of Python API:
–py-files /mnt/app/sparkling-water-1.6.7/py/dist/h2o_pysparkling_1.6-1.6.7-py2.7.egg

Here is the script to test overall scenario:

>>> from pysparkling import *
>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> import h2o
>>> sqlContext = SQLContext(sc)
>>> hc = H2OContext.getOrCreate(sc)

Finding Python Module version

Sometimes it becomes a need to find out particular module version in Python, here is the quickest way to find it:

  1. Launch python
  2. >>> import <module_by_name>
  3. Try any of the following way to get the module version:
    1. >>> ModuleName.__version__
    2. >>> ModuleName.version
    3. >>> ModuleName.version_string

Here is an example of finding pyCrypto (Module name – Crypto) version:

╚═╝ § python
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type “help”, “copyright”, “credits” or “license” for more information.
>>> import Crypto
>>> Crpto.__version__
Traceback (most recent call last):
File “<stdin>”, line 1, in <module>
NameError: name ‘Crpto’ is not defined
>>> Crypto.__version__
‘2.6’
>>> quit()