H2O backend and API processing through Rapids

An H2O cluster supports several frontends, such as Python, R, and Flow, and every function called from these frontends is handled by the H2O cluster backend through its API. Frontend actions are translated into API calls, and the H2O backend handles these calls through Rapids expressions. In this post we will see how these API calls look from the backend side.

Let's start H2O directly from the command line using h2o.jar:

$ java -jar h2o.jar

Now use Python to connect to H2O:

> import h2o

> h2o.init()

> h2o.ls()

Note: At this point h2o.ls() returns no keys.

> df = h2o.create_frame(cols=2, rows=5, integer_range=1, time_fraction=1)

> h2o.ls()

Note: Now you will see a new key, as shown below:


0     py_32_sid_9613

Note: py_32_sid_9613 above is the frame ID in H2O memory for the frame we just created with the create_frame API.

> df

2013-09-26 19:47:37   1995-01-01 16:14:34

1983-12-04 04:05:07    1974-09-08 23:06:41

2015-03-03 01:56:36    1982-11-03 19:21:53

1979-10-20 08:35:22     1987-10-09 14:24:59

1990-09-26 11:56:17     1981-08-16 04:23:02

> df.sort(['C1', 'C2'])

C1                                    C2

1979-10-20 08:35:22     1987-10-09 14:24:59

1983-12-04 04:05:07     1974-09-08 23:06:41

1990-09-26 11:56:17     1981-08-16 04:23:02

2013-09-26 19:47:37      1995-01-01 16:14:34

2015-03-03 01:56:36      1982-11-03 19:21:53

> h2o.ls()


0     py_32_sid_9613

1     py_34_sid_9613

Note: Because we ran the sort operation on the frame df, another temporary frame, py_34_sid_9613, was created. If you had assigned the sorted records to a new data frame, as below, a new frame would likewise have been created to hold the results for ndf:

> ndf = df.sort(['C1', 'C2'])
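The before-and-after comparison of h2o.ls() used above can be sketched in plain Python. This is only an illustration: the helper name new_keys is hypothetical, and the key lists are typed in by hand from the outputs shown earlier rather than fetched from a running cluster.

```python
def new_keys(before, after):
    """Return keys present in `after` but not in `before`, preserving order."""
    seen = set(before)
    return [k for k in after if k not in seen]


# Key lists copied by hand from the two h2o.ls() outputs shown above.
before = ["py_32_sid_9613"]
after = ["py_32_sid_9613", "py_34_sid_9613"]

print(new_keys(before, after))  # ['py_34_sid_9613']
```

Whatever shows up in the difference is the frame that the last operation created on the backend.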

Now if you look at the H2O logs, you will see the Rapids expression that was sent to the backend:

09-08 11:10:33.204 20753 #02927-14 INFO: 
    POST /99/Rapids, parms: {ast=(tmp= py_34_sid_9613 
        (sort py_32_sid_9613 ['C1' 'C2'])), session_id=_sid_9613}

Looking at the log entry above, we can understand the following:

The function sort was applied to frame py_32_sid_9613 with the columns ['C1' 'C2'] as its parameter, and the result of this operation was stored as frame py_34_sid_9613.
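To make that reading mechanical, here is a minimal sketch that splits the expression from the log into nested Python lists. It is a toy tokenizer written for this one example, not H2O's own Rapids parser; the function name parse_rapids is hypothetical, and it only understands parentheses, square brackets, quoted strings, and bare words.

```python
import re

# Tokens: brackets, quoted strings, or runs of non-space, non-bracket characters.
TOKEN = re.compile(r"\(|\)|\[|\]|'[^']*'|\"[^\"]*\"|[^\s()\[\]]+")


def parse_rapids(text):
    """Parse a Rapids-style expression into nested Python lists.

    Parenthesised forms become lists whose first element is the operator;
    bracketed literals become lists of their unquoted items.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in ("(", "["):
            close = ")" if tok == "(" else "]"
            items = []
            while tokens[pos] != close:
                items.append(read())
            pos += 1  # skip the closing bracket
            return items
        if tok[0] in "'\"":
            return tok[1:-1]  # strip quotes from string literals
        return tok

    return read()


# Expression copied from the log entry above.
ast = "(tmp= py_34_sid_9613 (sort py_32_sid_9613 ['C1' 'C2']))"
op, result_id, (func, source_id, columns) = parse_rapids(ast)
print(op, result_id, func, source_id, columns)
# tmp= py_34_sid_9613 sort py_32_sid_9613 ['C1', 'C2']
```

The outer form names the result frame, and the inner form names the function, the source frame, and its arguments, which is the same breakdown given in the text.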

This is how you can decipher the Rapids expression behind any H2O API call you try.
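Going the other way, the request seen in the log can be assembled by hand. The sketch below only builds the URL and the body and does not send anything. The endpoint /99/Rapids and the parameter names ast and session_id come from the log entry; the base URL http://localhost:54321 and the form-encoded body are assumptions about a default local setup, and build_rapids_request is a hypothetical helper name.

```python
from urllib.parse import urlencode, parse_qs


def build_rapids_request(ast, session_id, base_url="http://localhost:54321"):
    """Return (url, body) for a form-encoded POST to /99/Rapids.

    base_url is an assumed default; adjust it to wherever your cluster runs.
    """
    url = base_url.rstrip("/") + "/99/Rapids"
    body = urlencode({"ast": ast, "session_id": session_id})
    return url, body


# Values copied from the log entry above.
url, body = build_rapids_request(
    "(tmp= py_34_sid_9613 (sort py_32_sid_9613 ['C1' 'C2']))",
    "_sid_9613",
)
print(url)             # http://localhost:54321/99/Rapids
print(parse_qs(body))  # decoded parameters, to check the encoding round-trips
```

Comparing the decoded parameters with the parms shown in the log is a quick way to confirm that a hand-written expression matches what the Python client would have sent.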

That’s all, enjoy!!

