Building an H2O GLM model using a PostgreSQL database and JDBC driver

Note: Before we start, make sure PostgreSQL is up and running and the database is ready to respond to your queries. Check that your queries return actual records and not null results.

Download the PostgreSQL JDBC driver, version 42.0.0 (JDBC 4.0 build, postgresql-42.0.0.jre6.jar):

Note: I have tested H2O 3.10.4.2 with the above JDBC 4.0 driver (build 42.0.0) and PostgreSQL 9.2.x.

In the following test I am connecting to the DVD Rental sample database, loaded into PostgreSQL. If you need help getting it working, visit Here and Here.

Test from R (RStudio) that the PostgreSQL connection works:

# Install package if you don't have it
> install.packages("RPostgreSQL")

# Load package RPostgreSQL
> library(RPostgreSQL)

# Code to test database and table:
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, dbname = "dvdrentaldb", host = "localhost", port = 5432,
+                  user = "avkash", password = "avkash")
> dbExistsTable(con, "actor")
TRUE
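
The note at the top asks you to confirm that queries return real records and not nulls. A minimal sketch of such a check follows. `result_is_usable` is a hypothetical helper name (it is not part of RPostgreSQL), and the commented usage assumes the `con` connection opened above and the `payment` table used later in this post.

```r
# Hypothetical helper: TRUE when a query result is a data frame with at
# least one row and no column that is entirely NA.
result_is_usable <- function(df) {
  if (!is.data.frame(df) || nrow(df) == 0) {
    return(FALSE)
  }
  all_na <- vapply(df, function(col) all(is.na(col)), logical(1))
  !any(all_na)
}

# Usage against the live connection (assumes `con` from dbConnect above):
# rows <- dbGetQuery(con, "SELECT * FROM payment LIMIT 5")
# result_is_usable(rows)
# dbDisconnect(con)

# Offline demonstration with made-up data frames:
result_is_usable(data.frame(amount = c(7.99, 1.99)))   # TRUE
result_is_usable(data.frame(amount = c(NA, NA)))       # FALSE
```

Running the check on a small `LIMIT` query keeps it cheap, and it catches an empty table before H2O tries to import it.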

Start H2O with JDBC driver:

$ java -cp postgresql-42.0.0.jre6.jar:h2o.jar water.H2OApp

Note:

  • You must have h2o.jar and postgresql-42.0.0.jre6.jar in the folder where you run the command above.
  • You must start H2O first and then connect to the running H2O instance from R, as shown below.
  • I am connecting to a table named payment below.
  • I am using the payment table to run the H2O GLM model.
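
Since both jars have to be found by the java command, it can help to check for them before launching. The following is only a sketch: `missing_jars` and `h2o_launch_command` are hypothetical helper names, not part of H2O, and the jar names and the `water.H2OApp` class are the ones used in the command above.

```r
# Hypothetical helper: return the jar files that are NOT present in the
# current working directory (or at the given paths).
missing_jars <- function(jars) {
  jars[!file.exists(jars)]
}

# Hypothetical helper: build the launch command. The classpath separator
# defaults to the platform's path separator (":" on Unix, ";" on Windows).
h2o_launch_command <- function(jars, sep = .Platform$path.sep) {
  paste("java -cp", paste(jars, collapse = sep), "water.H2OApp")
}

jars <- c("postgresql-42.0.0.jre6.jar", "h2o.jar")

# An empty result here means both jars were found.
print(missing_jars(jars))

# Print the command to run in a terminal (Unix separator forced here).
cat(h2o_launch_command(jars, sep = ":"), "\n")
```

The sketch only prints the command, so you still start H2O from a terminal as shown above.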

Connecting to H2O from R:

> library(h2o)
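> h2o.init()
> # If h2o.init() complains about a version mismatch, relax the check:
> h2o.init(strict_version_check = FALSE)
> # Import the payment table over JDBC (note the plain ASCII quotes)
> payment = h2o.import_sql_table(connection_url = "jdbc:postgresql://localhost:5432/h2odb?&useSSL=false", table = "payment", username = "avkash", password = "avkash")
> # Use every column except the 5th (amount) as a predictor
> aa = names(payment)[-5]
> payment_glm = h2o.glm(x = aa, y = "amount", training_frame = payment)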
> payment_glm
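
Two details in the snippet above are easy to get wrong: the JDBC URL and the position-based predictor selection (`[-5]` only works while amount is the fifth column). Here is a small sketch of one way to make both explicit. `jdbc_postgres_url` and `predictors_for` are hypothetical helper names of my own, not H2O functions, and the sketch leaves out the `?&useSSL=false` query string used above.

```r
# Hypothetical helper: assemble a PostgreSQL JDBC URL from its parts.
jdbc_postgres_url <- function(host, port, dbname) {
  sprintf("jdbc:postgresql://%s:%d/%s", host, as.integer(port), dbname)
}

# Hypothetical helper: choose predictors by excluding the response column
# by name, so the result does not depend on column order.
predictors_for <- function(column_names, response) {
  stopifnot(response %in% column_names)
  setdiff(column_names, response)
}

# Column names as they appear in the payment table.
cols <- c("payment_id", "customer_id", "staff_id",
          "rental_id", "amount", "payment_date")

print(jdbc_postgres_url("localhost", 5432, "h2odb"))
print(predictors_for(cols, "amount"))

# Usage with a running H2O instance (assumes the setup described above):
# payment <- h2o.import_sql_table(
#   connection_url = jdbc_postgres_url("localhost", 5432, "h2odb"),
#   table = "payment", username = "avkash", password = "avkash")
# payment_glm <- h2o.glm(x = predictors_for(names(payment), "amount"),
#                        y = "amount", training_frame = payment)
```

Selecting by name gives the same five predictors as `names(payment)[-5]` for this table, and it fails loudly if the response column is missing.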

Here is the full code snippet in action:

> payment = h2o.import_sql_table(connection_url = "jdbc:postgresql://localhost:5432/h2odb?&useSSL=false", table = "payment", username = "avkash", password = "avkash")
|=============================================| 100%
> payment
  payment_id customer_id staff_id rental_id amount payment_date
1      17503         341        2      1520   7.99 1.171607e+12
2      17504         341        1      1778   1.99 1.171675e+12
3      17505         341        1      1849   7.99 1.171695e+12
4      17506         341        2      2829   2.99 1.171943e+12
5      17507         341        2      3130   7.99 1.172022e+12
6      17508         341        1      3382   5.99 1.172090e+12

[14596 rows x 6 columns]
> aa = names(payment)[-5]
> payment_glm = h2o.glm(x = aa, y = "amount", training_frame = payment)
|=============================================| 100%
> payment_glm
Model Details:
==============

H2ORegressionModel: glm
Model ID: GLM_model_R_1490053774745_2
GLM Model: summary
    family     link                                regularization number_of_predictors_total number_of_active_predictors
1 gaussian identity Elastic Net (alpha = 0.5, lambda = 1.038E-4 )                          5                           5
  number_of_iterations     training_frame
1                    0 payment_sql_to_hex

Coefficients: glm coefficients
         names coefficients standardized_coefficients
1    Intercept   -10.739680                  4.200606
2   payment_id    -0.000009                 -0.038040
3  customer_id     0.000139                  0.024262
4     staff_id     0.103740                  0.051872
5    rental_id     0.000001                  0.003172
6 payment_date     0.000000                  0.026343

H2ORegressionMetrics: glm
** Reported on training data. **

MSE: 5.607411
RMSE: 2.367997
MAE: 1.950123
RMSLE: 0.5182649
Mean Residual Deviance : 5.607411
R^2 : 0.0007319098
Null Deviance :81905.72
Null D.o.F. :14595
Residual Deviance :81845.77
Residual D.o.F. :14590
AIC :66600.46
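
As a sanity check, several of the metrics above can be reproduced from the others. The sketch below assumes the usual relationships for a Gaussian model scored on its training data (MSE as residual deviance over row count, R^2 as one minus residual over null deviance). The input numbers are copied from the output above, and the variable names are mine.

```r
# Values copied from the printed output above.
n_rows            <- 14596
null_deviance     <- 81905.72
residual_deviance <- 81845.77
n_coefficients    <- 6          # 5 predictors plus the intercept

# Assumed relationships for a Gaussian family with identity link.
mse          <- residual_deviance / n_rows
rmse         <- sqrt(mse)
r2           <- 1 - residual_deviance / null_deviance
null_dof     <- n_rows - 1
residual_dof <- n_rows - n_coefficients

print(c(MSE = mse, RMSE = rmse, R2 = r2))
print(c(null_dof = null_dof, residual_dof = residual_dof))
```

The results agree with the reported MSE, RMSE, R^2 and degrees of freedom up to rounding of the printed deviances. An R^2 this close to zero suggests that these predictors explain almost none of the variation in amount.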

 

That's all, enjoy!!

 
