Let’s try H2O and doParallel in R to run GBM

What is H2O?

  • H2O is a distributed machine learning platform for enterprises
  • It’s open source and supports various supervised and unsupervised algorithms, including clustering, in distributed mode, with interfaces for R, Python, Java, Scala & REST
  • Learn more here

What is doParallel?

  • This R package lets you run multiple R worker processes in parallel, typically one per CPU core
  • It provides a parallel backend for the %dopar% operator of the foreach package, using the parallel package
  • Learn more here
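To make the %dopar% idea concrete before bringing H2O in, here is a minimal sketch that uses only doParallel. The worker count of 2 and the variable name squares are illustrative assumptions, not part of the original example.

```r
# Loading doParallel also attaches foreach and parallel
library(doParallel)

# Start a small cluster of 2 R worker processes (assumed count for illustration)
cl <- makeCluster(2)
# Register the cluster so that %dopar% sends work to these workers
registerDoParallel(cl)

# Each iteration runs on a worker; .combine = c joins the results into one vector
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

# Always release the workers when done
stopCluster(cl)

print(squares)
```

Without .combine, foreach returns a list with one element per iteration, which is what the H2O example relies on.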

Here is the code (.R):

# Loading the h2o library
library("h2o")
# Loading the doParallel library (this also attaches foreach and parallel)
library("doParallel")
# Starting (or connecting to) a local H2O instance; as.h2o() below needs a connection
h2o.init()
# Setting up a cluster of 8 R worker processes
cl <- makeCluster(8)
# Registering the cluster as the parallel backend for %dopar%
registerDoParallel(cl)
# Loading the iris dataset as an H2O data frame
iris.hex <- as.h2o(iris)
# Recording the start time
ptm <- proc.time()
# Running one GBM grid search per value of n, each on its own R worker
R <- foreach(n = 3:10) %dopar% {
  # Note: the h2o library must be reloaded on each separate R instance.
  library(h2o)
  # Connecting each R instance to H2O (h2o.init() attaches to a running local instance if one exists)
  h2o.init()
  # Running a GBM grid search on the H2O frame
  grid <- h2o.grid("gbm", grid_id = paste0("gbm_grid_id", n), x = c(1:4), y = 5,
                   training_frame = iris.hex,
                   hyper_params = list(ntrees = 1:n),
                   nfolds = 5)
  model <- h2o.getModel(grid@model_ids[[1]])
  # Returning the logloss of the first model in the grid summary
  as.numeric(grid@summary_table$logloss[1])
}
# Printing the elapsed time
proc.time() - ptm
# Shutting down the R workers
stopCluster(cl)
# Listing h2o objects
h2o.ls()
That’s all.