Building Regression and Classification GBM models in Scala with H2O

In the full code below you will learn to build H2O GBM model (Regression and binomial classification) in Scala.

Lets first import all the classes we need for this project:

import org.apache.spark.SparkFiles
import org.apache.spark.h2o._
import org.apache.spark.examples.h2o._
import org.apache.spark.sql.{DataFrame, SQLContext}
import water.Key
import java.io.File

import water.support.SparkContextSupport.addFiles
import water.support.H2OFrameSupport._

// Create SQL support
implicit val sqlContext = spark.sqlContext
import sqlContext.implicits._

Next we need to start H2O cluster so we can start using H2O APIs:

// Start H2O services
val h2oContext = H2OContext.getOrCreate(sc)
import h2oContext._
import h2oContext.implicits._

Now we need to ingest the data which we can use to perform modeling:

// Import prostate data into H2O
val prostateData = new H2OFrame(new File("/Users/avkashchauhan/src/github.com/h2oai/sparkling-water/examples/smalldata/prostate.csv"))

// Understanding our input data
prostateData.names
prostateData.numCols
prostateData.numRows
prostateData.keys
prostateData.key

Now we will import some H2O specific classes we need to perform our actions:

import h2oContext.implicits._
import _root_.hex.tree.gbm.GBM
import _root_.hex.tree.gbm.GBMModel.GBMParameters

Lets setup GBM Parameters which will shape our GBM modeling process:

val gbmParams = new GBMParameters()
gbmParams._train = prostateData
gbmParams._response_column = 'CAPSULE

In above response column setting the column “CAPSULE” is numeric so by default the GBML model will build a regression model. Lets start building GBM Model now:

val gbm = new GBM(gbmParams,Key.make("gbmRegModel.hex"))
val gbmRegModel = gbm.trainModel.get
// Same as above
val gbmRegModel = gbm.trainModel().get()

Lets get to know our GBM Model and we will see that the type of this model is “regression”:

gbmRegModel

Lets perform prediction using GBM Regression Model:

val predH2OFrame = gbmRegModel.score(prostateData)('predict)
val predFromModel = asRDD[DoubleHolder](predH2OFrame).collect.map(_.result.getOrElse(Double.NaN))

Now we will set the input data set to perform GBM classification model. Below we are setting the response column to be a categorical type so all the values in this column becomes enumerator instead of number, this way we can make sure that the GBM model we will build will be a classification model:

prostateData.names()
//
// >>> res6: Array[String] = Array(ID, CAPSULE, AGE, RACE, DPROS, DCAPS, PSA, VOL, GLEASON)
// Based on above the CAPSULE is the id = 1
// Note: If we will not set categorical for response variable we will see the following exception
//        - water.exceptions.H2OModelBuilderIllegalArgumentException: 
//             - Illegal argument(s) for GBM model: gbmModel.hex.  Details: ERRR on field: _distribution: Binomial requires the response to be a 2-class categorical

withLockAndUpdate(prostateData){ fr => fr.replace(1, fr.vec("CAPSULE").toCategoricalVec)}

gbmParams._response_column = 'CAPSULE

We can also set the distribution to have a specific method. In the code below we are setting distribution to have Bernoulli method:

import _root_.hex.genmodel.utils.DistributionFamily
gbmParams._distribution = DistributionFamily.bernoulli

Now lets build our GBM  model now:

val gbm = new GBM(gbmParams,Key.make("gbmBinModel.hex"))
val gbmBinModel = gbm.trainModel.get
// Same as above
val gbmBinModel = gbm.trainModel().get()

Lets check the new model and we will find that it is a classification model and specially binomial classification because it has only 2 classes in its response classes :

gbmBinModel

Now lets perform the prediction using our GBM Binomial Classification Model as below:

val predH2OFrame = gbmBinModel.score(prostateData)('predict)
val predFromModel = asRDD[DoubleHolder](predH2OFrame).collect.map(_.result.getOrElse(Double.NaN))

Thats all, enjoy!!

 

 

 

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s