In this full Scala sample we will use the H2O Stacked Ensembles algorithm. A stacked ensemble is built in two steps. First, several models of different types are trained with cross-validation, keeping each model's cross-validation predictions. Next, the stacked ensemble (a metalearner) is trained on those cross-validated predictions from all the folds. You can learn more about Stacked Ensembles here.
In this Stacked Ensemble we will use GBM and Deep Learning as the base algorithms. We then build the Stacked Ensemble model from the resulting GBM and Deep Learning models.
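The two-level idea can be sketched without H2O at all. The toy program below is a hypothetical, plain-Scala illustration and is not how H2O implements Stacked Ensembles:

- A mean predictor and a least-squares line stand in for the GBM and Deep Learning base models.
- Each base model is cross-validated to produce out-of-fold predictions.
- A simple least-squares combination (a stand-in for H2O's own metalearner) is fitted on those predictions.

The names `ToyStacking`, `outOfFold` and `fitMeta` are made up for this sketch.

```scala
import scala.util.Random

// Hypothetical toy stacked ensemble in plain Scala (no H2O involved).
// H2O's real base models and metalearner are far more elaborate.
object ToyStacking {
  // A "learner" turns training data (xs, ys) into a prediction function.
  type Learner = (Array[Double], Array[Double]) => (Double => Double)

  // Base learner 1: always predicts the mean of the training responses.
  val meanLearner: Learner = (xs, ys) => {
    val m = ys.sum / ys.length
    (_: Double) => m
  }

  // Base learner 2: ordinary least-squares line y = a + b * x.
  val lineLearner: Learner = (xs, ys) => {
    val mx = xs.sum / xs.length
    val my = ys.sum / ys.length
    val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    val b = if (sxx == 0.0) 0.0 else sxy / sxx
    val a = my - b * mx
    (x: Double) => a + b * x
  }

  // Out-of-fold (cross-validated) predictions: row i is predicted by a model
  // trained on every fold except the one that contains row i.
  def outOfFold(learner: Learner, xs: Array[Double], ys: Array[Double],
                folds: Array[Int], k: Int): Array[Double] = {
    val preds = new Array[Double](xs.length)
    for (f <- 0 until k) {
      val trainIdx = xs.indices.filter(i => folds(i) != f)
      val model = learner(trainIdx.map(i => xs(i)).toArray, trainIdx.map(i => ys(i)).toArray)
      for (i <- xs.indices if folds(i) == f) preds(i) = model(xs(i))
    }
    preds
  }

  def mse(pred: Array[Double], ys: Array[Double]): Double =
    pred.zip(ys).map { case (p, y) => (p - y) * (p - y) }.sum / ys.length

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Metalearner: least squares y ~ w1 * p1 + w2 * p2 (no intercept),
  // solved in closed form from the 2x2 normal equations (Cramer's rule).
  def fitMeta(p1: Array[Double], p2: Array[Double], ys: Array[Double]): (Double, Double) = {
    val (a11, a12, a22) = (dot(p1, p1), dot(p1, p2), dot(p2, p2))
    val (b1, b2) = (dot(p1, ys), dot(p2, ys))
    val det = a11 * a22 - a12 * a12
    ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det)
  }

  final case class Result(w1: Double, w2: Double,
                          mseMean: Double, mseLine: Double, mseStack: Double)

  def run(n: Int = 100, k: Int = 5, seed: Long = 1111L): Result = {
    val rng = new Random(seed)
    val xs = Array.tabulate(n)(_.toDouble)
    val ys = xs.map(x => 2.0 * x + 1.0 + rng.nextGaussian()) // noisy line
    val folds = Array.tabulate(n)(_ % k)                      // modulo fold assignment
    val p1 = outOfFold(meanLearner, xs, ys, folds, k)         // level-one feature 1
    val p2 = outOfFold(lineLearner, xs, ys, folds, k)         // level-one feature 2
    val (w1, w2) = fitMeta(p1, p2, ys)
    val stacked = p1.zip(p2).map { case (a, b) => w1 * a + w2 * b }
    // mseStack is measured on the data the metalearner was fitted on.
    Result(w1, w2, mse(p1, ys), mse(p2, ys), mse(stacked, ys))
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Here the metalearner is scored on the same level-one data it was fitted on. That means `mseStack` can never exceed either base model's out-of-fold error. An honest comparison would need a separate holdout set.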
First, let's import the key classes specific to H2O:
import org.apache.spark.h2o._
import water.Key
import java.io.File
Now we will create an H2O context so we can call the key H2O functions for data ingest and for algorithms such as Deep Learning:
// sc is the SparkContext (predefined in spark-shell)
val h2oContext = H2OContext.getOrCreate(sc)
import h2oContext._
import h2oContext.implicits._
Let's import data from the local file system as an H2O Frame:
val prostateData = new H2OFrame(new File("/Users/avkashchauhan/src/github.com/h2oai/sparkling-water/examples/smalldata/prostate.csv"))
Since this Stacked Ensemble combines GBM and Deep Learning, let's first build the Deep Learning model:
import _root_.hex.deeplearning.DeepLearning
import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters

val dlParams = new DeepLearningParameters()
dlParams._epochs = 100
dlParams._train = prostateData
dlParams._response_column = 'CAPSULE
dlParams._variable_importances = true
// 5-fold cross-validation; keep the CV predictions for the ensemble
dlParams._nfolds = 5
dlParams._seed = 1111
dlParams._keep_cross_validation_predictions = true

val dl = new DeepLearning(dlParams, Key.make("dlProstateModel.hex"))
val dlModel = dl.trainModel.get
Now let's build the GBM model:
import _root_.hex.tree.gbm.GBM
import _root_.hex.tree.gbm.GBMModel.GBMParameters

val gbmParams = new GBMParameters()
gbmParams._train = prostateData
gbmParams._response_column = 'CAPSULE
// Same number of folds and same seed as the Deep Learning model
gbmParams._nfolds = 5
gbmParams._seed = 1111
gbmParams._keep_cross_validation_predictions = true

val gbm = new GBM(gbmParams, Key.make("gbmRegModel.hex"))
val gbmModel = gbm.trainModel().get()
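Both base models use `_nfolds = 5` and `_seed = 1111`. As we understand it, H2O's Stacked Ensemble expects its base models to have been cross-validated on the same folds. With the default random fold assignment, those folds are assumed to be derived from the seed. H2O also offers a Modulo fold assignment scheme that does not depend on a seed.

The plain-Scala sketch below illustrates only that principle, and H2O's actual fold-assignment code differs. It shows that a seeded random assignment is reproducible, while a modulo assignment needs no seed at all. `FoldAssignment`, `randomFolds` and `moduloFolds` are hypothetical names.

```scala
import scala.util.Random

// Hypothetical illustration, not H2O's internal code.
object FoldAssignment {
  // Assign each of nRows rows to one of nfolds folds using a seeded RNG.
  def randomFolds(nRows: Int, nfolds: Int, seed: Long): Array[Int] = {
    val rng = new Random(seed)
    Array.fill(nRows)(rng.nextInt(nfolds))
  }

  // Modulo assignment needs no seed: row i goes to fold i % nfolds.
  def moduloFolds(nRows: Int, nfolds: Int): Array[Int] =
    Array.tabulate(nRows)(_ % nfolds)

  def main(args: Array[String]): Unit = {
    val dlFolds  = randomFolds(20, 5, 1111L) // same nfolds and seed as dlParams
    val gbmFolds = randomFolds(20, 5, 1111L) // same nfolds and seed as gbmParams
    val other    = randomFolds(20, 5, 42L)   // a different seed
    println(dlFolds.sameElements(gbmFolds))  // true
    println(dlFolds.sameElements(other))
  }
}
```

Identical seeds replay the same random sequence, so the two "builders" above split the rows identically. A different seed will generally produce a different split.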
Now let's build the Stacked Ensemble model. First, we need the classes that Stacked Ensembles require:
import _root_.hex.Model
import _root_.hex.StackedEnsembleModel
import _root_.hex.ensemble.StackedEnsemble
Next, we define the Stacked Ensemble parameters:
val stackedEnsembleParameters = new StackedEnsembleModel.StackedEnsembleParameters()
stackedEnsembleParameters._train = prostateData._key
stackedEnsembleParameters._response_column = 'CAPSULE
Now we pass the base models we want to combine in the Stacked Ensemble, using their keys:
// Key type used for the base-model keys
type T_MODEL_KEY = Key[Model[_, _ <: Model.Parameters, _ <: Model.Output]]

// Option 1: cast each model key explicitly
stackedEnsembleParameters._base_models = Array(gbmModel._key.asInstanceOf[T_MODEL_KEY], dlModel._key.asInstanceOf[T_MODEL_KEY])

// Option 2: map over the models and cast each key
stackedEnsembleParameters._base_models = Array(gbmModel, dlModel).map(model => model._key.asInstanceOf[T_MODEL_KEY])

// Note: you can use either option to pass the model keys
Finally, define the Stacked Ensemble job:
val stackedEnsembleJob = new StackedEnsemble(stackedEnsembleParameters)
As the last step, let's build the Stacked Ensemble model:
val stackedEnsembleModel = stackedEnsembleJob.trainModel().get();
Now we can take a look at our Stacked Ensemble model:
stackedEnsembleModel
That's it, enjoy!
Helpful content: https://github.com/h2oai/h2o-3/blob/a554bffabda6770386a31d47e05f00543d7b9ac3/h2o-algos/src/test/java/hex/ensemble/StackedEnsembleTest.java