Using Cross-validation in Scala with H2O and getting each cross-validated model

Here is Scala code for binomial classification with GLM:

https://aichamp.wordpress.com/2017/04/23/binomial-classification-example-in-scala-and-gbm-with-h2o/

To add cross validation you can do the following:

def buildGLMModel(train: Frame, valid: Frame, response: String)
 (implicit h2oContext: H2OContext): GLMModel = {
 import _root_.hex.glm.GLMModel.GLMParameters.Family
 import _root_.hex.glm.GLM
 import _root_.hex.glm.GLMModel.GLMParameters
 val glmParams = new GLMParameters(Family.binomial)
 glmParams._train = train
 glmParams._valid = valid
 glmParams._nfolds = 3  ###### Here is cross-validation ###
 glmParams._response_column = response
 glmParams._alpha = Array[Double](0.5)
 val glm = new GLM(glmParams, Key.make("glmModel.hex"))
 glm.trainModel().get()
}

To look cross-validated model try this:

scala> glmModel._output._cross_validation_models
res12: Array[water.Key[_ <: water.Keyed[_ <: AnyRef]]] = 
    Array(glmModel.hex_cv_1, glmModel.hex_cv_2, glmModel.hex_cv_3)

Now to get each model do the following:

scala> val m1 = DKV.getGet("glmModel.hex_cv_1").asInstanceOf[GLMModel]

And you will see the following:

scala> val m1 = DKV.getGet("glmModel.hex_cv_1").asInstanceOf[GLMModel]
m1: hex.glm.GLMModel =
Model Metrics Type: BinomialGLM
 Description: N/A
 model id: glmModel.hex_cv_1
 frame id: glmModel.hex_cv_1_train
 MSE: 0.14714406
 RMSE: 0.38359362
 AUC: 0.7167627
 logloss: 0.4703465
 mean_per_class_error: 0.31526923
 default threshold: 0.27434438467025757
 CM: Confusion Matrix (vertical: actual; across: predicted):
 0 1 Error Rate
 0 10704 1651 0.1336 1,651 / 12,355
 1 1768 1790 0.4969 1,768 / 3,558
Totals 12472 3441 0.2149 3,419 / 15,913
Gains/Lift Table (Avg response rate: 22.36 %):
 Group Cumulative Data Fraction Lower Threshold Lift Cumulative Lift Response Rate Cumulative Response Rate Capture Rate Cumulative Capture Rate Gain Cumulative Gain
 1 0.01005467 0....
scala> val m2 = DKV.getGet("glmModel.hex_cv_2").asInstanceOf[GLMModel]
m2: hex.glm.GLMModel =
Model Metrics Type: BinomialGLM
 Description: N/A
 model id: glmModel.hex_cv_2
 frame id: glmModel.hex_cv_2_train
 MSE: 0.14598908
 RMSE: 0.38208517
 AUC: 0.7231473
 logloss: 0.46717605
 mean_per_class_error: 0.31456697
 default threshold: 0.29637953639030457
 CM: Confusion Matrix (vertical: actual; across: predicted):
 0 1 Error Rate
 0 11038 1395 0.1122 1,395 / 12,433
 1 1847 1726 0.5169 1,847 / 3,573
Totals 12885 3121 0.2025 3,242 / 16,006
Gains/Lift Table (Avg response rate: 22.32 %):
 Group Cumulative Data Fraction Lower Threshold Lift Cumulative Lift Response Rate Cumulative Response Rate Capture Rate Cumulative Capture Rate Gain Cumulative Gain
 1 0.01005873 0...
scala> val m3 = DKV.getGet("glmModel.hex_cv_3").asInstanceOf[GLMModel]
m3: hex.glm.GLMModel =
Model Metrics Type: BinomialGLM
 Description: N/A
 model id: glmModel.hex_cv_3
 frame id: glmModel.hex_cv_3_train
 MSE: 0.14626761
 RMSE: 0.38244948
 AUC: 0.7239823
 logloss: 0.46873763
 mean_per_class_error: 0.31437498
 default threshold: 0.28522220253944397
 CM: Confusion Matrix (vertical: actual; across: predicted):
 0 1 Error Rate
 0 10982 1490 0.1195 1,490 / 12,472
 1 1838 1771 0.5093 1,838 / 3,609
Totals 12820 3261 0.2070 3,328 / 16,081
Gains/Lift Table (Avg response rate: 22.44 %):
 Group Cumulative Data Fraction Lower Threshold Lift Cumulative Lift Response Rate Cumulative Response Rate Capture Rate Cumulative Capture Rate Gain Cumulative Gain
 1 0.01001182 0...
scala>

Thats it, enjoy!!

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s