Getting all categorical values for predictors in H2O POJO and MOJO models

Here is a Java/Scala code snippet that shows how to get the categorical values for each enum/factor predictor from H2O POJO and MOJO models.

To get the list of all column names in your POJO/MOJO model, along with the categorical values of each predictor, you can try the following:

Imports:

import java.io.*;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;
import hex.genmodel.MojoModel;
import java.util.Arrays;

POJO:

// First, use the POJO model class name as below:
private static String modelClassName = "gbm_prostate_binomial";

// Then you can use the GenModel class to get the info you are looking for:
hex.genmodel.GenModel rawModel;
rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();

// Now you can get the results as below:
System.out.println("isSupervised : " + rawModel.isSupervised());
System.out.println("Column Names : " + Arrays.toString(rawModel.getNames()));
System.out.println("Response ID : " + rawModel.getResponseIdx());
System.out.println("Number of columns : " + rawModel.getNumCols());
System.out.println("Response Name : " + rawModel.getResponseName());

// Print the categorical values of each predictor (null for a numeric predictor)
for (int i = 0; i < rawModel.getNumCols(); i++) {
    String[] domainValues = rawModel.getDomainValues(i);
    System.out.println(Arrays.toString(domainValues));
}
Output Results:
isSupervised : true
Column Names : [ID, AGE, RACE, DPROS, DCAPS, PSA, VOL, GLEASON]
Response ID : 8
Number of columns : 8
null
null
[0, 1, 2]
null
null
null
null
null
Note: A null value means the predictor is numeric. For each enum/factor predictor, all of its categorical values are listed.

MOJO:

// Let's assume you have a MOJO model named gbm_prostate_binomial.zip
// You would need to load your model as below:
hex.genmodel.GenModel mojo = MojoModel.load("gbm_prostate_binomial.zip");

// Now you can get the list of predictors as below:
System.out.println("isSupervised : " + mojo.isSupervised());
System.out.println("Column Names : " + Arrays.toString(mojo.getNames()));
System.out.println("Response ID : " + mojo.getResponseIdx());
System.out.println("Number of columns : " + mojo.getNumCols());
System.out.println("Response Name : " + mojo.getResponseName());

// Print the categorical values of each predictor (null for a numeric predictor)
for (int i = 0; i < mojo.getNumCols(); i++) {
    String[] domainValues = mojo.getDomainValues(i);
    System.out.println(Arrays.toString(domainValues));
}
Output Results:
isSupervised : true
Column Names : [ID, AGE, RACE, DPROS, DCAPS, PSA, VOL, GLEASON]
Response ID : 8
Number of columns : 8
null
null
[0, 1, 2]
null
null
null
null
null
Note: A null value means the predictor is numeric. For each enum/factor predictor, all of its categorical values are listed.
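
Both loops above print the domains by column index only, so you have to match each line against the column names by hand. As a minimal sketch of one way to pair them up, the helper below keeps only the columns whose domain is not null and maps each column name to its categorical values. The class CategoricalLevels and the method categoricalLevels are hypothetical names made up for this illustration; they are not part of h2o-genmodel. The sample arrays are copied from the output shown above so the block runs without a model file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of h2o-genmodel.
class CategoricalLevels {

    // Pairs each column name with its domain and keeps only the categorical
    // columns (those whose domain is not null). Column order is preserved.
    static Map<String, List<String>> categoricalLevels(String[] names, String[][] domains) {
        Map<String, List<String>> levels = new LinkedHashMap<>();
        int n = Math.min(names.length, domains.length);
        for (int i = 0; i < n; i++) {
            if (domains[i] != null) {
                levels.put(names[i], Arrays.asList(domains[i]));
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        // Sample data copied from the output shown in this post.
        String[] names = {"ID", "AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"};
        String[][] domains = {null, null, {"0", "1", "2"}, null, null, null, null, null};
        System.out.println(categoricalLevels(names, domains)); // prints {RACE=[0, 1, 2]}
    }
}
```

With a real model, the two arrays could be filled from the calls already used above: names from getNames(), and domains[i] from getDomainValues(i) for each i below getNumCols().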

To get help on using MOJO and POJO models, visit the following:

That’s it, enjoy!!
