Prediction with unknown categorical in H2O

Handling unknown categorical levels in MOJO and POJO during prediction:

Problem: I have a model that I have exported to MOJO and am pushing data through it.

I understand that there are effectively two options for dealing with unknown categorical levels. By default, an unknown level throws a PredictUnknownCategoricalLevelException. Alternatively, we can call setConvertUnknownCategoricalLevelsToNa(true), and the unknown level will be set to Double.NaN.

With the second option we can also get the count of unknown levels per column, but no information on what those levels actually are. This information is needed for debugging. Is there a way to get it?

I was thinking I could do an initial pass on my data to test for unknown levels, but I'm not sure whether it's possible to interrogate the model for a list of known levels. To properly diagnose the arrival of new levels, it would help to get that list from the model per column index.

Another issue is related to type. If data is read in from a text file for prediction, every value is a string, but the model may expect a different type. Is there a way to interrogate the model for the type of each column, so that casting can be done correctly before passing the data into the RowData object?

Solution:

  1. If you catch PredictUnknownCategoricalLevelException, it has a field, unknownLevel, which lets you handle the unknown level any way you like. You can repair the row, report the unknownLevel, and try again.
  2. The model has a getDomains() function. This returns the level names for each column. In the MOJO world, "domain" == "level", so you can interrogate the model for its known levels.
  3. If the data is read in from a text file as a string, the Easy wrapper (EasyPredictModelWrapper) is smart enough to try parsing it as a double first.
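
The pre-pass idea from the question can be sketched without touching H2O at all. The snippet below is a minimal, stand-alone illustration, not H2O's own code: it assumes you have already pulled the column names and per-column domains out of the model (for example with getNames() and getDomainValues() on the generated model, if your genmodel version exposes them), and it assumes a null domain marks a numeric column. The class UnknownLevelScan and its methods are hypothetical names made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone sketch: no H2O classes are used here.
// names and domains are assumed to have been read from the model beforehand.
class UnknownLevelScan {

    // Returns column name -> offending value for every categorical column
    // whose value in `row` is not among that column's known levels.
    // domains[i] holds the known levels of column names[i],
    // or null when the column is assumed to be numeric.
    static Map<String, String> findUnknownLevels(String[] names,
                                                 String[][] domains,
                                                 Map<String, String> row) {
        Map<String, String> unknown = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            String[] levels = domains[i];
            String value = row.get(names[i]);
            if (levels == null || value == null) {
                continue; // numeric column or missing value: nothing to check
            }
            boolean known = false;
            for (String level : levels) {
                if (level.equals(value)) {
                    known = true;
                    break;
                }
            }
            if (!known) {
                unknown.put(names[i], value);
            }
        }
        return unknown;
    }

    // Text input arrives as strings; a numeric column can be parsed up front.
    // Returns NaN when the text is not a valid number.
    static double parseNumeric(String text) {
        try {
            return Double.parseDouble(text.trim());
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        String[] names = {"age", "color"};
        String[][] domains = {null, {"red", "green", "blue"}};
        Map<String, String> row = new LinkedHashMap<>();
        row.put("age", "42");
        row.put("color", "purple");
        // Reports the column "color" together with the unseen level "purple"
        System.out.println(findUnknownLevels(names, domains, row));
    }
}
```

Running a scan like this before building the RowData gives you the actual unseen level names for logging, which the per-column counts alone do not provide. The null-domain check doubles as a rough type test: a column with a domain is categorical, anything else can be treated as numeric.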

Deep Learning with H2O in Scala

Here is a code snippet showing how to write deep learning code in Scala using H2O:

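import _root_.hex.deeplearning.DeepLearning
import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters
import water.Key

// trainFrame and validFrame are assumed to be H2O frames that are already loaded.
// Assigning them directly relies on an implicit frame-to-key conversion
// (for example Sparkling Water's h2oContext.implicits._); otherwise use trainFrame._key.
val dlParams = new DeepLearningParameters()
dlParams._response_column = "RESPONSE"
dlParams._keep_cross_validation_predictions = true
dlParams._train = trainFrame
dlParams._valid = validFrame
dlParams._nfolds = 5              // 5-fold cross-validation
dlParams._hidden = Array(2, 2)    // two hidden layers with 2 neurons each

// Build the model under the key "my_dl_model.hex" and block until training finishes
val my_dl = new DeepLearning(dlParams, Key.make("my_dl_model.hex"))
val my_dl_model = my_dl.trainModel.get
println(my_dl_model)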

For the full list of deep learning parameters, please visit:

http://docs.h2o.ai/h2o/latest-stable/h2o-algos/javadoc/hex/deeplearning/DeepLearningModel.DeepLearningParameters.html

To learn more about fine-tuning deep learning parameters, please visit:

http://blog.h2o.ai/2015/08/deep-learning-performance-august/

Keywords: Scala, Deep Learning, H2O