Creating Partial Dependency Plot (PDP) in H2O

H2O has added partial dependency plot (PDP) support with a Java backend that does the multi-scoring of the dataset with the model. This makes creating a PDP much faster.

To get a PDP in H2O you need a model and the original data set used to build that model. Here are a few ways to create a PDP:

To score the model for a PDP on a single column:

response = h2o.predict(model, data.pdp[, column_name])

To score the model on the original data set:

response = h2o.predict(model, data.pdp)
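
For comparison, the built-in PDP support is exposed in the R package as h2o.partialPlot. The following is a minimal sketch rather than the author's exact setup. It assumes the prostate.csv sample that ships with the h2o R package, and the predictor list and GBM settings are illustrative choices made here. The names data.hex and prostate.gbm are chosen to match the variables used elsewhere in this post.

```r
library(h2o)
h2o.init()

# Assumption: use the prostate.csv sample bundled with the h2o R package
prostate_path = system.file("extdata", "prostate.csv", package = "h2o")
data.hex = h2o.uploadFile(path = prostate_path)

# Make the response categorical so the GBM is trained as a binomial classifier
data.hex[, "CAPSULE"] = as.factor(data.hex[, "CAPSULE"])

# Assumption: predictors and hyperparameters are illustrative only
prostate.gbm = h2o.gbm(x = c("AGE", "RACE", "PSA", "GLEASON"),
                       y = "CAPSULE",
                       training_frame = data.hex,
                       ntrees = 10, max_depth = 5, learn_rate = 0.1)

# Built-in PDP for one column; plot = FALSE returns the table without drawing it
pdp_age = h2o.partialPlot(object = prostate.gbm, data = data.hex,
                          cols = "AGE", nbins = 20, plot = FALSE)
print(pdp_age)
```

Setting plot = TRUE (the default) draws the curve as well as returning the table.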

If you want to build a PDP directly from the model and dataset without using the PDP API, you can use the following code:

model = prostate.gbm
column_name = "AGE"
data.pdp = data.hex
bins = unique(h2o.quantile(data.hex[, column_name], probs = seq(0.05,1,0.05)) )
mean_responses = c()

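for (bin in bins) {
  # Hold the chosen column at one value for every row, then score the whole frame
  data.pdp[, column_name] = bin
  response = h2o.predict(model, data.pdp)
  # Average the last prediction column over all rows
  mean_response = mean(response[, ncol(response)])
  mean_responses = c(mean_responses, mean_response)
}
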
pdp_manual = data.frame(AGE = bins, mean_response = mean_responses)
plot(pdp_manual, type = "l")
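
The manual loop above can also be wrapped in a small helper so it can be reused for other columns. This is only a sketch: manual_pdp is a hypothetical name introduced here, not part of the h2o package, and it assumes the chosen column is numeric (a categorical column would need its levels as bins rather than quantiles).

```r
library(h2o)

# Hypothetical helper (not an h2o function): manual PDP for one numeric column
manual_pdp = function(model, frame, column_name, probs = seq(0.05, 1, 0.05)) {
  # Candidate values for the column: its quantiles, with duplicates removed
  bins = unique(as.numeric(h2o.quantile(frame[, column_name], probs = probs)))

  mean_responses = sapply(bins, function(bin) {
    frame.pdp = frame
    frame.pdp[, column_name] = bin             # hold the column at one value
    response = h2o.predict(model, frame.pdp)   # score every row of the frame
    mean(response[, ncol(response)])           # average the last prediction column
  })

  pdp = data.frame(bins, mean_responses)
  names(pdp) = c(column_name, "mean_response")
  pdp
}
```

Assuming a trained model and its frame are available as prostate.gbm and data.hex, a call such as plot(manual_pdp(prostate.gbm, data.hex, "AGE"), type = "l") draws the same kind of curve as the loop.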
That's it, enjoy!!

2 thoughts on “Creating Partial Dependency Plot (PDP) in H2O”

  1. Hi. Thank you for the code. I would like to ask about an extension of this.
    How would the code change if the variable that we want to plot is a dummy?

