Working with variable importance data from models in H2O

When you build a classification model in H2O, the variable importance table is shown in the Flow UI. It looks like this:

[Screenshot: variable importance table in the Flow UI]

Most users work with H2O from a Python or R shell, so you may need to get this variable importance table into that shell. The next steps show how to do this in Python.

To plot the variable importance graph, we can use the following script, where mymodel is an already trained H2O model:

import matplotlib.pyplot as plt
import numpy as np

plt.rcdefaults()
fig, ax = plt.subplots()
# Pull the variable names and scaled importances out of the model's variable importance table
varimp = mymodel._model_json['output']['variable_importances']
variables = varimp['variable']
scaled_importance = varimp['scaled_importance']
y_pos = np.arange(len(variables))
# Draw one horizontal bar per variable
ax.barh(y_pos, scaled_importance, align='center', color='green', ecolor='black')
ax.set_yticks(y_pos)
ax.set_yticklabels(variables)
# Put the first variable in the table at the top of the chart
ax.invert_yaxis()
ax.set_xlabel('Scaled Importance')
ax.set_title('Variable Importance')
plt.show()

Here is what the variable importance graph looks like:

[Screenshot: variable importance bar chart]
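
The chart plots the scaled_importance column rather than the raw values. As far as I understand H2O's variable importance table, scaled importance is the relative importance divided by the largest relative importance, and percentage is the relative importance divided by the sum. Here is a minimal sketch of that relationship in plain Python. The function name scale_importances and the input numbers are made up for illustration and are not part of H2O:

```python
def scale_importances(relative):
    """Derive scaled and percentage importances from relative importances.

    Assumption: scaled = relative / max(relative) and
    percentage = relative / sum(relative).
    """
    top = max(relative)
    total = sum(relative)
    scaled = [value / top for value in relative]
    percentage = [value / total for value in relative]
    return scaled, percentage


# Made-up relative importances, for illustration only
scaled, percentage = scale_importances([120.0, 60.0, 20.0])
print(scaled)
print(percentage)
```

Under that assumption the most important variable always gets a scaled importance of 1.0, which is why the longest bar in the chart reaches 1.0.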

To see the variable importance metrics directly from the model in Python, we can do the following:

mymodel._model_json['output']['variable_importances'].as_data_frame()

The results look like this:

[Screenshot: variable importance data frame in the Python shell]
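
The _model_json attribute is a private field, so it may be safer to go through the model's public varimp() method where it is available. Below is a small sketch of a helper that picks the top variables from the rows that method returns. It assumes each row has the order (variable, relative_importance, scaled_importance, percentage). The helper name top_variables and the example rows are hypothetical and only there so the snippet runs without an H2O cluster:

```python
def top_variables(varimp_rows, n=5):
    """Return the n most important (variable, scaled_importance) pairs.

    Assumption: each row is ordered as
    (variable, relative_importance, scaled_importance, percentage).
    """
    ranked = sorted(varimp_rows, key=lambda row: row[2], reverse=True)
    return [(row[0], row[2]) for row in ranked[:n]]


# Made-up rows for illustration; with a trained model you would pass
# mymodel.varimp() instead of this list.
example_rows = [
    ("age", 120.0, 1.0, 0.6),
    ("income", 60.0, 0.5, 0.3),
    ("zip", 20.0, 20.0 / 120.0, 0.1),
]

for name, scaled in top_variables(example_rows, n=2):
    print(name, scaled)
```

With a real model, top_variables(mymodel.varimp(), n=10) should give the ten strongest variables, and mymodel.varimp(use_pandas=True) should return the same table as a pandas data frame.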

That's it, enjoy!
