Ranking GBM trees based on scoring metrics

Here is the full Python code:

import h2o
import pandas as pd
h2o.init()

## Import data
df = h2o.import_file('/Users/avkashchauhan/airlines_train.csv')
df.shape
df.col_names
y = "IsDepDelayed"
x = df.col_names
x.remove(y)
print(x)

## Building GBM model
from h2o.estimators.gbm import H2OGradientBoostingEstimator
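# Note: with default settings H2O may not score after every tree;
# passing score_tree_interval=1 gives one scoring-history row per tree.
gbm_model = H2OGradientBoostingEstimator()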
gbm_model.train(x = x, y = y, training_frame=df)

## Understanding model
print(gbm_model)
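# actual_params holds the values used for this model (default_params only lists the library defaults)
print("Total trees in the model : " + str(gbm_model.actual_params['ntrees']))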
scoring_hist = gbm_model.scoring_history()
print(scoring_hist.shape)

## Looking at the scoring history
scoring_hist

## logloss metric in scoring history:
scoring_hist['training_logloss']
### Difference in the logloss metric between consecutive scoring rows
diff_df = scoring_hist['training_logloss'].diff()
### Ranking each tree (lower logloss is better, so rank 1 = largest decrease)
diff_df.rank()

## AUC metric in scoring history:
scoring_hist['training_auc']
### Difference in the AUC metric between consecutive scoring rows
diff_df = scoring_hist['training_auc'].diff()
### Ranking each tree (higher AUC is better, so rank 1 = largest increase)
diff_df.rank(ascending=False)
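
To see the ranking idea without an H2O cluster, here is a minimal pandas-only sketch. The scoring history is made-up data, and the helper name rank_trees and its higher_is_better argument are hypothetical, not part of H2O. The column names mirror the ones used above, and I am assuming the number_of_trees column that H2O's GBM scoring history normally includes. The point it illustrates is that the sort direction depends on the metric: a drop in logloss is an improvement, while a rise in AUC is.

```python
import pandas as pd

# Hypothetical scoring history: made-up numbers, not output from a real model.
scoring_hist = pd.DataFrame({
    "number_of_trees": [0, 1, 2, 3, 4],
    "training_logloss": [0.693, 0.650, 0.630, 0.600, 0.595],
    "training_auc": [0.500, 0.620, 0.680, 0.700, 0.705],
})

def rank_trees(history, metric, higher_is_better):
    """Rank scoring rows by how much each improved `metric` (rank 1 = largest improvement)."""
    delta = history[metric].diff()
    # Flip the sign for metrics where a decrease is the improvement (e.g. logloss).
    improvement = delta if higher_is_better else -delta
    ranked = pd.DataFrame({
        "number_of_trees": history["number_of_trees"],
        "improvement": improvement,
        "rank": improvement.rank(ascending=False),
    })
    # The first row has no previous row to compare against, so it is dropped.
    return ranked.dropna().sort_values("rank")

logloss_rank = rank_trees(scoring_hist, "training_logloss", higher_is_better=False)
auc_rank = rank_trees(scoring_hist, "training_auc", higher_is_better=True)
print(logloss_rank)
print(auc_rank)
```

With a real model you would pass gbm_model.scoring_history() in place of the made-up frame. If the model did not score after every tree, each row reflects the combined effect of all trees added since the previous scoring row.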

Here is the link to the IPython notebook with this example:

https://github.com/Avkash/mldl/blob/master/notebook/h2o/GBM_Tree_Ranking_based_on_metrics.ipynb

That’s it, enjoy!!
