Using H2O AutoML for Kaggle Porto Seguro Safe Driver Prediction Competition

If you into competitive machine learning you must be visiting Kaggle routinely. Currently you can compete for cash and recognition at the Porto Seguro’s Safe Driver Prediction as well.

I did try to given training dataset (as it is) with H2O AutoML which ran for about 5 hours and I was able to get into top 280th position. If you could transform the dataset properly and run H2O AutoML you may be able to get even higher ranking.

Following is the simplest H2O AutoML python script which you can try as well (Note: Make sure to change the run_automl_for_seconds to the desired time you would want to run the experiment.)

import h2o
import pandas as pd
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTrain.csv')
test = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTest.csv')
sub_data = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroSample_submission.csv')

y = 'target'
x = train.columns
x.remove(y)

## Time to run the experiment
run_automl_for_seconds = 18000
## Running AML for 4 Hours
aml = H2OAutoML(max_runtime_secs =run_automl_for_seconds)
train_final, valid = train.split_frame(ratios=[0.9])
aml.train(x=x, y =y, training_frame=train_final, validation_frame=valid)

leader_model = aml.leader
pred = leader_model.predict(test_data=test)

pred_pd = pred.as_data_frame()
sub = sub_data.as_data_frame()

sub['target'] = pred_pd
sub.to_csv('/data/avkash/PortoSeguro/PortoSeguroResult.csv', header=True, index=False)

That’s it, enjoy!!

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s