If you are into competitive machine learning, you are probably visiting Kaggle routinely. Currently you can compete for cash and recognition in Porto Seguro’s Safe Driver Prediction competition as well.
I tried the given training dataset (as is) with H2O AutoML, which ran for about 5 hours, and I was able to reach about the 280th position. If you transform the dataset properly before running H2O AutoML, you may be able to get an even higher ranking.
The following is a minimal H2O AutoML Python script that you can try as well. (Note: make sure to change run_automl_for_seconds to the length of time you want the experiment to run.)
import h2o
import pandas as pd  # as_data_frame() below returns pandas DataFrames
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster
h2o.init()

# Load the competition files
train = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTrain.csv')
test = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroTest.csv')
sub_data = h2o.import_file('/data/avkash/PortoSeguro/PortoSeguroSample_submission.csv')

# Response column and predictors (every column except the response)
y = 'target'
x = train.columns
x.remove(y)

## Time to run the experiment
run_automl_for_seconds = 18000
## Running AML for 5 hours (18000 seconds)
aml = H2OAutoML(max_runtime_secs=run_automl_for_seconds)

# Hold out 10% of the training data for validation
train_final, valid = train.split_frame(ratios=[0.9])
aml.train(x=x, y=y, training_frame=train_final, validation_frame=valid)

# Score the test set with the best model on the leaderboard
leader_model = aml.leader
pred = leader_model.predict(test_data=test)
pred_pd = pred.as_data_frame()

# 'target' is left numeric here, so the prediction frame has a single 'predict' column
sub = sub_data.as_data_frame()
sub['target'] = pred_pd['predict']
sub.to_csv('/data/avkash/PortoSeguro/PortoSeguroResult.csv', header=True, index=False)
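Since the runtime budget is given in seconds, it is easy to mix up the number of hours when editing it. The small helper below is only a sketch to make that conversion explicit; hours_to_seconds is a hypothetical name introduced here for illustration and is not part of H2O.

```python
def hours_to_seconds(hours):
    """Convert a runtime budget in hours to whole seconds.

    Hypothetical helper for illustration only; it is not part of H2O.
    """
    if hours <= 0:
        raise ValueError("runtime budget must be positive")
    return int(round(hours * 3600))


# A 5 hour budget, matching the value used in the script above
run_automl_for_seconds = hours_to_seconds(5)
print(run_automl_for_seconds)  # 18000
```

The result can be passed as max_runtime_secs in the same way as the hard-coded value in the script.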
That’s it, enjoy!!