Ignoring columns from an H2O data frame in Python

For all kinds of data munging with H2O, see the data munging section of the H2O documentation, which also shows how to slice columns from an H2O data frame.

Here is a Python script that shows how to leave ignored columns out of the predictors when training a model:

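import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# Start (or connect to) a local H2O cluster
h2o.init()

# Load the cars dataset into an H2O frame
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

# Columns to leave out of the model, and the response column
ignore_columns = ["name", "economy", "cylinders"]
response = "economy_20mpg"

# Keep every column that is neither ignored nor the response, preserving column order
all_columns = cars.columns
predictors = [col for col in all_columns if col not in ignore_columns and col != response]

print(all_columns)
print(ignore_columns)
print(predictors)

# Split the data into 80% training and 20% validation
train, valid = cars.split_frame(ratios=[.8])

# Train a binomial GLM on the remaining predictors
cars_glm = H2OGeneralizedLinearEstimator(family="binomial")
cars_glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)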
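The column filtering itself does not need a running H2O cluster, so it can be tried out with plain Python lists. Here is a minimal sketch: select_predictors is a hypothetical helper name (not part of the H2O API), and the column list is typed by hand as an assumption about what cars.columns returns. It also drops the response column and keeps the original column order, which a plain set difference does not guarantee.

```python
def select_predictors(all_columns, ignore_columns, response):
    """Return the predictor columns, keeping their original order."""
    # Exclude both the ignored columns and the response itself
    excluded = set(ignore_columns) | {response}
    return [col for col in all_columns if col not in excluded]


# Hand-typed column names, assumed to match cars.columns (illustration only)
all_columns = ["name", "economy", "cylinders", "displacement", "power",
               "weight", "acceleration", "year", "economy_20mpg"]

predictors = select_predictors(
    all_columns,
    ignore_columns=["name", "economy", "cylinders"],
    response="economy_20mpg",
)
print(predictors)
```

The resulting list can be passed as x to the train() call in the same way as in the script above.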

That's it, enjoy!

 
