Maintaining column names after applying function on data frame

Sometimes, when we apply a function to a data frame, the column names change. Here is an example:

import numpy as np
import h2o

h2o.init()  # connect to (or start) a local H2O cluster

# Create a NumPy array and convert it to an H2O data frame
c_names = ['Num', 'Prediction']
data1 = np.array([[1, 0.12],
                  [2, 0.43]])
df = h2o.H2OFrame.from_python(data1, destination_frame='df', column_names=c_names)
# Print the H2O data frame's column names
print("df columns:", df.columns)
# Now apply the log1p function
df = df.log1p()
# The call above renames each column X to log1p(X);
# df.log() would rename the columns to log(X) instead
print("df columns:", df.columns)

As you can see above, the column names have changed, so you need to re-apply the original names to the data frame. To do that, store the column names first, apply the function, and then assign the stored names back to the data frame, as below:

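column_names = df.columns   # store the original names
df = df.log()               # columns are now named log(X)
df.names = column_names     # re-apply the original names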
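If you do this often, the store/apply/restore steps can be wrapped in a small helper. The sketch below is only an illustration: apply_keeping_names is a hypothetical name, not part of the H2O API, and it assumes the frame exposes a readable and writable names attribute, as H2OFrame does.

```python
def apply_keeping_names(frame, func):
    """Apply func to frame and restore the original column names on the result.

    Hypothetical helper (not part of H2O). Assumes `frame` and the object
    returned by `func` both expose a readable and writable `names` attribute.
    """
    original_names = list(frame.names)  # copy the names before transforming
    result = func(frame)
    if len(result.names) != len(original_names):
        # The function added or dropped columns, so the names cannot be mapped back
        raise ValueError("column count changed; cannot restore names")
    result.names = original_names
    return result

# Usage with an H2OFrame named df (requires a running H2O cluster):
# df = apply_keeping_names(df, lambda f: f.log1p())
```

The length check is a design choice: if the function changes the number of columns, silently re-applying the old names would mislabel the data, so the helper raises an error instead.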

That's it, enjoy!

