Managing timestamp values with n-fold cross validation

When using n-fold cross validation, you can choose a specific column as the fold column. The fold column is described in more detail here:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/fold_column.html

With a fold column, the cross-validation splits are created as custom groups based on the numerical or categorical values in that column.
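To make the grouping concrete, here is a minimal sketch in plain Python, not H2O's own implementation. It assumes that each distinct value in the fold column defines one holdout group, with all remaining rows used for training. The function name splits_from_fold_column and the sample values are made up for illustration.

```python
from collections import defaultdict


def splits_from_fold_column(fold_values):
    """Map each distinct fold value to (train_indices, holdout_indices)."""
    # Collect the row indices that share the same fold value.
    groups = defaultdict(list)
    for idx, value in enumerate(fold_values):
        groups[value].append(idx)

    # Each group is held out once; every other row is used for training.
    splits = {}
    for value, holdout in groups.items():
        train = [i for i, v in enumerate(fold_values) if v != value]
        splits[value] = (train, holdout)
    return splits


# Hypothetical fold column with three distinct values, so three folds.
fold_values = ["a", "b", "a", "c", "b", "c"]
splits = splits_from_fold_column(fold_values)

for value, (train, holdout) in splits.items():
    print(value, "train:", train, "holdout:", holdout)
```

The number of splits equals the number of distinct values in the column, which is why the choice of fold column matters so much.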

This is how fold column setting is used in H2O:

[Screenshot: fold column setting in H2O]

Question: Is it wise to use a column of timestamp values as the fold column?

Answer: It is not advisable to feed a column of raw timestamp values directly to the fold_column argument. The main reason is that such a column has too many unique values, and since the splits are built from the values in the fold column, this will cause problems.

The best way to handle this scenario is to first create a new column based on the year, month, or day of the timestamp values, and then use this newly created column as the fold column in your cross-validation process.
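As a rough sketch of that idea, the snippet below derives a year-month key from timestamp strings using only the Python standard library. The timestamp format, the helper name year_month_fold, and the sample data are assumptions for illustration. In practice you would add the derived values to your frame as a new column and pass that column's name as fold_column.

```python
from datetime import datetime


def year_month_fold(timestamp_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Reduce a timestamp string to a coarse 'YYYY-MM' fold key."""
    parsed = datetime.strptime(timestamp_text, fmt)
    return parsed.strftime("%Y-%m")


# Hypothetical timestamps; every raw value is unique.
timestamps = [
    "2017-01-03 08:15:00",
    "2017-01-19 22:40:10",
    "2017-02-07 11:05:30",
    "2017-02-28 17:59:59",
    "2017-03-14 06:30:00",
    "2017-03-14 06:30:01",
]

# Derived column: one key per row, shared by all rows in the same month.
fold_keys = [year_month_fold(t) for t in timestamps]

unique_raw = len(set(timestamps))
unique_folds = len(set(fold_keys))
print("unique raw timestamps:", unique_raw)
print("unique year-month keys:", unique_folds)
```

Whether to group by year, month, or day depends on how much data you have and how many folds you want; a coarser key gives fewer, larger folds.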

That's it, enjoy!!
