Splitting an H2O data frame based on a date-time value

Sometimes we need to split a data frame based on date-time values, i.e. one split holds the rows before a certain date and the other holds the rows on or after that date.

Here is an example of the Python code to do that:

import datetime
import h2o

h2o.init()  # start or connect to a local H2O cluster
timedata = h2o.import_file("/Users/avkashchauhan/Downloads/date-data.csv")
timedata.shape
# Rows strictly before 2015-10-01
date_before_data = timedata[timedata['date'] < datetime.datetime(2015, 10, 1, 0, 0, 0),:]
# Rows on or after 2015-10-01
date_after_data = timedata[timedata['date'] >= datetime.datetime(2015, 10, 1, 0, 0, 0),:]
date_before_data.shape
date_after_data.shape

If you want to split one of these pieces further and then append one of the resulting parts to the other frame, you can do the following:

part1, part2 = date_after_data.split_frame(ratios=[0.5])
final_data = date_before_data.rbind(part2)
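
Keep in mind that split_frame splits probabilistically, so part1 and part2 are not guaranteed to hold exactly half of the rows each, which is easy to notice on a 10-row frame. The plain-Python sketch below illustrates the idea, assuming each row gets a uniform random draw that is compared against the ratio. It is not H2O's implementation, and random_split is a hypothetical helper used only for illustration.

```python
import random

def random_split(rows, ratio, seed=None):
    """Assign each row to part1 with probability `ratio`, else to part2."""
    rng = random.Random(seed)
    part1, part2 = [], []
    for row in rows:
        (part1 if rng.random() < ratio else part2).append(row)
    return part1, part2

before_ids = list(range(1, 11))   # ids of the rows before the cutoff
after_ids = list(range(11, 21))   # ids of the rows on or after the cutoff

part1, part2 = random_split(after_ids, 0.5, seed=42)
final_ids = before_ids + part2    # list concatenation plays the role of rbind
print(len(part1), len(part2), len(final_ids))
```

In H2O itself, passing a seed to split_frame makes the split repeatable between runs.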

Note that the CSV file contents are as follows:

id date
1 9/1/2015
2 9/2/2015
3 9/3/2015
4 9/4/2015
5 9/5/2015
6 9/6/2015
7 9/7/2015
8 9/8/2015
9 9/9/2015
10 9/10/2015
11 12/1/2015
12 12/2/2015
13 12/3/2015
14 12/4/2015
15 12/5/2015
16 12/6/2015
17 12/7/2015
18 12/8/2015
19 12/9/2015
20 12/10/2015
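
To sanity-check the row counts that the two H2O filters should return, the same cutoff can be applied to these sample rows using only the Python standard library. This is a minimal sketch, not H2O code: it assumes the dates are in month/day/year format, and the rows list and split_by_date helper are hypothetical names used for illustration.

```python
import datetime

# Sample rows matching the listing above: (id, date as M/D/YYYY).
rows = [(i, "9/%d/2015" % i) for i in range(1, 11)]
rows += [(i + 10, "12/%d/2015" % i) for i in range(1, 11)]

CUTOFF = datetime.datetime(2015, 10, 1, 0, 0, 0)

def split_by_date(rows, cutoff):
    """Return (before, on_or_after) lists of rows relative to cutoff."""
    before, on_or_after = [], []
    for row_id, text in rows:
        when = datetime.datetime.strptime(text, "%m/%d/%Y")
        (before if when < cutoff else on_or_after).append((row_id, text))
    return before, on_or_after

before, after = split_by_date(rows, CUTOFF)
print(len(before), len(after))  # prints: 10 10
```

With this data, date_before_data and date_after_data should therefore each contain 10 rows.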

That's it, enjoy!
