Python groupBy example with H2O

Here is a code snippet showing how to apply an aggregate function to the values of one column, grouped by the values of another column:

> import h2o

> h2o.init()

> df = h2o.import_file("/Users/avkashchauhan/prostate.csv")

> df.col_names

[u'ID', u'CAPSULE', u'AGE', u'RACE', u'DPROS', u'DCAPS', u'PSA', u'VOL', u'GLEASON']

> df

ID  CAPSULE  AGE  RACE  DPROS  DCAPS   PSA   VOL  GLEASON
 1        0   65     1      2      1   1.4     0        6
 2        0   72     1      3      2   6.7     0        7
 3        0   70     1      1      2   4.9     0        6
 4        0   76     2      2      1  51.2    20        7
 5        0   69     1      1      1  12.3  55.9        6
 6        1   71     1      3      2   3.3     0        8
 7        0   68     2      4      2  31.9     0        7
 8        0   61     2      4      2  66.7  27.2        7
 9        0   69     1      1      1   3.9    24        7
10        0   68     2      1      2    13     0        6

> print(df['GLEASON'].unique().shape)

(7, 1)

> df['GLEASON'].unique()

C1
8
0
6
9
7
4
5

> x = df.group_by(by=['GLEASON'])
> y = x.sum(col="DCAPS", na="all").get_frame()

> y.shape

(7, 2)

> y

GLEASON  sum_DCAPS
      0          2
      4          1
      5         67
      6        147
      7        146
      8         40
      9         18
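
To make the aggregation concrete, here is a minimal pure-Python sketch of what a group-by sum computes. It uses only the ten rows displayed earlier, not the full prostate.csv, so its totals are smaller than the H2O output above. The helper name `group_sum` is hypothetical and is not part of the H2O API.

```python
from collections import defaultdict

# (GLEASON, DCAPS) pairs copied from the ten rows displayed earlier.
# This is a small illustrative subset, not the full prostate.csv data.
rows = [
    (6, 1), (7, 2), (6, 2), (7, 1), (6, 1),
    (8, 2), (7, 2), (7, 2), (7, 1), (6, 2),
]


def group_sum(pairs):
    """Sum the second element of each pair, grouped by the first.

    Hypothetical helper for illustration only; not an H2O function.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Return a plain dict ordered by group key.
    return dict(sorted(totals.items()))


result = group_sum(rows)
for gleason, total in result.items():
    print(gleason, total)
```

For these ten rows the sketch prints 6 6, 7 8 and 8 2, one group per line: each distinct GLEASON value becomes one output row, which is the same shape of result that `group_by(...).sum(...)` returns for the full frame.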

 

That’s it, enjoy!!
