Here is a code snippet showing how to apply a function to one column's values after grouping an H2O frame by another column:
> import h2o
> h2o.init()
> df = h2o.import_file("/Users/avkashchauhan/prostate.csv")
> df.col_names
[u'ID', u'CAPSULE', u'AGE', u'RACE', u'DPROS', u'DCAPS', u'PSA', u'VOL', u'GLEASON']
> df
  ID  CAPSULE  AGE  RACE  DPROS  DCAPS   PSA   VOL  GLEASON
   1        0   65     1      2      1   1.4     0        6
   2        0   72     1      3      2   6.7     0        7
   3        0   70     1      1      2   4.9     0        6
   4        0   76     2      2      1  51.2    20        7
   5        0   69     1      1      1  12.3  55.9        6
   6        1   71     1      3      2   3.3     0        8
   7        0   68     2      4      2  31.9     0        7
   8        0   61     2      4      2  66.7  27.2        7
   9        0   69     1      1      1   3.9    24        7
  10        0   68     2      1      2    13     0        6
> print(df['GLEASON'].unique().shape)
(7,1)
> df['GLEASON'].unique()
  C1
   8
   0
   6
   9
   7
   4
   5
> x = df.group_by(by=['GLEASON'])
> y = x.sum(col="DCAPS", na="all").get_frame()
> y.shape
(7, 2)
> y
  GLEASON  sum_DCAPS
        0          2
        4          1
        5         67
        6        147
        7        146
        8         40
        9         18
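If you want to see what the group-by sum is doing under the hood, here is a minimal plain-Python sketch (not H2O code) that runs the same calculation on just the ten rows displayed above. The helper name `group_sum` and the `rows` list are my own, made up for illustration; the (GLEASON, DCAPS) pairs are copied from the frame preview. Because it only covers ten rows, the totals will not match the full-dataset sums.

```python
from collections import defaultdict

# (GLEASON, DCAPS) pairs copied from the ten rows in the frame preview
rows = [(6, 1), (7, 2), (6, 2), (7, 1), (6, 1),
        (8, 2), (7, 2), (7, 2), (7, 1), (6, 2)]


def group_sum(pairs):
    """Sum the second element of each pair, grouped by the first element."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Sort by group key so the result reads like the H2O output table
    return dict(sorted(totals.items()))


sums = group_sum(rows)
print(sums)  # {6: 6, 7: 8, 8: 2}
```

Each distinct GLEASON value becomes one output row, and the DCAPS values that share it are added together, which is the same shape of result that `get_frame()` returns.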
That’s it, enjoy!!