[SOLVED] Pandas Dataframe groupby count number of rows quickly

Pandas Dataframe groupby count number of rows quickly

I have a dataframe that looks like

Class_ID  Student_ID  feature
1         4           31
1         4           86
1         4           2
1         2           11
1         2           0
5         3           2
5         9           3
5         9           2

and I would like to count the number of times a student appear in each Class_ID, so the desired outcome looks like this:

Class_ID  Student_ID  feature  count
1         4           31       3
1         4           86       3
1         4           2        3
1         2           11       2
1         2           0        2
5         3           2        1
5         9           3        2
5         9           2        2

and here's how I did it:

df['dummy'] = 1
df['count'] = df.groupby(['Class_ID', 'Student_ID'], group_keys=False)['dummy'].transform(lambda x: x.sum())

It works fine, but my actual dataframe is rather large (~ 1M rows), and the code is quite slow, so i would like to ask is there any quicker way/ better way to do it? Thanks.

Solution

You can avoid creating a dummy column and transform with "size" instead:

df["count"] = df.groupby(["Class_ID", "Student_ID"]).transform("size")

On my machine it's about 3/4x times faster for 1 million rows.

Output:

   Class_ID  Student_ID  feature  count
0         1           4       31      3
1         1           4       86      3
2         1           4        2      3
3         1           2       11      2
4         1           2        0      2
5         5           3        2      1
6         5           9        3      2
7         5           9        2      2