[SOLVED] Having the ranking of the group/subgroup in a new column of a DataFrame

I am trying to find a clean way to add a column to a DataFrame that holds the rank of each row's group/subgroup, where subgroups are ranked by the sum of their values. Here is a simple example:

import pandas as pd

df = pd.DataFrame({
        "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b","c"],
        "subgroup": ["i","ii","i","ii","i","ii","i","ii","i","ii","ii"],
        "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5, 11] })

The output I'd like to have is :

|group |subgroup| value | rank |
|------|--------|-------|------|
| "a"  | "i"    | 2     | 3    |
| "a"  | "ii"   | 4     | 4    |
| "a"  | "i"    | 2     | 3    |
| "a"  | "ii"   | 3     | 4    |
| "a"  | "i"    | 5     | 3    |
| "b"  | "ii"   | 1     | 2    |
| "b"  | "i"    | 2     | 5    |
| "b"  | "ii"   | 4     | 2    |
| "b"  | "i"    | 1     | 5    |
| "b"  | "ii"   | 5     | 2    |
| "c"  | "ii"   | 11    | 1    |

because the sum of 'value' for the subgroup ["a","i"] is 9 (2 + 2 + 5), which makes it the third-largest subgroup sum. It sounds easy, but I can't find an efficient way to do it with pandas.

Solution

Sum 'value' per (group, subgroup) with groupby.sum, rank the sums in descending order, then merge the ranks back onto the original rows:

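# Sum per (group, subgroup), rank the sums (largest sum gets rank 1), merge back.
# With no `on=` argument, merge joins on the shared columns 'group' and 'subgroup'.
out = df.merge(df.groupby(['group', 'subgroup'])['value']
                 .sum().rank(ascending=False)
                 .rename('rank').reset_index()
              )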

Output:

   group subgroup  value  rank
0      a        i      2   3.0
1      a       ii      4   4.0
2      a        i      2   3.0
3      a       ii      3   4.0
4      a        i      5   3.0
5      b       ii      1   2.0
6      b        i      2   5.0
7      b       ii      4   2.0
8      b        i      1   5.0
9      b       ii      5   2.0
10     c       ii     11   1.0
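
The rank column above comes out as float (3.0 rather than 3) because rank returns floats by default. If integer ranks are wanted, as in the question's table, one possible variation is sketched below. The choice of method="dense" and the explicit on=/how= arguments are my assumptions, not part of the original answer; "dense" is used so that tied sums share a whole-number rank instead of a fractional average.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c"],
    "subgroup": ["i", "ii", "i", "ii", "i", "ii", "i", "ii", "i", "ii", "ii"],
    "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5, 11],
})

# One row per (group, subgroup) with the rank of its summed value.
ranks = (
    df.groupby(["group", "subgroup"])["value"]
      .sum()
      .rank(method="dense", ascending=False)  # ties share a rank, no gaps
      .astype(int)                            # safe: dense ranks are whole numbers
      .rename("rank")
      .reset_index()
)

# Left merge keeps every original row in its original order.
out = df.merge(ranks, on=["group", "subgroup"], how="left")
print(out)
```

With the default method="average", two subgroups with equal sums would get a rank such as 2.5, and casting that to int would silently truncate it, so the cast is only reasonable together with a method that yields whole numbers.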

Intermediates (the per-subgroup sum and its rank, before merging back):

  group subgroup  sum  rank
0     a        i    9   3.0
1     a       ii    7   4.0
2     b        i    3   5.0
3     b       ii   10   2.0
4     c       ii   11   1.0
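
A possible alternative that skips the merge is to broadcast each subgroup's sum onto its rows with transform and rank the resulting column directly. This is a sketch of a different technique from the answer above, not a restatement of it; the variable name subgroup_sum is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c"],
    "subgroup": ["i", "ii", "i", "ii", "i", "ii", "i", "ii", "i", "ii", "ii"],
    "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5, 11],
})

# Each row receives the total 'value' of its (group, subgroup).
subgroup_sum = df.groupby(["group", "subgroup"])["value"].transform("sum")

# method="dense" is required here: every sum is repeated once per row,
# and dense ranking counts distinct values rather than rows.
df["rank"] = subgroup_sum.rank(method="dense", ascending=False).astype(int)
print(df)
```

One behavioral difference to keep in mind: if two different subgroups happen to have the same sum, this version gives them the same rank, whereas the merge-based answer with the default ranking method would give them an averaged (fractional) rank.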