python pandas dataframe

Adding the rank of each group/subgroup as a new column of a DataFrame


I am trying to find a clean way to add a column to a DataFrame that holds the rank of each group/subgroup pair, based on the sum of a value. Here is a simple example:

import pandas as pd

df = pd.DataFrame({
        "group": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c"],
        "subgroup": ["i", "ii", "i", "ii", "i", "ii", "i", "ii", "i", "ii", "ii"],
        "value": [2, 4, 2, 3, 5, 1, 2, 4, 1, 5, 11] })

The output I'd like to have is:

|group |subgroup| value | rank |
|------|--------|-------|------|
| "a"  | "i"    | 2     | 3    |
| "a"  | "ii"   | 4     | 4    |
| "a"  | "i"    | 2     | 3    |
| "a"  | "ii"   | 3     | 4    |
| "a"  | "i"    | 5     | 3    |
| "b"  | "ii"   | 1     | 2    |
| "b"  | "i"    | 2     | 5    |
| "b"  | "ii"   | 4     | 2    |
| "b"  | "i"    | 1     | 5    |
| "b"  | "ii"   | 5     | 2    |
| "c"  | "ii"   | 11    | 1    |

because the sum of 'value' for the subgroup ["a","i"] is 9, making it the third-largest subgroup. It sounds easy to do, but I can't find an efficient way to do it with pandas.


Solution

  • Use groupby.sum to total each subgroup, rank the sums in descending order, then merge the ranks back onto the original rows:

    out = df.merge(df.groupby(['group', 'subgroup'])['value']
                     .sum().rank(ascending=False)
                     .rename('rank').reset_index()
                  )
    

    Output:

       group subgroup  value  rank
    0      a        i      2   3.0
    1      a       ii      4   4.0
    2      a        i      2   3.0
    3      a       ii      3   4.0
    4      a        i      5   3.0
    5      b       ii      1   2.0
    6      b        i      2   5.0
    7      b       ii      4   2.0
    8      b        i      1   5.0
    9      b       ii      5   2.0
    10     c       ii     11   1.0
    

    Intermediates (sum per subgroup and its rank, before the merge):

      group subgroup  sum  rank
    0     a        i    9   3.0
    1     a       ii    7   4.0
    2     b        i    3   5.0
    3     b       ii   10   2.0
    4     c       ii   11   1.0