pythonpandas

How to add a row for sorted multi-index dataframe?


I have a multiindex dataframe, which comes from groupby.

Here is a demo:

In [54]: df = pd.DataFrame({'color': ['blue', 'grey', 'blue', 'grey', 'black'], 'name': ['pen', 'pen', 'pencil', 'pencil', 'box'],'price':[2.5, 2.3, 1.5, 1.3, 5.2],'bprice':[2.2, 2, 1.3, 1.2, 5.0]})

In [55]: df
Out[55]: 
   color    name  price  bprice
0   blue     pen    2.5     2.2
1   grey     pen    2.3     2.0
2   blue  pencil    1.5     1.3
3   grey  pencil    1.3     1.2
4  black     box    5.2     5.0

In [56]: a = df.groupby(['color', 'name'])[['price', 'bprice']].sum()

In [57]: a
Out[57]: 
              price  bprice
color name                 
black box       5.2     5.0
blue  pen       2.5     2.2
      pencil    1.5     1.3
grey  pen       2.3     2.0
      pencil    1.3     1.2

I want to add a row in every color index, the ideal output is:

              price  bprice
color name                 
black *         5.2     5.0
      box       5.2     5.0
blue  *         4.0     3.5
      pen       2.5     2.2
      pencil    1.5     1.3
grey  *         3.6     3.2
      pen       2.3     2.0
      pencil    1.3     1.2

There are two requirements:

  1. The new * row should be in the first row of each group
  2. expect for the * row, the other row should be sorted by price

I tried a lot of methods, but not find a elegant method. Insert a row into a multiindex dataframe with specified position seems be hard.

could you help on this?


Solution

  • Compute a groupby.sum on a, then append a level with * and concat, finally sort_index based on color:

    # compute the sum per color/name
    # sort by descending price
    a = (df.groupby(['color', 'name'])[['price', 'bprice']].sum()
           .sort_values(by='price', ascending=False)
        )
    
    # compute the sum per color
    # concatenate, sort_index in a stable way
    out = (pd.concat([a.groupby('color').sum()
                       .assign(name='*')
                       .set_index('name', append=True),
                      a])
             .sort_index(level='color', kind='stable',
                         sort_remaining=False)
          )
    

    Output:

                  price  bprice
    color name                 
    black *         5.2     5.0
          box       5.2     5.0
    blue  *         4.0     3.5
          pen       2.5     2.2
          pencil    1.5     1.3
    grey  *         3.6     3.2
          pen       2.3     2.0
          pencil    1.3     1.2