pythonpandasaggregatemulti-indexmultilevel-analysis

How do I sum on the outter most level of a multi index (row)?


I am trying to figure out how to sum on the outer most level of my multi-index. So I want to sum the COUNTS column based on the individual operators, and all the shops listed for it.

df=pd.DataFrame(data.groupby('OPERATOR').SHOP.value_counts())
df=df.rename(columns={'SHOP':'COUNTS'})
df['COUNTS'] = df['COUNTS'].astype(float)
df['percentage']=df.groupby(['OPERATOR'])['COUNTS'].sum()
df['percentage']=df.sum(axis=0, level=['OPERATOR', 'SHOP'])
df.head()

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    3.0
            FF9          1.0    1.0
            IHI          1.0    1.0
Aegean      HA9          33.0   33.0
            IN9          24.0   24.0

When I use the df.sum call, it lets me call it on both levels, but then when I change it to df.sum(axis=0, level=['OPERATOR'], it results in the percentage column being NaN. I originally had the count column as int so I thought maybe that was the issue, and converted to float, but this didn't resolve the issue. This is the desired output:

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    5.0
            FF9          1.0    5.0
            IHI          1.0    5.0
Aegean      HA9          33.0   57.0
            IN9          24.0   57.0

(This is just a stepping stone on the way to calculating the percentage for each shop respective to the operator, i.e. the FINAL final output would be):

                      COUNTS    percentage
OPERATOR    SHOP        
AVIANCA     CC9          3.0    .6
            FF9          1.0    .2
            IHI          1.0    .2
Aegean      HA9          33.0   .58
            IN9          24.0   .42

So bonus points if you include the last step of that as well!! Please help me!!!


Solution

  • Group by OPERATOR and normalize your data:

    df['percentage'] = df.groupby('OPERATOR')['COUNTS'] \
                         .transform(lambda x: x / x.sum()) \
                         .round(2)
    
    >>> df
                   COUNTS  percentage
    OPERATOR SHOP
    AVIANCA  CC9      3.0        0.60
             FF9      1.0        0.20
             IHI      1.0        0.20
    Aegean   HA9     33.0        0.58
             IN9     24.0        0.42