I am trying to figure out how to sum on the outer most level of my multi-index. So I want to sum the COUNTS column based on the individual operators, and all the shops listed for it.
df=pd.DataFrame(data.groupby('OPERATOR').SHOP.value_counts())
df=df.rename(columns={'SHOP':'COUNTS'})
df['COUNTS'] = df['COUNTS'].astype(float)
df['percentage']=df.groupby(['OPERATOR'])['COUNTS'].sum()
df['percentage']=df.sum(axis=0, level=['OPERATOR', 'SHOP'])
df.head()
COUNTS percentage
OPERATOR SHOP
AVIANCA CC9 3.0 3.0
FF9 1.0 1.0
IHI 1.0 1.0
Aegean HA9 33.0 33.0
IN9 24.0 24.0
When I use the df.sum call, it lets me call it on both levels, but then when I change it to df.sum(axis=0, level=['OPERATOR'], it results in the percentage column being NaN. I originally had the count column as int so I thought maybe that was the issue, and converted to float, but this didn't resolve the issue. This is the desired output:
COUNTS percentage
OPERATOR SHOP
AVIANCA CC9 3.0 5.0
FF9 1.0 5.0
IHI 1.0 5.0
Aegean HA9 33.0 57.0
IN9 24.0 57.0
(This is just a stepping stone on the way to calculating the percentage for each shop respective to the operator, i.e. the FINAL final output would be):
COUNTS percentage
OPERATOR SHOP
AVIANCA CC9 3.0 .6
FF9 1.0 .2
IHI 1.0 .2
Aegean HA9 33.0 .58
IN9 24.0 .42
So bonus points if you include the last step of that as well!! Please help me!!!
Group by OPERATOR
and normalize your data:
df['percentage'] = df.groupby('OPERATOR')['COUNTS'] \
.transform(lambda x: x / x.sum()) \
.round(2)
>>> df
COUNTS percentage
OPERATOR SHOP
AVIANCA CC9 3.0 0.60
FF9 1.0 0.20
IHI 1.0 0.20
Aegean HA9 33.0 0.58
IN9 24.0 0.42