Have been struggling with this for some time. I have the following multi-indexed by column Dataframe:
Category Sweet
Destination NWE MED
Week W1 W2 W3 W4 W1 W2 W3 W4
2022-06-01 0.0 0.0 0.0 70.0 70.0 70.0 70.0 70.0
2022-07-01 142.0 142.0 779.0 779.0 945.0 945.0 1015.0 1015.0
2022-08-01 236.0 661.0 732.0 914.0 996.0 1044.0 1044.0 1096.0
I want to get the sum of NWE + MED columns by Week in one slice, which would look like this:
Category Sweet
Destination (NWE+MED)
Week W1 W2 W3 W4
2022-06-01 70.0 70.0 70.0 140.0
2022-07-01 1087.0 1087.0 1794.0 1794.0
2022-08-01 1232.0 1705.0 1776.0 2010.0
Will be doing it for a larger list so slicing each country then adding them together would be a solution but also very cumbersome in terms of code. This is similar to an excel pivot table where you can select several different sub categories (like countries) and you could get the sums as values. (In my case the cumulative sums).
You can try the stack()
+ (do something) + unstack()
trick to sum the columns.
stack()
lets you reshape the wide df into a long one so that you can sum()
the data by week; this creates a Series. After it's done, convert it into a frame by calling to_frame()
and call unstack()
to reshape the long frame into a wide one.
# input
df = (
pd.DataFrame({
('Sweet', 'NWE', 'W1'): {'2022-06-01': 0.0, '2022-07-01': 142.0, '2022-08-01': 236.0},
('Sweet', 'NWE', 'W2'): {'2022-06-01': 0.0, '2022-07-01': 142.0, '2022-08-01': 661.0},
('Sweet', 'NWE', 'W3'): {'2022-06-01': 0.0, '2022-07-01': 779.0, '2022-08-01': 732.0},
('Sweet', 'NWE', 'W4'): {'2022-06-01': 70.0, '2022-07-01': 779.0, '2022-08-01': 914.0},
('Sweet', 'MED', 'W1'): {'2022-06-01': 70.0, '2022-07-01': 945.0, '2022-08-01': 996.0},
('Sweet', 'MED', 'W2'): {'2022-06-01': 70.0, '2022-07-01': 945.0, '2022-08-01': 1044.0},
('Sweet', 'MED', 'W3'): {'2022-06-01': 70.0, '2022-07-01': 1015.0, '2022-08-01': 1044.0},
('Sweet', 'MED', 'W4'): {'2022-06-01': 70.0, '2022-07-01': 1015.0, '2022-08-01': 1096.0}})
.rename_axis(columns=['Category', 'Destination', 'Week'])
)
# stack the df to make it long, sum along columns, add column names and unstack back
df = df.stack().sum(axis=1).to_frame(name=('Sweet', 'NWE+MED')).unstack()
# output
df
Another option is to simply index the columns and add them together. Using tuples lets us index levels of a multi-index column.
df = df.sort_index(axis=1)
df = df[("Sweet", "NWE")] + df[("Sweet", "MED")]
df.columns = pd.MultiIndex.from_product(
(["Sweet"], ["MED+NWE"], df.columns),
names=["Category", "Destination", "Week"]
)
Both of the above options produce the following output: