I have the following two DataFrames:
df
100 101 102 103 104 105 106 107 108 109
0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
2 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
4 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
6 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
8 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
9 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
df2
crit1 crit2
110 a A
109 a B
108 a A
107 b B
106 b A
105 a A
104 a B
103 a A
102 b B
101 b A
100 b A
99 b A
df
contains data for ten entities 100-109 and df2
describes two criteria categorizing the ten entities 100-109 (and others, in a different order). I'd like to group df
on a two-level column index (crit1,crit2) with one value per combination of (crit1,crit2), being the sum of all columns with this combination.
For example, the new column with the index ('a','A') would contain the sum of columns [108,105,103].
expected result:
crit1 a b
crit2 A B A B
0 3.0 2.0 3.0 2.0
1 19.0 15.0 10.0 11.0
2 3.0 2.0 3.0 2.0
3 3.0 2.0 3.0 2.0
4 3.0 2.0 3.0 2.0
5 3.0 2.0 3.0 2.0
6 3.0 2.0 3.0 2.0
7 3.0 2.0 3.0 2.0
8 3.0 2.0 3.0 2.0
9 3.0 2.0 3.0 2.0
To reproduce the DataFrames:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.ones((10,10)), index=np.arange(10), columns=np.arange(100,110))
df2 = pd.DataFrame(np.array([['a','A'],['a','B'],['a','A'],['b','B'],['b','A'],['a','A'],['a','B'],['a','A'],['b','B'],['b','A'],['b','A'],['b','A']]), index=np.arange(110,98,-1), columns=['crit1','crit2'])
df.iloc[1] = np.arange(1,11)
You can reindex
, set_axis
with MultiIndex.from_frame
then groupby.sum
:
out = (df.reindex(columns=df2.index)
.set_axis(pd.MultiIndex.from_frame(df2), axis=1)
.groupby(axis=1, level=[0, 1]).sum()
)
For the latest version of pandas, using the transpose
as intermediate:
out = (df.T
.reindex(df2.index)
.set_axis(pd.MultiIndex.from_frame(df2))
.groupby(level=[0, 1]).sum().T
)
Output:
crit1 a b
crit2 A B A B
0 4.0 2.0 2.0 2.0
1 4.0 2.0 2.0 2.0
2 4.0 2.0 2.0 2.0
3 4.0 2.0 2.0 2.0
4 4.0 2.0 2.0 2.0
5 4.0 2.0 2.0 2.0
6 4.0 2.0 2.0 2.0
7 4.0 2.0 2.0 2.0
8 4.0 2.0 2.0 2.0
9 4.0 2.0 2.0 2.0
Alternatively, pre-aggregating the indices, then building the output with concat
:
groups = df2.reset_index().groupby(list(df2))['index'].agg(list)
out = pd.concat({k: df.reindex(columns=lst).sum(axis=1)
for k, lst in groups.items()}, axis=1)
Output:
a b
A B A B
0 4.0 2.0 2.0 2.0
1 4.0 2.0 2.0 2.0
2 4.0 2.0 2.0 2.0
3 4.0 2.0 2.0 2.0
4 4.0 2.0 2.0 2.0
5 4.0 2.0 2.0 2.0
6 4.0 2.0 2.0 2.0
7 4.0 2.0 2.0 2.0
8 4.0 2.0 2.0 2.0
9 4.0 2.0 2.0 2.0
Intermediate groups
:
crit1 crit2
a A [100, 102, 105, 107]
B [101, 106]
b A [104, 109]
B [103, 108]
Name: index, dtype: object