I want to combine the permuted col1 and col2 values into one row. containing only the first combination & summing the count column of both. Is there an easy way to do this in pandas?
The example data frame and output: For example, in the below data frame I want to combine rows with values A, B, and B, A and sum their count column. Same for rows with values C, D, and D, C and sum their count values as well. I want to keep the rest of the rows in the data frame as is.
INPUT:
col1 | col2 | count |
---|---|---|
A | B | 3 |
C | D | 2 |
B | A | 5 |
E | F | 2 |
G | H | 8 |
D | C | 5 |
I | J | 4 |
OUTPUT:
col1 | col2 | count |
---|---|---|
A | B | 8 |
C | D | 7 |
E | F | 2 |
G | H | 8 |
I | J | 4 |
You can .groupby
according sorted col1/col2:
x = (
df.groupby(df[["col1", "col2"]].apply(lambda x: tuple(sorted(x)), 1))
.agg({"col1": "first", "col2": "first", "count": "sum"})
.reset_index(drop=True)
)
print(x)
Prints:
col1 col2 count
0 A B 8
1 C D 7
2 E F 2
3 G H 8
4 I J 4