My dataframe looks like this:
| col1 | col2 | col3 |
| ---- | ---- | ---- |
| 1 | abc | txt1 |
| 1 | abc | txt2 |
| 2 | abc | txt3 |
| 1 | xyz | txt4 |
| 2 | xyz | txt5 |
I want to merge the text in col3 between rows only if the rows have the same value in col1 AND the same value in col2.
Expected result:
| col1 | col2 | col3 |
| ---- | ---- | ---------- |
| 1 | abc | txt1, txt2 |
| 2 | abc | txt3 |
| 1 | xyz | txt4 |
| 2 | xyz | txt5 |
I have used this:
df = df.groupby([df['col1'], df['col2']]).aggregate({'col3': ', '.join})
This joins col3 correctly, but it also merges col1 and col2 into one column (list). How can I achieve the expected result while keeping three separate columns (col1, col2, col3)?
A possible solution: group by the two columns col1 and col2, passing as_index=False so that the grouping keys remain regular columns instead of becoming the index. Then aggregate col3 within each group with a lambda function that joins the values into a single string, separated by a comma and a space.
(df.groupby(['col1', 'col2'], as_index=False)
   .agg({'col3': lambda x: ', '.join(x)}))
Output:
   col1 col2        col3
0     1  abc  txt1, txt2
1     1  xyz        txt4
2     2  abc        txt3
3     2  xyz        txt5
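Note that groupby sorts by the grouping keys by default, so the row order above differs from the expected result in the question. If the original order matters, one option is to pass sort=False, which keeps the groups in order of first appearance. Here is a minimal self-contained sketch, assuming the dataframe is built from the sample data in the question (the variable name `result` is only for illustration):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1, 2],
    "col2": ["abc", "abc", "abc", "xyz", "xyz"],
    "col3": ["txt1", "txt2", "txt3", "txt4", "txt5"],
})

# as_index=False keeps col1 and col2 as regular columns;
# sort=False keeps groups in order of first appearance.
result = (
    df.groupby(["col1", "col2"], as_index=False, sort=False)
      .agg({"col3": ", ".join})
)

print(result)
```

Passing `", ".join` directly behaves the same as the lambda here, since both receive the group's col3 values and join them into one string.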