[SOLVED] Python: pandas groupby two columns, without merging them

My dataframe looks like this:

| col1 | col2 | col3 |
| ---- | ---- | ---- |
|  1   | abc  | txt1 |
|  1   | abc  | txt2 |
|  2   | abc  | txt3 |
|  1   | xyz  | txt4 |
|  2   | xyz  | txt5 |

I want to merge the text in col3 across rows only when the rows have the same value in col1 AND the same value in col2.

Expected result:

| col1 | col2 | col3       |
| ---- | ---- | ---------- |
|  1   | abc  | txt1, txt2 |
|  2   | abc  | txt3       |
|  1   | xyz  | txt4       |
|  2   | xyz  | txt5       |

I have used this:

df = df.groupby([df['col1'], df['col2']]).aggregate({'col3': ', '.join})

This joins col3 correctly, but it also merges col1 and col2 into one column (list). How can I achieve the expected result while keeping three separate columns (col1, col2, col3)?

Solution

A possible solution is to group by col1 and col2 with as_index=False, so that the grouping keys stay as regular columns instead of being moved into the index of the result. The values of col3 in each group are then aggregated by a function that joins them into a single string, separated by a comma and a space.

(df.groupby(['col1', 'col2'], as_index=False)
 .agg({'col3': lambda x: ', '.join(x)}))

Output:

   col1 col2        col3
0     1  abc  txt1, txt2
1     1  xyz        txt4
2     2  abc        txt3
3     2  xyz        txt5
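
Note that the output above is sorted by the group keys, so the row order differs from the expected result in the question. A minimal self-contained sketch follows, assuming the sample data from the question is rebuilt by hand (the names `result` and `alt` are only illustrative). It passes `sort=False` so that groups keep their order of first appearance, and it also shows `reset_index()` as an alternative way to turn the keys back into columns:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1, 2],
    "col2": ["abc", "abc", "abc", "xyz", "xyz"],
    "col3": ["txt1", "txt2", "txt3", "txt4", "txt5"],
})

# as_index=False keeps col1 and col2 as ordinary columns;
# sort=False keeps groups in order of first appearance rather than sorting by key.
result = (
    df.groupby(["col1", "col2"], as_index=False, sort=False)
      .agg({"col3": ", ".join})
)

# Alternative: let the keys become the index, then move them back with reset_index().
alt = (
    df.groupby(["col1", "col2"], sort=False)["col3"]
      .agg(", ".join)
      .reset_index()
)

print(result)
```

With `sort=False`, the rows should come out in the same order as the expected table in the question: (1, abc), (2, abc), (1, xyz), (2, xyz). Both variants are intended to produce the same three-column frame; which one to use is mostly a matter of taste.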