
Python: pandas groupby two columns, without merging them


My dataframe looks like this:

| col1 | col2 | col3 |
| ---- | ---- | ---- |
|  1   | abc  | txt1 |
|  1   | abc  | txt2 |
|  2   | abc  | txt3 |
|  1   | xyz  | txt4 |
|  2   | xyz  | txt5 |
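To make the example reproducible, the sample frame can be built directly. This is a minimal sketch; the column names and values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'col1': [1, 1, 2, 1, 2],
    'col2': ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'col3': ['txt1', 'txt2', 'txt3', 'txt4', 'txt5'],
})
print(df)
```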

I want to merge the text in col3 across rows only when the rows have the same value in col1 AND the same value in col2.

Expected result:

| col1 | col2 | col3       |
| ---- | ---- | ---------- |
|  1   | abc  | txt1, txt2 |
|  2   | abc  | txt3       |
|  1   | xyz  | txt4       |
|  2   | xyz  | txt5       |

I have used this:

    df = df.groupby([df['col1'], df['col2']]).aggregate({'col3': ', '.join})

This joins col3 correctly, but col1 and col2 end up combined into the index of the result instead of remaining separate columns. How can I get the expected result while keeping three separate columns (col1, col2, col3)?
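What happens here: with the default as_index=True, groupby moves the grouping keys into a MultiIndex on the result. The sketch below (sample df rebuilt from the table above) shows this, along with one common alternative fix: calling reset_index() afterwards to turn the index levels back into ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 1, 2],
    'col2': ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'col3': ['txt1', 'txt2', 'txt3', 'txt4', 'txt5'],
})

# Default groupby (as_index=True): col1 and col2 become levels of a MultiIndex
grouped = df.groupby(['col1', 'col2']).agg({'col3': ', '.join})
print(list(grouped.index.names))  # ['col1', 'col2']

# reset_index() moves the index levels back into regular columns
result = grouped.reset_index()
print(result)
```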


Solution

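  • A possible solution is to pass as_index=False to groupby, so that col1 and col2 stay ordinary columns instead of becoming the index: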

    (df.groupby(['col1', 'col2'], as_index=False)
     .agg({'col3': ', '.join}))
    

    Output:

       col1 col2        col3
    0     1  abc  txt1, txt2
    1     1  xyz        txt4
    2     2  abc        txt3
    3     2  xyz        txt5
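The output above is sorted by the group keys, which is groupby's default (sort=True), so its row order differs from the expected result in the question. If the original order matters, a sketch passing sort=False keeps groups in the order they first appear; the sample df is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 1, 2, 1, 2],
    'col2': ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'col3': ['txt1', 'txt2', 'txt3', 'txt4', 'txt5'],
})

# as_index=False keeps col1/col2 as columns; sort=False preserves first-appearance order
out = (df.groupby(['col1', 'col2'], as_index=False, sort=False)
         .agg({'col3': ', '.join}))
print(out)
```

With this data the rows come out as (1, abc), (2, abc), (1, xyz), (2, xyz), matching the expected result table.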