Tags: python, pandas, dataframe, extract, data-extraction

Is there Python code to collect the related information for duplicate IDs and merge it into a single row per ID?


Serial Number Age Characteristics
1001 20 Tall
1002 23 Blue
1001 20 Black
1002 23 Short
1003 19 Green

Desired output:

Serial Number Age Characteristics
1001 20 Tall,Black
1002 23 Short,Blue
1003 19 Green

Solution

  • First group the rows by "Serial Number" with groupby, then in agg take one Age per group and combine the Characteristics strings with ','.join, like this:

    df.groupby("Serial Number").agg({"Age": "last", "Characteristics": ','.join})
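
For completeness, here is a minimal runnable sketch of the same approach, assuming the data is loaded into a DataFrame shaped like the table in the question. The variable names `df` and `merged` are only illustrative, and `as_index=False` is an addition here so that "Serial Number" stays a regular column, as in the desired output, rather than becoming the index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Serial Number": [1001, 1002, 1001, 1002, 1003],
    "Age": [20, 23, 20, 23, 19],
    "Characteristics": ["Tall", "Blue", "Black", "Short", "Green"],
})

# Group duplicate IDs, keep the last Age seen in each group, and join the
# Characteristics strings with commas. as_index=False keeps "Serial Number"
# as a column instead of moving it into the index.
merged = (
    df.groupby("Serial Number", as_index=False)
      .agg({"Age": "last", "Characteristics": ",".join})
)

print(merged)
```

Note that the joined values follow the original row order within each group, so ID 1002 comes out as "Blue,Short" rather than "Short,Blue". If a particular order matters, the rows would need to be sorted before grouping.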