pythonpandas

Concatenate rows for two columns in a pandas DataFrame


I have the following dataframe:

import pandas as pd
d = {'Name': ['DataSource', 'DataSource'], 'DomainCode': ['Pr', 'Gov'], 'DomainName': ['Private', 'Government']}
df = pd.DataFrame(data=d)

So the dataframe is as follows:

         Name DomainCode  DomainName
0  DataSource         Pr     Private
1  DataSource        Gov  Government

I need to group it by Name so that each of the other two columns becomes a list:

         Name DomainCode             DomainName
0  DataSource  [Pr, Gov]  [Private, Government]

I understand how to do it for a single column:

df = df.groupby("Name")["DomainCode"].apply(list).reset_index()

which gives me

         Name DomainCode
0  DataSource  [Pr, Gov]

but whatever I try, I cannot get the second column included. How can I do this?

One more question: the list returned by the previous command does not seem to be a real list, because its length is 1 rather than 2.


Solution

  • Use the following line, which collects every remaining column into a list per group:

    df_grouped = df.groupby("Name").agg(list).reset_index()
    

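    Regarding the length: when you run

    df_grouped = df.groupby("Name")["DomainCode"].apply(list).reset_index()
    

    each cell of DomainCode holds a real Python list. A length of 1 is what len(df_grouped["DomainCode"]) returns, because the column has one row per group; len(df_grouped["DomainCode"].iloc[0]) returns 2. The cell is a single string (such as "['Pr', 'Gov']") rather than a true Python list only if the data was converted to text along the way, for example by saving the frame to CSV and reading it back.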

    To convert such a string to a real list (from a comment):

    import ast
    fake_list = "['Pr', 'Gov']"  
    real_list = ast.literal_eval(fake_list)
    print(real_list)  
    for item in real_list:
        print(item)
    

    Or:

    fake_list = "Pr, Gov"
    real_list = fake_list.split(", ")
    print(real_list)
    

    Or:

    import json
    fake_list = '["Pr", "Gov"]'
    real_list = json.loads(fake_list)
    print(real_list)
    

    Or:

    import re
    fake_list = "['Pr', 'Gov']"
    real_list = re.findall(r"'(.*?)'", fake_list)
    print(real_list)
    

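A minimal end-to-end sketch, using the sample frame from the question. It groups both columns with agg(list), compares the length of the column with the length of the list inside it, and then shows one way a list can end up as text. The CSV round trip through an in-memory buffer is an assumption added for illustration; the question does not say how the data was stored.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["DataSource", "DataSource"],
    "DomainCode": ["Pr", "Gov"],
    "DomainName": ["Private", "Government"],
})

# Collect every non-key column into one list per group.
df_grouped = df.groupby("Name").agg(list).reset_index()

# The column has one row per group, so this counts rows, not list items.
n_rows = len(df_grouped["DomainCode"])
# The cell in that row is a real Python list.
codes = df_grouped["DomainCode"].iloc[0]
print(n_rows, type(codes).__name__, len(codes))

# Illustrative assumption: a CSV round trip stores each list as text.
buffer = io.StringIO()
df_grouped.to_csv(buffer, index=False)
buffer.seek(0)
df_csv = pd.read_csv(buffer)
as_text = df_csv["DomainCode"].iloc[0]

# ast.literal_eval parses that text back into a list.
restored = df_csv["DomainCode"].apply(ast.literal_eval).iloc[0]
print(type(as_text).__name__, restored)
```

If the lists only ever live in memory, no conversion should be needed; the literal_eval step matters only once the frame has been written out as text.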