I have the following dataframe:
import pandas as pd
d = {'Name': ['DataSource', 'DataSource'], 'DomainCode': ['Pr', 'Gov'], 'DomainName': ['Private', 'Government']}
df = pd.DataFrame(data=d)
So the dataframe is as follows:
Name DomainCode DomainName
0 DataSource Pr Private
1 DataSource Gov Government
I need to group it by the name to receive two lists:
Name DomainCode DomainName
0 DataSource [Pr, Gov] [Private, Government]
I understand how to do it for a single column:
df = df.groupby("Name")["DomainCode"].apply(list).reset_index()
which gives me
Name DomainCode
0 A_DataSource [GOV, PR]
but whatever I tried, I could not add the second column. How can I do this?
One more question: the list returned by the previous command somehow does not behave like a list, since its length is 1 rather than 2.
To collect every remaining column into a list per group, use the following line:
df_grouped = df.groupby("Name").agg(list).reset_index()
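As a minimal sketch of the line above, here it is applied to the dataframe from the question. The dict form of agg is shown as an alternative for when only some columns should be collected. The variable name df_subset is illustrative and not from the original post.

```python
import pandas as pd

d = {
    "Name": ["DataSource", "DataSource"],
    "DomainCode": ["Pr", "Gov"],
    "DomainName": ["Private", "Government"],
}
df = pd.DataFrame(data=d)

# agg(list) collects every non-grouping column into one list per group
df_grouped = df.groupby("Name").agg(list).reset_index()
print(df_grouped)

# Alternative: name the columns explicitly (df_subset is an illustrative name)
df_subset = (
    df.groupby("Name")
    .agg({"DomainCode": list, "DomainName": list})
    .reset_index()
)
print(df_subset)
```

Both calls should produce one row per Name, with each of the other columns holding a Python list.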
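As for the second question, this line
df_grouped = df.groupby("Name")["DomainCode"].apply(list).reset_index()
stores a genuine Python list in each cell, so len(df_grouped.loc[0, "DomainCode"]) is 2. A length of 1 most likely comes from calling len() on the dataframe or on the column, which counts rows (there is only one group here). If the data was instead saved and reloaded, for example through a CSV file, each cell holds a single string ("['Pr', 'Gov']") rather than a true Python list and has to be converted.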
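One quick way to see what a cell actually holds is to check its type and length directly. The sketch below uses the dataframe from the question. The CSV round trip is an assumed scenario, shown only to illustrate how string-valued cells can arise, and the names buffer, df_loaded, loaded_cell and restored_cell are illustrative.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["DataSource", "DataSource"],
    "DomainCode": ["Pr", "Gov"],
    "DomainName": ["Private", "Government"],
})
df_grouped = df.groupby("Name")["DomainCode"].apply(list).reset_index()

# len() of the dataframe counts rows (groups), not the items inside a cell
print(len(df_grouped))

# Inspect the cell itself to see its type and how many items it holds
cell = df_grouped.loc[0, "DomainCode"]
print(type(cell), len(cell))

# Assumed scenario: a CSV round trip turns each list into its text form
buffer = io.StringIO()
df_grouped.to_csv(buffer, index=False)
buffer.seek(0)
df_loaded = pd.read_csv(buffer)
loaded_cell = df_loaded.loc[0, "DomainCode"]
print(type(loaded_cell))

# Convert the whole column back into real lists
df_loaded["DomainCode"] = df_loaded["DomainCode"].apply(ast.literal_eval)
restored_cell = df_loaded.loc[0, "DomainCode"]
print(restored_cell)
```

If the type printed for the cell is list, no conversion is needed and the length of 1 was the row count.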
To convert such a string into a real list (from a comment):
import ast
fake_list = "['Pr', 'Gov']"
real_list = ast.literal_eval(fake_list)
print(real_list)
for item in real_list:
    print(item)
Or:
fake_list = "Pr, Gov"
real_list = fake_list.split(", ")
print(real_list)
Or:
import json
fake_list = '["Pr", "Gov"]'
real_list = json.loads(fake_list)
print(real_list)
Or:
import re
fake_list = "['Pr', 'Gov']"
real_list = re.findall(r"'(.*?)'", fake_list)
print(real_list)
Output: