[SOLVED] Concatenate values under into a new column

Concatenate values under into a new column

Consider this df:

    data = {'ID': [1071.0, 1072.0, nan, 1074.0, 1076.0, nan, nan, nan, 1077.0], 
'Name Type': ['Primary Name', 'Primary Name', 'Also Known As', 'Primary Name', 'Primary Name', 'Low Quality AKA', 'Low Quality AKA', 'Low Quality AKA', 'Primary Name'], 
'Surname': ['Brown', 'Red', 'R', 'Green', 'Purple', 'Pipi', 'Poopa', 'Peep', 'Orange']}

There are many more columns that have info in the rows with Primary Name but empty with akas. I need to concatenate values under each Primary Name - Surname if they are Low Quality AKA or Also Known as and achieve this dataframe:

Solution

This is not how I would recommend setting up the data. I would probably have two datasets with A matching ID and then Left Join the table. That being said, here is the code for your situation.

import pandas as pd
import numpy as np

data = {'ID': [1071.0, 1072.0, np.nan, 1074.0, 1076.0, np.nan, np.nan, np.nan, 1077.0], 
        'Name Type': ['Primary Name', 'Primary Name', 'Also Known As', 'Primary Name', 'Primary Name', 'Low Quality AKA', 'Low Quality AKA', 'Low Quality AKA', 'Primary Name'], 
        'Surname': ['Brown', 'Red', 'R', 'Green', 'Purple', 'Pipi', 'Poopa', 'Peep', 'Orange']}

df = pd.DataFrame(data)

# Forward fill the IDs and keep only rows with 'Primary Name'
df['ID'] = df['ID'].ffill()
primary_names = df[df['Name Type'] == 'Primary Name'].copy()

# Group by ID and aggregate the surnames for AKA names
akas = df[df['Name Type'].str.contains('AKA')].groupby('ID')['Surname'].agg(lambda x: ';'.join(x)).reset_index()
akas.rename(columns={'Surname': 'AKAs'}, inplace=True)

# Combine data
result = pd.merge(primary_names, akas, on='ID', how='left')

# Put empty string in AKAs col
result['AKAs'] = result['AKAs'].fillna('')

# Select and reorder columns to match the desired output
result = result[['ID', 'Name Type', 'Surname', 'AKAs']]

print(result)