Consider this df:
data = {'ID': [1071.0, 1072.0, nan, 1074.0, 1076.0, nan, nan, nan, 1077.0],
'Name Type': ['Primary Name', 'Primary Name', 'Also Known As', 'Primary Name', 'Primary Name', 'Low Quality AKA', 'Low Quality AKA', 'Low Quality AKA', 'Primary Name'],
'Surname': ['Brown', 'Red', 'R', 'Green', 'Purple', 'Pipi', 'Poopa', 'Peep', 'Orange']}
There are many more columns that have info in the rows with Primary Name but empty with akas. I need to concatenate values under each Primary Name - Surname if they are Low Quality AKA or Also Known as and achieve this dataframe:
This is not how I would recommend setting up the data. I would probably have two datasets with A matching ID and then Left Join the table. That being said, here is the code for your situation.
import pandas as pd
import numpy as np
data = {'ID': [1071.0, 1072.0, np.nan, 1074.0, 1076.0, np.nan, np.nan, np.nan, 1077.0],
'Name Type': ['Primary Name', 'Primary Name', 'Also Known As', 'Primary Name', 'Primary Name', 'Low Quality AKA', 'Low Quality AKA', 'Low Quality AKA', 'Primary Name'],
'Surname': ['Brown', 'Red', 'R', 'Green', 'Purple', 'Pipi', 'Poopa', 'Peep', 'Orange']}
df = pd.DataFrame(data)
# Forward fill the IDs and keep only rows with 'Primary Name'
df['ID'] = df['ID'].ffill()
primary_names = df[df['Name Type'] == 'Primary Name'].copy()
# Group by ID and aggregate the surnames for AKA names
akas = df[df['Name Type'].str.contains('AKA')].groupby('ID')['Surname'].agg(lambda x: ';'.join(x)).reset_index()
akas.rename(columns={'Surname': 'AKAs'}, inplace=True)
# Combine data
result = pd.merge(primary_names, akas, on='ID', how='left')
# Put empty string in AKAs col
result['AKAs'] = result['AKAs'].fillna('')
# Select and reorder columns to match the desired output
result = result[['ID', 'Name Type', 'Surname', 'AKAs']]
print(result)