Suppose I have a dataset (df)
Group | Employee_Title | Employee_Name
A | Manager | John
A | Analyst | Adam
A | Analyst | Smith
B | Manager | Bill
B | Analyst | Ed
B | Analyst | Jay
I want to create a new column "Group_Manager" so that the new dataset would be:
Group | Employee_Title | Employee_Name | Group_Manager
A | Manager | John | John
A | Analyst | Adam | John
A | Analyst | Smith | John
B | Manager | Bill | Bill
B | Analyst | Ed | Bill
B | Analyst | Jay | Bill
I am looking for python code that can do this in some "cumulative" way, like (not working right now) :
df['Group_Manager']=df.groupby('Group').apply(lambda Employee_Title,Employee_Name: Employee_Name if Employee_Title=="Manager" else keep previous Group_Manager)
By retrieving the manager's name for each group and then reindex it according to the main dataframe 'Group' column, you can achieve the results that you desired
import pandas as pd
# Sample data
data = {
'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)
# Create the Group_Manager column
df['Group_Manager'] = df.groupby('Group').apply(lambda g: g['Employee_Name'][g['Employee_Title'] == 'Manager'].iloc[0]).reindex(df['Group']).reset_index(drop=True)
print(df)
which results in
Group Employee_Title Employee_Name Group_Manager
0 A Manager John John
1 A Analyst Adam John
2 A Analyst Smith John
3 B Manager Bill Bill
4 B Analyst Ed Bill
5 B Analyst Jay Bill
Another approach using itertools.accumulate
import pandas as pd
import itertools
# Sample data
data = {
'Group': ['A', 'A', 'A', 'B', 'B', 'B'],
'Employee_Title': ['Manager', 'Analyst', 'Analyst', 'Manager', 'Analyst', 'Analyst'],
'Employee_Name': ['John', 'Adam', 'Smith', 'Bill', 'Ed', 'Jay']
}
df = pd.DataFrame(data)
# Create a series with manager names where title is 'Manager', and NaN elsewhere
managers = df['Employee_Name'].where(df['Employee_Title'] == 'Manager', None)
# Use accumulate to carry forward the manager names
df['Group_Manager'] = list(itertools.accumulate(managers, lambda x, y: x if y is None else y))
print(df)
which results in
Group Employee_Title Employee_Name Group_Manager
0 A Manager John John
1 A Analyst Adam John
2 A Analyst Smith John
3 B Manager Bill Bill
4 B Analyst Ed Bill
5 B Analyst Jay Bill