I have a dataframe that has worksite info.
District# Site# Address
1 1 123 Bayview Ln
1 2 456 Example St
2 36 789 Hello Dr
2 44 789 Hello Dr
I am trying to transform this dataframe to add a column with the highest Site# as well as the distinct addresses when I group by District#. Here is an example of what I want the output to look like:
District# Site# Address MaxSite# All District Addresses
1 1 123 Bayview Ln 2 123 Bayview Ln,456 Example St
1 2 456 Example St 2 123 Bayview Ln,456 Example St
2 36 789 Hello Dr 44 789 Hello Dr
2 44 789 Hello Dr 44 789 Hello Dr
I am able to get the Max Site# by doing
df['MaxSite#'] = df.groupby(by='District#')['Site#'].transform('max')
But I am trying to find a similar way to list all of the unique addresses when I groupby District#.
I have tried doing .transform('unique')
but that is not a valid function name and doing .agg(['unique'])
returns dimensions that do not match
You can use groupby
and agg
to get the Max Site Number and List all the addresses
Then merge
back to the original dataframe:
grouped_df = df.groupby('District#').agg(Max_Site_Num=('Site#', 'max'),
All_District_Addresses=('Address', lambda x: list(set(x))).reset_index()
df = df.merge(grouped_df,on='District#')
Output:
District# Site# Address Max_Site_Num All_District_Addresses
0 1 1 123 Bayview Ln 2 [123 Bayview Ln, 456 Example St]
1 1 2 456 Example St 2 [123 Bayview Ln, 456 Example St]
2 2 36 789 Hello Dr 44 [789 Hello Dr]
3 2 44 789 Hello Dr 44 [789 Hello Dr]