pythonpandasspatial-data-frame

Python Group by multiple columns and keep other columns


I have a table that looks like below:

City_code     City_name     Site_code      Site_capacity
AAA100      City_A          Site001         300
AAA100      City_A          Site002         600
AAA100      City_A          Site003         500
AAA200      City_B          Site004         350
AAA200      City_B          Site005         250
AAA300      City_C          Site006         800
AAA300      City_C          Site007         150
AAA300      City_C          Site008         450
AAA400      City_D          Site009         300
AAA400      City_D          Site0010        400

I want to select the site with with the Highest value for Site_capacity for each city

I have tried the following code:

df.groupby(['City_code', 'City_name'])['Site_capacity'].max()

This this the output it generates:

City_code     City_name     
AAA100      City_A          600
AAA200      City_B          350
AAA300      City_C          800
AAA400      City_D          400

How do i create some output that looks like this?

City_code     City_name     Site_code      Site_capacity
AAA100      City_A          Site002         600
AAA200      City_B          Site004         350
AAA300      City_C          Site006         800
AAA400      City_D          Site0010        400

Solution

  • We can do sort_values + drop_duplicates

    s = df.sort_values('Site_capacity').drop_duplicates(['City_code', 'City_name'],keep='last')
    Out[334]: 
      City_code City_name Site_code  Site_capacity
    3    AAA200    City_B   Site004            350
    9    AAA400    City_D  Site0010            400
    1    AAA100    City_A   Site002            600
    5    AAA300    City_C   Site006            800