pythonpandasdataframesorting

Sort panda data frame on multiple columns, where ASC/DESC is based on condition


I have a pandas dataframe:

             city  crime  pollution  ...  housing  weather  Similarity
0      Atlanta GA    101         97  ...      119       60        2.75
2    Baltimore MD    103        126  ...      103       80        2.50
4  San Antonio TX     98        126  ...      100       65        2.50
1        Akron OH    100        119  ...       97       70        1.75
3   Charleston SC    106        127  ...       82       75        1.50

I want to first sort based on Similarity, descending. Got that code:

df.sort_values('Similarity', ascending=False)

But if there is a tie, I want to sort based on a user input list. Of the 8 columns in my data, users can add UP TO 5 to the preference list, which is stored in order of adding it:

userPrefs = [] 
# ex: userPrefs = ['crime','education','weather','pollution']

So if there is a tie (such as between df[2] and df[4]), I need the code to then sort based on the list. Got that code:

df.sort_values(userPrefs[0], ascending=False).sort_values(userPrefs[1], ascending=False) # -- etc...
  .sort_values('Similarity', ascending=False)

The issue I'm facing is that depending on the column added to userPrefs, the sorting may need to be ascending or it may need to be descending. So if 'crime' == userPref[0], I want it to be ASC (lowest crime is best); however, if 'education' == userPref[0], I want to sort DESC (higher education is best).

How do I sort based on a condition? I was thinking :

ascList = ['crime','housing','pollution'] # the lower, the better
descList = ['education'] # the higher, the better

df.sort_values(userPrefs[0], ascending= if x in ascList)

..but that doesn't work and I'm not clear on lambda functions. Thanks!


Solution

  • You don't need lambda functions, create parameters by= and ascending= of the function .sort_values() on the fly:

    # use sets here:
    ascList = {"crime", "housing", "pollution"}
    descList = {"education"}
    
    # example userPrefs:
    userPrefs = ["crime", "weather", "pollution"]
    
    
    df = df.sort_values(
        by=["Similarity", *userPrefs], ascending=[False, *[p in ascList for p in userPrefs]]
    )
    
    print(df)
    

    Prints:

                 city  crime  pollution  housing  weather  Similarity
    0      Atlanta GA    101         97      119       60        2.75
    4  San Antonio TX     98        126      100       65        2.50
    2    Baltimore MD    103        126      103       80        2.50
    1        Akron OH    100        119       97       70        1.75
    3   Charleston SC    106        127       82       75        1.50