python-3.xpandasdatabasesorting

Custom sorting a dataframe based on multiple columns


Below is an example dataframe:

Name State Location
Jamie New York Manhattan
Mary Ohio Downtown Cleveland
John New York Flushing Queens
Sam Ohio Lakewood Cleveland
James Ohio Beachwood Cleveland
Perry New York Uptown Manhattan
Diane New York Briarwood Queens
Julie New York Lower Manhattan

I would like to sort this dataframe so that they are grouped by State and Location. In the case of Location, all nuances of Manhattan are grouped together and likewise for the others. The Location column should be custom sorted in the order of [Manhattan, Queens, Cleveland]

The result should be something like that shown below:

Name State Location
Jamie New York Manhattan
Perry New York Uptown Manhattan
Julie New York Lower Manhattan
John New York Flushing Queens
Diane New York Briarwood Queens
Mary Ohio Downtown Cleveland
Sam Ohio Lakewood Cleveland
James Ohio Beachwood Cleveland

I tried the solution posted here. But for it to work I will need to include all the possibilities in column Location. What's the best way to achieve this?


Solution

  • Try:

    # extract required areas
    df["tmp"] = df.Location.str.extract(r"(Manhattan|Queens|Cleveland)")
    
    # define order
    df["tmp"] = pd.Categorical(
        df["tmp"], categories=["Manhattan", "Queens", "Cleveland"], ordered=True
    )
    
    # sort
    df = df.sort_values(by=["State", "tmp"]).drop(columns="tmp")
    
    print(df)
    

    Prints:

        Name     State             Location
    0  Jamie  New York            Manhattan
    5  Perry  New York     Uptown Manhattan
    7  Julie  New York      Lower Manhattan
    2   John  New York      Flushing Queens
    6  Diane  New York     Briarwood Queens
    1   Mary      Ohio   Downtown Cleveland
    3    Sam      Ohio   Lakewood Cleveland
    4  James      Ohio  Beachwood Cleveland