Below is an example dataframe:
Name | State | Location |
---|---|---|
Jamie | New York | Manhattan |
Mary | Ohio | Downtown Cleveland |
John | New York | Flushing Queens |
Sam | Ohio | Lakewood Cleveland |
James | Ohio | Beachwood Cleveland |
Perry | New York | Uptown Manhattan |
Diane | New York | Briarwood Queens |
Julie | New York | Lower Manhattan |
I would like to sort this dataframe so that they are grouped by State and Location. In the case of Location, all nuances of Manhattan are grouped together and likewise for the others. The Location column should be custom sorted in the order of [Manhattan, Queens, Cleveland]
The result should be something like that shown below:
Name | State | Location |
---|---|---|
Jamie | New York | Manhattan |
Perry | New York | Uptown Manhattan |
Julie | New York | Lower Manhattan |
John | New York | Flushing Queens |
Diane | New York | Briarwood Queens |
Mary | Ohio | Downtown Cleveland |
Sam | Ohio | Lakewood Cleveland |
James | Ohio | Beachwood Cleveland |
I tried the solution posted here. But for it to work I will need to include all the possibilities in column Location. What's the best way to achieve this?
Try:
# extract required areas
df["tmp"] = df.Location.str.extract(r"(Manhattan|Queens|Cleveland)")
# define order
df["tmp"] = pd.Categorical(
df["tmp"], categories=["Manhattan", "Queens", "Cleveland"], ordered=True
)
# sort
df = df.sort_values(by=["State", "tmp"]).drop(columns="tmp")
print(df)
Prints:
Name State Location
0 Jamie New York Manhattan
5 Perry New York Uptown Manhattan
7 Julie New York Lower Manhattan
2 John New York Flushing Queens
6 Diane New York Briarwood Queens
1 Mary Ohio Downtown Cleveland
3 Sam Ohio Lakewood Cleveland
4 James Ohio Beachwood Cleveland