I have columns with similar names but numeric suffixes that represent different occurrences of each column. For example, I have columns (company_1, job_title_1, location_1, company_2, job_title_2, location_2). I would like to order these columns grouped together by the prefix (before the underscore) and then sequentially by the suffix (after the underscore).
How I would like the columns to be: company_1, company_2, job_title_1, job_title_2, location_1, location_2.
Here's what I tried from this question:
df = df.reindex(sorted(df.columns), axis=1)
This resulted in the order: company_1, company_10, company_11 (skipping over 2-9)
This type of sorting is called natural sorting. (There are more details in Naturally sorting Pandas DataFrame which demonstrates how to sort rows using natsort)
Setup with natsort
import pandas as pd
from natsort import natsorted
df = pd.DataFrame(columns=[f'company_{i}' for i in [5, 2, 3, 4, 1, 10]])
# Before column sort
print(df)
df = df.reindex(natsorted(df.columns), axis=1)
# After column sort
print(df)
Before sort:
Empty DataFrame
Columns: [company_5, company_2, company_3, company_4, company_1, company_10]
Index: []
After sort:
Empty DataFrame
Columns: [company_1, company_2, company_3, company_4, company_5, company_10]
Index: []
Compared to lexicographic sorting with sorted
:
df = df.reindex(sorted(df.columns), axis=1)
Empty DataFrame
Columns: [company_1, company_10, company_2, company_3, company_4, company_5]
Index: []