pythonpandasdataframecolumnsorting

Reordering Pandas Columns based on Column name


I have columns with similar names but numeric suffixes that represent different occurrences of each column. For example, I have columns (company_1, job_title_1, location_1, company_2, job_title_2, location_2). I would like to order these columns grouped together by the prefix (before the underscore) and then sequentially by the suffix (after the underscore).

How I would like the columns to be: company_1, company_2, job_title_1, job_title_2, location_1, location_2.

Here's what I tried from this question:

df = df.reindex(sorted(df.columns), axis=1)

This resulted in the order: company_1, company_10, company_11 (skipping over 2-9)


Solution

  • This type of sorting is called natural sorting. (There are more details in Naturally sorting Pandas DataFrame which demonstrates how to sort rows using natsort)

    Setup with natsort

    import pandas as pd
    from natsort import natsorted
    
    df = pd.DataFrame(columns=[f'company_{i}' for i in [5, 2, 3, 4, 1, 10]])
    
    # Before column sort
    print(df)
    
    df = df.reindex(natsorted(df.columns), axis=1)
    
    # After column sort
    print(df)
    

    Before sort:

    Empty DataFrame
    Columns: [company_5, company_2, company_3, company_4, company_1, company_10]
    Index: []
    

    After sort:

    Empty DataFrame
    Columns: [company_1, company_2, company_3, company_4, company_5, company_10]
    Index: []
    

    Compared to lexicographic sorting with sorted:

    df = df.reindex(sorted(df.columns), axis=1)
    
    Empty DataFrame
    Columns: [company_1, company_10, company_2, company_3, company_4, company_5]
    Index: []