pythonpandaspivot-table

How to prevent pd.pivot_table from unwantedly sorting columns


I have the following long df:

df = pd.DataFrame({'stations':["Toronto","Toronto","Toronto","New York","New York","New York"],'forecast_date':["Jul 30","Jul 31","Aug 1","Jul 30","Jul 31","Aug 1"],'low':[58,57,59,70,72,71],'high':[65,66,64,88,87,86]})
print(df)

I want to pivot the table to a wide df that looks like this:

Desired Output
Desired Output

so I used the following function:

df = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first").reset_index()
print(df)

But with this, I get the following df:

Output Received (Undesired)
Output Received (Undesired)

So basically pd.pivot_table seems to be sorting the columns alphabetically, whereas I want it to be sorted in chronological order.

(Note that the dates are continuously changing, so other months will have a similar problem)


Solution

  • You won't be able to prevent the sorting, but you can always enforce the original ordering by using .reindex with the unique values from the column!

    table = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first")
    
    print(
        table
    )
                   high                 low              
    forecast_date Aug 1 Jul 30 Jul 31 Aug 1 Jul 30 Jul 31
    stations                                             
    New York         86     88     87    71     70     72
    Toronto          64     65     66    59     58     57
    
    
    print(
        table.reindex(columns=df['forecast_date'].unique(), level='forecast_date')
    )
                    high                 low             
    forecast_date Jul 30 Jul 31 Aug 1 Jul 30 Jul 31 Aug 1
    stations                                             
    New York          88     87    86     70     72    71
    Toronto           65     66    64     58     57    59
    

    Note that this is different than sorting in chronological order. To do that you would have to cast to a datetime and sort on that.