I have the following long df:
df = pd.DataFrame({'stations':["Toronto","Toronto","Toronto","New York","New York","New York"],'forecast_date':["Jul 30","Jul 31","Aug 1","Jul 30","Jul 31","Aug 1"],'low':[58,57,59,70,72,71],'high':[65,66,64,88,87,86]})
print(df)
I want to pivot the table to a wide df that looks like this:
Desired Output
so I used the following function:
df = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first").reset_index()
print(df)
But with this, I get the following df:
Output Received (Undesired)
So basically pd.pivot_table
seems to be sorting the columns alphabetically, whereas I want it to be sorted in chronological order.
(Note that the dates are continuously changing, so other months will have a similar problem)
You won't be able to prevent the sorting, but you can always enforce the original ordering by using .reindex
with the unique values from the column!
table = df.pivot_table(index = 'stations', columns = "forecast_date", values = ["high","low"],aggfunc = "first")
print(
table
)
high low
forecast_date Aug 1 Jul 30 Jul 31 Aug 1 Jul 30 Jul 31
stations
New York 86 88 87 71 70 72
Toronto 64 65 66 59 58 57
print(
table.reindex(columns=df['forecast_date'].unique(), level='forecast_date')
)
high low
forecast_date Jul 30 Jul 31 Aug 1 Jul 30 Jul 31 Aug 1
stations
New York 88 87 86 70 72 71
Toronto 65 66 64 58 57 59
Note that this is different than sorting in chronological order. To do that you would have to cast to a datetime
and sort on that.