I am working with a set of monthly averaged time-series data spanning more than 20 years, which I have put into a pandas DataFrame. The index of the DataFrame consists of the datetime objects covering the dataset's time range. I have successfully created a 2D histogram subplot of time against another parameter, proton speed. The x-axis values of the histogram appear to have been generated automatically, and I'm not sure how to interpret them. I have been trying to format the x-axis with matplotlib, primarily the date locator/formatter functions, but they keep raising a long traceback that ends with: "OverflowError: int too big to convert."
I have not found a good solution in other questions or in the documentation.
These are the imports I have used so far:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, date, time
import matplotlib.dates as mdates
Below is the pandas DataFrame I have been using; "Datetime" is the index. I wasn't sure how best to share the table, so I copied the DataFrame output directly from my notebook. Apologies if the formatting is off; the columns are separated by whitespace.
Datetime proton_density proton_temp He4toprotons proton_speed x_dot_RTN Proton Mass Flux
----------------------------------------------------------------------------------------
1998-01-23 11.625 58930.0 0.0224 380.90 379.91 7.406307e-19
1998-02-19 9.569 64302.0 0.0294 380.99 380.23 6.097867e-19
1998-03-18 8.767 66770.0 0.0348 384.00 383.19 5.630929e-19
1998-04-14 7.410 121090.0 0.0352 448.44 446.58 5.558023e-19
1998-05-11 7.881 102230.0 0.0271 421.21 419.87 5.552362e-19
... ... ... ... ... ... ...
2021-09-19 8.244 55183.0 0.0356 384.52 383.22 5.302183e-19
2021-10-16 9.664 70601.0 0.0115 418.50 416.21 6.764725e-19
2021-11-12 6.137 93617.0 0.0256 450.47 449.30 4.624021e-19
2021-12-09 4.889 96768.0 0.0177 426.52 424.99 3.487845e-19
2022-01-05 7.280 85944.0 0.0310 434.17 433.01 5.286752e-19
Here is the code I have used to make my histogram:
ax_example = plt.subplot2grid((3, 6), (2, 1), colspan = 2)
H,xedges,yedges = np.histogram2d(SWEPAM_dataframe.index, SWEPAM_dataframe.proton_speed, bins=[50,50])
ax_example.pcolor(xedges, yedges, H.T)
ax_example.set_xlabel("Year")
ax_example.set_ylabel("Proton Speed (km/s)")
The result was this (figure not included):
As you can see, the x-axis does not show dates by default. I'm not sure how to interpret the default x-axis values, but that matters less here. I have read that I should use some combination of ax2.xaxis.set_major_locator(loc) and ax2.xaxis.set_major_formatter(fmt). However, whenever I try to use these commands I get the overflow error mentioned above, which prevents me from turning the x-axis of my histogram into dates.
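I could reproduce your issue. xedges contains such large numbers (on the order of 10^17 to 10^18) because the histogram is computed on the numeric form of the datetime index, which pandas stores as nanoseconds since the Unix epoch (1970-01-01). Matplotlib's date locators and formatters, however, interpret axis values as days since their epoch, so values this large correspond to dates far beyond what Python's datetime can represent, which is most likely what triggers the OverflowError.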
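To illustrate the unit mismatch, here is a small sketch using two dates from the question's table. It prints the raw integer (nanosecond) representation next to matplotlib's date numbers. It assumes matplotlib 3.3 or later, where the default date epoch is 1970-01-01; older versions use a different epoch, so the day values would differ there.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

# Two dates taken from the question's table
idx = pd.DatetimeIndex(["1998-01-23", "2022-01-05"])

# Integer form of the datetime64[ns] values: nanoseconds since 1970-01-01.
# This is the order of magnitude that ends up in xedges.
ns_since_epoch = idx.values.astype("datetime64[ns]").astype(np.int64)
print(ns_since_epoch)  # roughly 8.9e17 and 1.6e18

# What matplotlib's date locators/formatters expect: days since the epoch
days = mdates.date2num(idx.to_pydatetime())
print(days)  # [10249. 18997.]

# Dividing by the number of nanoseconds per day bridges the two units
ns_per_day = 24 * 60 * 60 * 10**9
print(ns_since_epoch / ns_per_day)  # same values as `days`
```

Feeding the nanosecond values to a date formatter is therefore like asking for a date roughly 10^18 days after 1970.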
I have been trying to get the matplotlib approach to work reliably so I can give a full answer.
This overflow error was also reported in the question "Set xaxis data to datetime in matplotlib", which did not receive a convincing answer.
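One workaround that may help, sketched here under the same matplotlib 3.3+ assumption and using only the ten rows shown in the question, is to convert the index to matplotlib date numbers before calling np.histogram2d. The bin edges are then already in days, so mdates.YearLocator and mdates.DateFormatter can be applied without overflowing. The five-year tick spacing is an arbitrary choice, and this has not been checked against the full 20-year dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# The ten rows shown in the question (only the column needed here)
dates = pd.to_datetime(["1998-01-23", "1998-02-19", "1998-03-18", "1998-04-14",
                        "1998-05-11", "2021-09-19", "2021-10-16", "2021-11-12",
                        "2021-12-09", "2022-01-05"])
speed = [380.90, 380.99, 384.00, 448.44, 421.21,
         384.52, 418.50, 450.47, 426.52, 434.17]
SWEPAM_dataframe = pd.DataFrame({"proton_speed": speed}, index=dates)

# Convert the DatetimeIndex to matplotlib date numbers (days since the epoch)
# BEFORE binning, so xedges are in the units the date locator/formatter expect
x = mdates.date2num(SWEPAM_dataframe.index.to_pydatetime())
H, xedges, yedges = np.histogram2d(x, SWEPAM_dataframe.proton_speed, bins=[50, 50])

fig, ax = plt.subplots()
ax.pcolor(xedges, yedges, H.T)

# The date locator/formatter now operate on sensible values
ax.xaxis.set_major_locator(mdates.YearLocator(5))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.set_xlabel("Year")
ax.set_ylabel("Proton Speed (km/s)")
```

With the full dataset, the same conversion and the two locator/formatter lines should apply unchanged to ax_example from the question.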
Alternatively, seaborn handles the datetime dtype of pandas DataFrame columns better than plain matplotlib, without requiring further manipulation of the axes:
import pandas as pd
import seaborn as sns
# Sample input built from the rows shown in the question
# ("Datetime" is kept as a regular column, not set as the index)
df = pd.DataFrame(columns = ['Datetime','proton_density','proton_temp','He4toprotons','proton_speed','x_dot_RTN','Proton_Mass_Flux'],
data = [['1998-01-23',11.625,58930.0,0.0224,380.90,379.91,7.406307e-19],
['1998-02-19', 9.569,64302.0,0.0294,380.99,380.23,6.097867e-19],
['1998-03-18', 8.767,66770.0,0.0348,384.00,383.19,5.630929e-19],
['1998-04-14',7.410,121090.0,0.0352,448.44,446.58,5.558023e-19],
['1998-05-11',7.881,102230.0,0.0271,421.21,419.87,5.552362e-19],
['2021-09-19', 8.244,55183.0,0.0356,384.52,383.22,5.302183e-19],
['2021-10-16', 9.664,70601.0,0.0115,418.50,416.21,6.764725e-19],
['2021-11-12', 6.137,93617.0,0.0256,450.47,449.30,4.624021e-19],
['2021-12-09', 4.889,96768.0,0.0177,426.52,424.99,3.487845e-19],
['2022-01-05', 7.280,85944.0,0.0310,434.17,433.01,5.286752e-19]])
df['Datetime'] = pd.to_datetime(df['Datetime'])
This then produces the expected 2D histogram and axis labels:
sns.histplot(df, x="Datetime", y="proton_speed")