python pandas dataframe matplotlib plot

Plotting a time series as a bar plot with pandas results in an incorrect year


I have the following dataframe (my actual data spans over 25 years):

import pandas as pd


df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=[1,2,3,4,5,6,7,8,9,10,11,12]
    ), 
)
df

Output:

    date    data
0   2020-01-01  1
1   2020-02-01  2
2   2020-03-01  3
3   2020-04-01  4
4   2020-05-01  5
5   2020-06-01  6
6   2020-07-01  7
7   2020-08-01  8
8   2020-09-01  9
9   2020-10-01  10
10  2020-11-01  11
11  2020-12-01  12

I get different results from matplotlib and from pandas' default plotting:

import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("ac;bd")

ax = axs["a"]
ax.bar(x="date", height="data", data=df, width=15)

ax = axs["b"]
ax.bar(x="date", height="data", data=df, width=15)

locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

ax = axs["c"]
df.plot.bar(x="date", y="data", ax=ax, legend=False)

ax = axs["d"]

df.plot.bar(x="date", y="data", ax=ax, legend=False)  # incorrect year -> 1970 instead of 2020

locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

for k, ax in axs.items():
    for label in ax.get_xticklabels():
        label.set_rotation(40)
        label.set_horizontalalignment('right')

fig

Output:

[Image: 2x2 grid of the four bar plots; panel d shows the year 1970 instead of 2020]

I would like to use pandas for plotting and then format the ticks appropriately for a publication-ready plot. However, when using pandas I appear to lose the datetime information or get the incorrect year.

Is there a way to format the axis tick labels using mdates features without using the data directly? That is, if I resample the data or slice out a different year, I'd like the axis to reflect that automatically.


Here's a simpler illustration of the issue I'm having:

import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("a")

ax = axs["a"]

df.plot.bar(x="date", y="data", ax=ax, legend=False) # incorrect year -> 1970 instead of 2020

formatter = mdates.DateFormatter("%Y - %b")
ax.xaxis.set_major_formatter(formatter)

fig

[Image: the resulting bar plot, with tick labels in 1970 instead of 2020]

The dates are all wrong when using DateFormatter.


Solution

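  • A pandas bar plot is categorical: the bars are drawn at the integer x-coordinates 0, 1, 2, 3, etc., and the dates are used only as text labels. That's why mdates.DateFormatter returns 1970: it interprets these coordinates as days since Matplotlib's epoch (1970-01-01 by default).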

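    Instead of using a DateFormatter, you can set the tick labels manually from the date column, so they follow the data if you resample or slice it: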

    ax.set_xticklabels(df["date"].dt.strftime("%Y - %b"))
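
Here is a minimal sketch that puts the explanation and the fix together. It assumes the same dataframe as in the question. The names positions, epoch_year and labels are only illustrative, and the Agg backend is chosen just so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=list(range(1, 13)),
    )
)

fig, ax = plt.subplots()
df.plot.bar(x="date", y="data", ax=ax, legend=False)

# The bars sit at integer positions, not at date values.
positions = [int(t) for t in ax.get_xticks()]

# Read as Matplotlib date numbers, these positions fall in the epoch year.
epoch_year = mdates.num2date(positions[-1]).year

# Build the labels from the date column and apply them to the existing ticks.
labels = df["date"].dt.strftime("%Y - %b").tolist()
ax.set_xticklabels(labels, rotation=40, ha="right")

print(positions)
print(epoch_year)
```

With Matplotlib's default epoch, epoch_year should come out as 1970, which matches the wrong year in the question. Because the labels are derived from df["date"], rerunning the same lines on a resampled or sliced dataframe should update the axis. Note that %b is locale-dependent, so the month abbreviations may differ on non-English systems.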