python python-polars altair

Force Altair chart to display years


Using a data frame of dates and values starting from 1 Jan 2022, the x-axis shows year labels as expected:

import datetime as dt
import altair as alt
import polars as pl
import numpy as np

alt.renderers.enable("browser")

dates = pl.date_range(dt.date(2022, 1, 1), dt.date(2025, 1, 22), "1d", eager = True)
values = np.random.uniform(size = len(dates))
df = pl.DataFrame({"dates": dates, "values": values})
alt.Chart(df).mark_point().encode(alt.X("dates:T"), alt.Y("values:Q")).show()

[Image: correct scatter plot with year dates]

But if I start the data frame from 2020 and filter it for dates after 1 Jan 2022, the year labels go missing:

dates_b = pl.date_range(dt.date(2020, 1, 1), dt.date(2025, 1, 22), "1d", eager = True)
values_b = np.random.uniform(size = len(dates_b))
df_b = pl.DataFrame({"dates": dates_b, "values": values_b})
alt.Chart(df_b.filter(pl.col("dates") > dt.date(2022, 1, 1))).mark_point().encode(alt.X("dates:T"), alt.Y("values:Q")).show()

[Image: incorrect scatter plot with missing years]

How can I specify that years must be shown?

Note that I do get the right result if I filter using >= to include 1 Jan 2022, but that's beside the point. I always need years.


Solution

  • You can use labelExpr on the axis to build your own logic for the tick labels. For example, this shows the year if the tick falls in January and the abbreviated month name otherwise.

    dates_b = pl.date_range(dt.date(2020, 1, 1), dt.date(2025, 1, 22), "1d", eager=True)
    values_b = np.random.uniform(size=len(dates_b))
    df_b = pl.DataFrame({"dates": dates_b, "values": values_b})
    alt.Chart(df_b.filter(pl.col("dates") > dt.date(2022, 1, 1))).mark_point().encode(
        alt.X("dates:T").axis(
            labelExpr="timeFormat(datum.value, '%m') == '01' ? timeFormat(datum.value, '%Y') : timeFormat(datum.value, '%b')",
        ),
        alt.Y("values:Q"),
    )
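
To make the label logic concrete, here is a minimal pure-Python sketch of what the labelExpr above computes for each tick. The helper name `tick_label` and the sample tick dates are hypothetical and not part of Altair; the function only mirrors the expression using `strftime`, assuming English month abbreviations (Python's default C locale).

```python
import datetime as dt


def tick_label(tick: dt.date) -> str:
    """Mirror the labelExpr: year for January ticks, abbreviated month otherwise."""
    # Same test as timeFormat(datum.value, '%m') == '01' in the Vega expression.
    if tick.strftime("%m") == "01":
        return tick.strftime("%Y")
    return tick.strftime("%b")


# Hypothetical tick positions, one every three months (not taken from a real chart).
ticks = [dt.date(2022, 1, 1), dt.date(2022, 4, 1), dt.date(2022, 7, 1), dt.date(2023, 1, 1)]
labels = [tick_label(t) for t in ticks]
print(labels)
```

Note that labelExpr only relabels ticks that already exist, so a year appears only where a tick actually falls in January.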
    
