I want to read in a polars dataframe from a json string containing dates in the standard iso-format "yyyy-mm-dd".
When I try to read the string in and set the dtype of the date column witheither schema
or schema_override
this results in only NULL values.
from datetime import datetime, timedelta
from io import StringIO
import polars as pl
# Generate a list of dates
start_date = datetime.today()
dates = [start_date + timedelta(days=i) for i in range(100)]
date_strings = [date.strftime("%Y-%m-%d") for date in dates]
# Create a Polars DataFrame
df = pl.DataFrame({"dates": date_strings})
df_reread = pl.read_json(
StringIO(df.write_json()),
schema_overrides={"dates": pl.Date},
)
output of print(df_reread)
shape: (100, 1)
┌───────┐
│ dates │
│ --- │
│ date │
╞═══════╡
│ null │
│ null │
│ null │
│ null │
│ null │
│ … │
│ null │
│ null │
│ null │
│ null │
│ null │
└───────┘
Is there anyway to correctly read in the Date dtype from a json string?
After having a bit of a play around, it looks like unfortunately dates being read from a JSON file have a bit of a quirk. It seems to me that currently they must be written in days since the unix epoch (which is how Polars internally represents dates) for things to work as you expect.
I have raised this feature request on their github to hopefully get that improved.
In the mean time,
df = (
pl.DataFrame({"dates": "2024-01-01"})
# add this line below
.select(pl.col("dates").cast(pl.Date).dt.epoch("d"))
)
df_reread = pl.read_json(
df.write_json().encode(),
schema_overrides={"dates": pl.Date},
)
print(df_reread)
# shape: (1, 1)
# ┌────────────┐
# │ dates │
# │ --- │
# │ date │
# ╞════════════╡
# │ 2024-01-01 │
# └────────────┘
or do as you say with df_reread.with_columns(pl.col("dates").cast(pl.Date)