Does anyone know how to parse YYYY Week into a date column in Polars?
I have tried this code but it throws an error.
import polars as pl
pl.DataFrame({
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col("week").cast(pl.String).str.to_date("%Y%U").alias("date"))
InvalidOperationError: conversion from `str` to `date` failed in column 'week'
for 6 out of 6 values: ["201901", "201902", … "201944"]
This seems like a bug (although one with the underlying rust package chrono rather than polars itself). I tried using base python's strptime and it ignores the %U
and just gives the first of the year for all cases so you can either do string manipulation and math like this (assuming you don't need an exact response)
pl.DataFrame({
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}) \
.with_columns(pl.col('week').cast(pl.Utf8)) \
.with_columns([pl.col('week').str.slice(0,4).cast(pl.Int32).alias('year'),
pl.col('week').str.slice(4,2).cast(pl.Int32).alias('week')]) \
.select(pl.date(pl.col('year'),1,1) + pl.duration(days=(pl.col('week')-1)*7).alias('date'))
If you look at the definition of %U, it's supposed to be based the xth Sunday of the year whereas my math is just multiplying by 7.
Another approach is to make a df of dates, then make the strftime of them and then join the dfs. So that might be like this:
dfdates=pl.DataFrame({'date':pl.date_range(datetime(2019,1,1), datetime(2019,12,31),'1d').cast(pl.Date())}) \
.with_columns(pl.col('date').dt.strftime("%Y%U").alias('week')) \
.groupby('week').agg(pl.col('date').min())
And then joining it with what you have
pl.DataFrame({
"week": [201901, 201902, 201903, 201942, 201943, 201944]
}).with_columns(pl.col('week').cast(pl.Utf8())).join(dfdates, on='week')
shape: (6, 2)
┌────────┬────────────┐
│ week ┆ date │
│ --- ┆ --- │
│ str ┆ date │
╞════════╪════════════╡
│ 201903 ┆ 2019-01-20 │
│ 201944 ┆ 2019-11-03 │
│ 201902 ┆ 2019-01-13 │
│ 201943 ┆ 2019-10-27 │
│ 201942 ┆ 2019-10-20 │
│ 201901 ┆ 2019-01-06 │
└────────┴────────────┘