How would I add 1 year to a column?
I've tried using map
and apply but I failed miserably.
I also wonder why pl.date()
accepts integers while it advertises that it only accepts str
or pli.Expr
.
A small hack workaround is:
col = pl.col('date').dt
df = df.with_columns(pl.when(pl.col(column).is_not_null())
.then(pl.date(col.year() + 1, col.month(), col.day()))
.otherwise(pl.date(col.year() + 1,col.month(), col.day()))
.alias("date"))
but this won't work for months or days. I can't just add a number or I'll get a:
> thread 'thread '<unnamed>' panicked at 'invalid or out-of-range date<unnamed>',
' panicked at '/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rsinvalid or out-of-range date:', 173:/github/home/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/date.rs51
:note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Most likely because day and month cycle while year goes to infinity.
I could also do this:
df = df.with_columns(
pl.when(col.month() == 1)
.then(pl.date(col.year(), 2, col.day()))
.when(col.month() == 2)
.then(pl.date(col.year(), 3, col.day()))
.when(col.month() == 3)
.then(pl.date(col.year(), 4, col.day()))
.when(col.month() == 4)
.then(pl.date(col.year(), 5, col.day()))
.when(col.month() == 5)
.then(pl.date(col.year(), 6, col.day()))
.when(col.month() == 6)
.then(pl.date(col.year(), 7, col.day()))
.when(col.month() == 7)
.then(pl.date(col.year(), 8, col.day()))
.when(col.month() == 8)
.then(pl.date(col.year(), 9, col.day()))
.when(col.month() == 9)
.then(pl.date(col.year(), 10, col.day()))
.when(col.month() == 10)
.then(pl.date(col.year(), 11, col.day()))
.when(col.month() == 11)
.then(pl.date(col.year(), 12, col.day()))
.otherwise(pl.date(col.year() + 1, 1, 1))
.alias("valid_from")
)
Polars allows to do addition and subtraction with python's timedelta
objects. However above week units things get a bit more complicated as we have to take different days of the month and leap years into account.
For this polars has offset_by
under the dt
namespace.
(pl.DataFrame({
"dates": pl.datetime_range(pl.datetime(2000, 1, 1), pl.datetime(2026, 1, 1), "1y", eager=True)
}).with_columns(
pl.col("dates").dt.offset_by("1y").alias("dates_and_1_yr")
))
shape: (27, 2)
┌─────────────────────┬─────────────────────┐
│ dates ┆ dates_and_1_yr │
│ --- ┆ --- │
│ datetime[μs] ┆ datetime[μs] │
╞═════════════════════╪═════════════════════╡
│ 2000-01-01 00:00:00 ┆ 2001-01-01 00:00:00 │
│ 2001-01-01 00:00:00 ┆ 2002-01-01 00:00:00 │
│ 2002-01-01 00:00:00 ┆ 2003-01-01 00:00:00 │
│ 2003-01-01 00:00:00 ┆ 2004-01-01 00:00:00 │
│ 2004-01-01 00:00:00 ┆ 2005-01-01 00:00:00 │
│ … ┆ … │
│ 2022-01-01 00:00:00 ┆ 2023-01-01 00:00:00 │
│ 2023-01-01 00:00:00 ┆ 2024-01-01 00:00:00 │
│ 2024-01-01 00:00:00 ┆ 2025-01-01 00:00:00 │
│ 2025-01-01 00:00:00 ┆ 2026-01-01 00:00:00 │
│ 2026-01-01 00:00:00 ┆ 2027-01-01 00:00:00 │
└─────────────────────┴─────────────────────┘