I want to declare a start date and an end date variable (date/datetime dtype, preferably using Polars), and then run a while loop whose condition is startdate <= enddate, with each iteration advancing startdate by one day.
Here's my attempt:
import polars as pl
startdate = pl.date(year=2023, month=1, day=1)
enddate = pl.date(year=2023, month=1, day=10)
frequency = pl.duration(days=1)
i=0
while startdate <= enddate:
    print(i)
    i += 1
    startdate += frequency
Output:
ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.
... and here's how I would do this in Pandas:
import pandas as pd
startdate = pd.to_datetime('2023-01-01')
enddate = pd.to_datetime('2023-01-03')
frequency = pd.to_timedelta(1,'d')
i=0
while startdate <= enddate:
    print(i)
    i += 1
    startdate += frequency
Output:
0
1
2
I understand that pd.to_datetime() returns a datetime object while pl.date() returns an expression of type pl.Date, and the ValueError is clear enough. But I'm not sure how to create, for lack of a better term, a non-expression pl.Date "object". (Apologies for the poverty of my Polars vocabulary.)
I want to do this in Polars because that's what the rest of my code uses (the while loop sits inside a function that does a number of calculations, passing startdate and enddate as arguments to is_between()).
pl.date, pl.datetime, and pl.duration all return Expressions. Expressions are meant to be used in a context (select, with_columns, and agg are contexts).
If you want a value, you have to put the expression in a context so it gets resolved. Every context returns a DataFrame, so you then extract the single value by calling item(), like this:
startdate = pl.select(pl.date(year=2023, month=1, day=1)).item()
enddate = pl.select(pl.date(year=2023, month=1, day=10)).item()
frequency = pl.select(pl.duration(days=1)).item()
Now all of those variables have the same values as if you had simply used the standard datetime library, like this:
from datetime import datetime, date, timedelta
startdate = date(year=2023, month=1, day=1)
enddate = date(year=2023, month=1, day=10)
frequency = timedelta(days=1)