
How to return a date/datetime object rather than an expression of that type?


I want to declare start and end date variables (date/datetime dtype, preferably using Polars), and then run a while loop whose condition is startdate <= enddate, with each iteration increasing startdate by one day.

Here's my attempt:

import polars as pl

startdate = pl.date(year=2023, month=1, day=1)
enddate = pl.date(year=2023, month=1, day=10)
frequency = pl.duration(days=1)

i=0
while startdate <= enddate:
    print(i)
    i += 1
    startdate += frequency

Output:

ValueError: Since Expr are lazy, the truthiness of an Expr is ambiguous. Hint: use '&' or '|' to logically combine Expr, not 'and'/'or', and use 'x.is_in([y,z])' instead of 'x in [y,z]' to check membership.

... and here's how I would do this in Pandas:

import pandas as pd

startdate = pd.to_datetime('2023-01-01')
enddate = pd.to_datetime('2023-01-03')
frequency = pd.to_timedelta(1,'d')

i=0
while startdate <= enddate:
    print(i)
    i += 1
    startdate += frequency

Output:

0
1
2

I understand that pd.to_datetime() returns a datetime object while pl.date() returns an expression of type pl.Date, and the ValueError is clear enough. But I'm not sure how I would go about creating, for lack of a better term, a non-expression pl.Date "object". (Apologies for the poverty of my Polars vocabulary.)

The reason I want to do this in Polars is that I'm using it for the rest of my code: the while loop sits inside a function that passes startdate and enddate as arguments to is_between() in a number of calculations.


Solution

  • pl.date, pl.datetime, and pl.duration all return expressions (pl.Date and pl.Datetime are the corresponding data types). Expressions are meant to be evaluated in a context; select, with_columns, and agg are contexts.

    If you want a concrete value, you have to evaluate the expression in a context. Every context returns a DataFrame, but the top-level pl.select function evaluates expressions without needing an existing DataFrame, and calling item() on the resulting one-row, one-column DataFrame extracts the single value, like this:

    startdate = pl.select(pl.date(year=2023, month=1, day=1)).item()
    enddate = pl.select(pl.date(year=2023, month=1, day=10)).item()
    frequency = pl.select(pl.duration(days=1)).item()
    

    Now all of those variables have the same value as if you had simply used the standard datetime library like this:

    from datetime import date, timedelta
    
    startdate = date(year=2023, month=1, day=1)
    enddate = date(year=2023, month=1, day=10)
    frequency = timedelta(days=1)