I observed that the polars expression:
pl.DataFrame(data={}).select(a=pl.lit(None) | pl.lit(True))
evaluates to True, but it should evaluate to None in my estimation, based on the concept of "null aware evaluation".
This concept ensures that if any part of an expression evaluates to null, the overall result is also null. This is particularly relevant in expressions involving multiple operations, where the presence of a null value can affect the final outcome.
In contrast:
pl.DataFrame(data={}).select(a=pl.lit(None) & pl.lit(True))
does indeed evaluate to None, rather than False. And so do all of the expressions:
pl.DataFrame(data={}).select(a=pl.lit(None) > pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) < pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) == pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) + pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) - pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) * pl.lit(2))
pl.DataFrame(data={}).select(a=pl.lit(None) / pl.lit(2))
What is going on here?
Polars uses Kleene logic to deal with nulls. This is stated in the documentation of several expressions corresponding to boolean operations.
To better understand the logic, it can make sense to think of None values as unknown (or missing) values.
For example, any assignment of the unknown value (None) in pl.lit(None) | pl.lit(True) will make the expression True. Hence, it evaluates to True.
In contrast, different assignments (True / False) of the unknown value (None) in the expression pl.lit(None) & pl.lit(True) give different results: the expression becomes True or False, respectively. Hence, the expression evaluates to an unknown value (None).
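The assignment argument above can be sketched in plain Python, without Polars. The helper kleene_eval and the names kleene_or and kleene_and are hypothetical and exist only for illustration. The sketch treats None as "unknown", tries every True/False assignment for it, and returns a definite value only when all assignments agree. For a single binary operation on independent operands, this reproduces the Kleene truth tables.

```python
from itertools import product


def kleene_eval(op, *operands):
    """Evaluate `op` on operands in which None stands for 'unknown'.

    Each None is replaced by True and by False in turn. If every
    assignment yields the same result, that result is returned;
    otherwise the outcome is itself unknown (None).
    """
    unknown_positions = [i for i, v in enumerate(operands) if v is None]
    outcomes = set()
    for assignment in product([True, False], repeat=len(unknown_positions)):
        filled = list(operands)
        for pos, value in zip(unknown_positions, assignment):
            filled[pos] = value
        outcomes.add(op(*filled))
    return outcomes.pop() if len(outcomes) == 1 else None


def kleene_or(a, b):
    return a or b


def kleene_and(a, b):
    return a and b


# None | True: both assignments give True, so the result is known.
print(kleene_eval(kleene_or, None, True))
# None & True: the assignments disagree, so the result stays unknown.
print(kleene_eval(kleene_and, None, True))
```

The same helper also shows the mirrored cases: None & False is known to be False, while None | False stays unknown.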
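Similar arguments can be made for the comparison and arithmetic expressions provided in the question: their result depends on which value is substituted for None, so each of them evaluates to None.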