In Python Polars, I am trying to extract the length of the lists inside a struct to re-use it in an expression.
For example, I have the code below:
import polars as pl

df = pl.DataFrame(
    {
        "x": [0, 4],
        "y": [
            {"low": [-1, 0, 1], "up": [1, 2, 3]},
            {"low": [-2, -1, 0], "up": [0, 1, 2]},
        ],
    }
)

df.with_columns(
    check=pl.concat_list(
        [
            pl.all_horizontal(
                [
                    pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
                    pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
                ]
            )
            for i in range(3)
        ]
    ).list.max()
)
shape: (2, 3)
┌─────┬─────────────────────────┬───────┐
│ x   ┆ y                       ┆ check │
│ --- ┆ ---                     ┆ ---   │
│ i64 ┆ struct[2]               ┆ bool  │
╞═════╪═════════════════════════╪═══════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ true  │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ false │
└─────┴─────────────────────────┴───────┘
I would like to infer the length of the lists instead of hardcoding the 3, as it can change depending on the call.
The challenge I am facing is that I need to keep everything in the same expression context. I have tried the version below, but it does not work: range() needs a Python integer, and I cannot extract the value returned by an expression:
df.with_columns(
    check=pl.concat_list(
        [
            pl.all_horizontal(
                [
                    pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
                    pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
                ]
            )
            for i in range(pl.col("y").struct["low"].list.len())
        ]
    ).list.max()
)
Unfortunately, I don't see a way to use an expression for the list length here. Also, direct comparisons of list columns are not yet natively supported.
Still, exploding and imploding the list columns on the fly can achieve the desired result without knowing the list lengths upfront. The over(pl.int_range(pl.len())) call treats every row as its own window, so each implode() rebuilds one list per original row.
(
    df
    .with_columns(
        ge_low=(pl.col("x") >= pl.col("y").struct["low"].explode()).implode().over(pl.int_range(pl.len())),
        le_up=(pl.col("x") <= pl.col("y").struct["up"].explode()).implode().over(pl.int_range(pl.len())),
    )
    .with_columns(
        check=(pl.col("ge_low").explode() & pl.col("le_up").explode()).implode().over(pl.int_range(pl.len()))
    )
)
shape: (2, 5)
┌─────┬─────────────────────────┬─────────────────────┬───────────────────────┬───────────────────────┐
│ x   ┆ y                       ┆ ge_low              ┆ le_up                 ┆ check                 │
│ --- ┆ ---                     ┆ ---                 ┆ ---                   ┆ ---                   │
│ i64 ┆ struct[2]               ┆ list[bool]          ┆ list[bool]            ┆ list[bool]            │
╞═════╪═════════════════════════╪═════════════════════╪═══════════════════════╪═══════════════════════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ [true, true, false] ┆ [true, true, true]    ┆ [true, true, false]   │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ [true, true, true]  ┆ [false, false, false] ┆ [false, false, false] │
└─────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┘