There is a conundrum I cannot solve in Polars:
This behaves as expected:
import polars as pl

df = pl.DataFrame(
    {
        "int1": [1, 2, 3],
        "int2": [3, 2, 1]
    }
)
df.with_columns(
    pl.struct('int1', 'int2')
    .map_batches(lambda x: x.struct.field('int1') + x.struct.field('int2'))
    .alias('int3')
)
output:
shape: (3, 3)
┌──────┬──────┬──────┐
│ int1 ┆ int2 ┆ int3 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞══════╪══════╪══════╡
│ 1 ┆ 3 ┆ 4 │
│ 2 ┆ 2 ┆ 4 │
│ 3 ┆ 1 ┆ 4 │
└──────┴──────┴──────┘
Yet this does not:
df = pl.DataFrame(
    {
        "int1": [[1], [2], [3]],
        "int2": [[3], [2], [1]]
    }
)
df.with_columns(
    pl.struct('int1', 'int2')
    .map_batches(lambda x: x.struct.field('int1').to_list() + x.struct.field('int2').to_list())
    .alias('int3')
)
output:
# InvalidOperationError: Series int3, length 1 doesn't match the DataFrame height of 3
This is the output I was expecting:
┌───────────┬───────────┬───────────┐
│ int1 ┆ int2 ┆ int3 │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╪═══════════╡
│ [1] ┆ [3] ┆ [1, 3] │
│ [2] ┆ [2] ┆ [2, 2] │
│ [3] ┆ [1] ┆ [3, 1] │
└───────────┴───────────┴───────────┘
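I think your problem is that you are confusing .map_batches and .map_elements. With .map_batches, the function receives the whole struct column as a single Series, so .to_list() returns the values of every row and + concatenates the two Python lists end to end instead of row by row. With .map_elements, the function is called once per row, and each struct row arrives as a Python dict.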
The differences between these two functions are explained nicely in the Polars user guide.
So to get the result you want, use .map_elements:
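import polars as pl

print(
    pl.DataFrame({"int1": [[1], [2], [3]], "int2": [[3], [2], [1]]}).with_columns(
        # map_elements calls the lambda once per row; x is a dict such as {"int1": [1], "int2": [3]}
        pl.struct("int1", "int2")
        .map_elements(lambda x: x["int1"] + x["int2"], return_dtype=pl.List(pl.Int64))
        .alias("int3")
    )
)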
shape: (3, 3)
┌───────────┬───────────┬───────────┐
│ int1 ┆ int2 ┆ int3 │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╪═══════════╡
│ [1] ┆ [3] ┆ [1, 3] │
│ [2] ┆ [2] ┆ [2, 2] │
│ [3] ┆ [1] ┆ [3, 1] │
└───────────┴───────────┴───────────┘