python dataframe python-polars

Polars map_batches on list type raises InvalidOperationError


There is a conundrum I cannot solve in Polars:

This behaves as expected:

import polars as pl

df = pl.DataFrame(
    {
        "int1": [1, 2, 3],
        "int2": [3, 2, 1]
    }
)

df.with_columns(
    pl.struct('int1', 'int2')
      .map_batches(lambda x: x.struct.field('int1') + x.struct.field('int2')).alias('int3')
)

output:

shape: (3, 3)
┌──────┬──────┬──────┐
│ int1 ┆ int2 ┆ int3 │
│ ---  ┆ ---  ┆ ---  │
│ i64  ┆ i64  ┆ i64  │
╞══════╪══════╪══════╡
│ 1    ┆ 3    ┆ 4    │
│ 2    ┆ 2    ┆ 4    │
│ 3    ┆ 1    ┆ 4    │
└──────┴──────┴──────┘

Yet this does not:

df = pl.DataFrame(
    {
        "int1": [[1], [2], [3]],
        "int2": [[3], [2], [1]]
    }
)

df.with_columns(
    pl.struct('int1', 'int2')
      .map_batches(lambda x: x.struct.field('int1').to_list() + x.struct.field('int2').to_list()).alias('int3')
)

output:

# InvalidOperationError: Series int3, length 1 doesn't match the DataFrame height of 3

This is the output I was expecting:

┌───────────┬───────────┬───────────┐
│ int1      ┆ int2      ┆ int3      │
│ ---       ┆ ---       ┆ ---       │
│ list[i64] ┆ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╪═══════════╡
│ [1]       ┆ [3]       ┆ [1, 3]    │
│ [2]       ┆ [2]       ┆ [2, 2]    │
│ [3]       ┆ [1]       ┆ [3, 1]    │
└───────────┴───────────┴───────────┘

Solution

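  • I think the problem is that you are confusing .map_batches with .map_elements. .map_batches passes the whole column to your function as a single Series, so .to_list() + .to_list() joins two Python lists that each cover every row, and the result no longer has one entry per row. .map_elements calls your function once per row instead, and for a struct column each row arrives as a dict.

    The difference between the two functions is explained well in the Polars user guide.

    To get the result you want, use .map_elements: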

    import polars as pl

    print(
        pl.DataFrame({"int1": [[1], [2], [3]], "int2": [[3], [2], [1]]}).with_columns(
            pl.struct("int1", "int2").map_elements(lambda x: x["int1"] + x["int2"]).alias("int3")
        )
    )
    
    shape: (3, 3)
    ┌───────────┬───────────┬───────────┐
    │ int1      ┆ int2      ┆ int3      │
    │ ---       ┆ ---       ┆ ---       │
    │ list[i64] ┆ list[i64] ┆ list[i64] │
    ╞═══════════╪═══════════╪═══════════╡
    │ [1]       ┆ [3]       ┆ [1, 3]    │
    │ [2]       ┆ [2]       ┆ [2, 2]    │
    │ [3]       ┆ [1]       ┆ [3, 1]    │
    └───────────┴───────────┴───────────┘