python, python-polars

How to zip two list columns in Python Polars?


I have a dataframe with columns A, B and C where B and C are list columns.

import polars as pl

df = pl.DataFrame({
    'A': ['t', 'u', 'v'],
    'B': [['a', 'v', 'x'], ['f', 'g', 'h'], ['p', 'o', 'i']],
    'C': [[11, 12, 14], [41, 42, 43], [66, 77, 88]]
})

I need to combine them as follows:

Original:
┌─────┬─────────────────┬──────────────┐
│ A   ┆ B               ┆ C            │
│ --- ┆ ---             ┆ ---          │
│ str ┆ list[str]       ┆ list[i64]    │
╞═════╪═════════════════╪══════════════╡
│ t   ┆ ["a", "v", "x"] ┆ [11, 12, 14] │
│ u   ┆ ["f", "g", "h"] ┆ [41, 42, 43] │
│ v   ┆ ["p", "o", "i"] ┆ [66, 77, 88] │
└─────┴─────────────────┴──────────────┘

Final: 
┌─────┬─────────────────────────────────────┐
│ A   ┆ zip(B,C)                            │
│ --- ┆ ---                                 │
│ str ┆ object(?)                           │
╞═════╪═════════════════════════════════════╡
│ t   ┆ [('a', 11), ('v', 12), ('x', 14) ]  │
│ u   ┆ [('f', 41), ('g', 42), ('h', 43) ]  │
│ v   ┆ [('p', 66), ('o', 77), ('i', 88) ]  │
└─────┴─────────────────────────────────────┘

In plain Python I would use zip(), but that approach does not scale. I thought about calling explode() on the lists, casting the values to strings, and joining the results with a separator, but that does not feel right, and I would have trouble keeping the data in column A correctly associated with the exploded result.
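
For reference, the plain-Python approach mentioned above would look roughly like the sketch below. The `rows` list is only a stand-in for the dataframe contents (it is not part of any Polars API), and the loop runs in the Python interpreter row by row, which is why it does not scale.

```python
# Stand-in for the dataframe: one tuple (A, B, C) per row.
rows = [
    ("t", ["a", "v", "x"], [11, 12, 14]),
    ("u", ["f", "g", "h"], [41, 42, 43]),
    ("v", ["p", "o", "i"], [66, 77, 88]),
]

# Pair up the elements of B and C for every row, keeping A alongside.
zipped = [(a, list(zip(b, c))) for a, b, c in rows]

for a, pairs in zipped:
    print(a, pairs)
```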

Is there another way to achieve this result?


Solution

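  • In Polars, you can use a struct for this: explode both list columns together, pack each B/C pair into a struct, then group by A to collect the structs back into one list per row. This relies on the values in A being unique.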

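    (
        df.explode("B", "C")  # one row per (B, C) pair; A is repeated
        .select("A", pl.struct("B", "C").alias("struct"))
        .group_by("A", maintain_order=True)  # keep the original row order
        .agg("struct")  # collect the structs back into one list per A
    )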
    
    shape: (3, 2)
    ┌─────┬────────────────────────────────┐
    │ A   ┆ struct                         │
    │ --- ┆ ---                            │
    │ str ┆ list[struct[2]]                │
    ╞═════╪════════════════════════════════╡
    │ t   ┆ [{"a",11}, {"v",12}, {"x",14}] │
    │ u   ┆ [{"f",41}, {"g",42}, {"h",43}] │
    │ v   ┆ [{"p",66}, {"o",77}, {"i",88}] │
    └─────┴────────────────────────────────┘
    

    The result is a list of structs: the new column has dtype list[struct[2]], with one struct per (B, C) pair.