
How would I generate combinations of items within Polars using the native expression API?


Is there a way to generate combinations of items within a list inside a Polars column without resorting to .map_elements() + itertools for each row?

This is my current solution:

import polars as pl
import itertools

(pl.DataFrame({'col': [['a', 'b', 'c']]})
   .with_columns(pl.col('col')
                   .map_elements(lambda list_o_things: [sorted((thing_1, thing_2))
                                                        for thing_1, thing_2 
                                                        in itertools.combinations(list_o_things, 2)])
                )
)

which returns this:

[['a', 'b'], ['a', 'c'], ['b', 'c']]


Solution

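  • Explode the list column so each item gets its own row, cross join the result with itself, filter out the redundant pairs, combine each remaining pair with concat_list, and implode the rows back into a single nested list. The filter col < col_right keeps each unordered pair once, already in sorted order, and also drops self-pairs; it assumes the items within a list are distinct. As written, this handles a frame with a single row.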

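    import polars as pl

    df = pl.DataFrame({'col': [['a', 'b', 'c']]})
    exploded = df.explode('col')  # one row per list item
    (
        exploded
        # every ordered pair of items; the right-hand copy is named 'col_right'
        .join(exploded, how='cross')
        # keep each unordered pair once and drop self-pairs
        .filter(pl.col('col') < pl.col('col_right'))
        # build a two-item list per pair, then collect all pairs into one nested list
        .select(pl.concat_list('col', 'col_right').implode())
    )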
    shape: (1, 1)
    ┌──────────────────────────────────────┐
    │ col                                  │
    │ ---                                  │
    │ list[list[str]]                      │
    ╞══════════════════════════════════════╡
    │ [["a", "b"], ["a", "c"], ["b", "c"]] │
    └──────────────────────────────────────┘