Suppose I have a bag of items {a, b}
. Then I can choose pairs out of it in a variety of ways. One way might be to pick all possible permutations: [a, a], [a, b], [b, a], [b, b]
. But I might disallow repetition, in which case the possible permutations are: [a, b], [b, a]
. I might go further and declare that [a, b]
is the same as [b, a]
, i.e. I only care about the "combination" of choices, not their permutations.
For more about the distinction between combination vs. permutation, see: https://en.wikipedia.org/wiki/Combination
What are the best ways to produce a combination of choices (i.e. order of elements should not matter)? My current solutions looks like this:
import polars as pl
choices = pl.DataFrame(
[
pl.Series("flavor", ["x"] * 2 + ["y"] * 3),
pl.Series("choice", ["a", "b"] + ["1", "2", "3"]),
]
)
# join to produce the choices
choices.join(choices, on=["flavor"]).with_columns(
# generate a 2-element list representing the choice
sorted_choice_pair=pl.concat_list("choice", "choice_right").list.sort()
).filter(pl.col.choice.eq(pl.col.sorted_choice_pair.list.first()))
shape: (9, 4)
┌────────┬────────┬──────────────┬────────────────────┐
│ flavor ┆ choice ┆ choice_right ┆ sorted_choice_pair │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ str ┆ list[str] │
╞════════╪════════╪══════════════╪════════════════════╡
│ x ┆ a ┆ a ┆ ["a", "a"] │
│ x ┆ a ┆ b ┆ ["a", "b"] │
│ x ┆ b ┆ b ┆ ["b", "b"] │
│ y ┆ 1 ┆ 1 ┆ ["1", "1"] │
│ y ┆ 1 ┆ 2 ┆ ["1", "2"] │
│ y ┆ 2 ┆ 2 ┆ ["2", "2"] │
│ y ┆ 1 ┆ 3 ┆ ["1", "3"] │
│ y ┆ 2 ┆ 3 ┆ ["2", "3"] │
│ y ┆ 3 ┆ 3 ┆ ["3", "3"] │
└────────┴────────┴──────────────┴────────────────────┘
So I generate all permutations, and then filter out those that where the "left element" does not match the first element of the list.
You can use .join_where()
with a row index predicate to prevent "duplicates".
(choices
.with_row_index()
.join_where(choices.with_row_index(),
pl.col.flavor == pl.col.flavor_right,
pl.col.index <= pl.col.index_right
)
)
shape: (9, 5)
┌───────┬────────┬────────┬─────────────┬──────────────┐
│ index ┆ flavor ┆ choice ┆ index_right ┆ choice_right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ str ┆ str ┆ u32 ┆ str │
╞═══════╪════════╪════════╪═════════════╪══════════════╡
│ 0 ┆ x ┆ a ┆ 0 ┆ a │
│ 0 ┆ x ┆ a ┆ 1 ┆ b │
│ 1 ┆ x ┆ b ┆ 1 ┆ b │
│ 2 ┆ y ┆ 1 ┆ 2 ┆ 1 │
│ 2 ┆ y ┆ 1 ┆ 3 ┆ 2 │
│ 3 ┆ y ┆ 2 ┆ 3 ┆ 2 │
│ 2 ┆ y ┆ 1 ┆ 4 ┆ 3 │
│ 3 ┆ y ┆ 2 ┆ 4 ┆ 3 │
│ 4 ┆ y ┆ 3 ┆ 4 ┆ 3 │
└───────┴────────┴────────┴─────────────┴──────────────┘