How can I obtain the indices of the first occurrence of each unique element in a column of type list in a polars DataFrame? I am looking for something similar to arg_unique, but that only exists for pl.Series, where it operates over a whole column. I need this to work one level below that, i.e. on every list inside the column.
Given the DataFrame
import polars as pl
df = pl.DataFrame({
    "fruits": [["apple", "banana", "apple", "orange"], ["grape", "apple", "grape"], ["kiwi", "mango", "kiwi"]]
})
I expect the output to be:
df = pl.DataFrame({
    "fruits": [[0, 1, 3], [0, 1], [0, 1]]
})
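To pin down the expected semantics, here is a plain-Python sketch (no polars involved) that computes the first-occurrence indices for each inner list. The helper first_occurrence_indices is hypothetical and exists only for illustration; it is not a polars API.

```python
def first_occurrence_indices(values):
    """Return the index of the first occurrence of each distinct value, in order."""
    seen = set()
    indices = []
    for i, value in enumerate(values):
        if value not in seen:  # first time this value appears in the list
            seen.add(value)
            indices.append(i)
    return indices


fruits = [
    ["apple", "banana", "apple", "orange"],
    ["grape", "apple", "grape"],
    ["kiwi", "mango", "kiwi"],
]

# Apply the helper to every inner list, i.e. "one level below" the column.
expected = [first_occurrence_indices(row) for row in fruits]
print(expected)  # [[0, 1, 3], [0, 1], [0, 1]]
```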
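.list.eval() can be used as a fallback when there is no dedicated .list.* method for what you need. It runs an expression on every list in the column, with pl.element() referring to the elements of the current list, so pl.element().arg_unique() computes the first-occurrence indices list by list.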
df.with_columns(
pl.col("fruits").list.eval(pl.element().arg_unique()).alias("idxs")
)
shape: (3, 2)
┌────────────────────────────────────────┬───────────┐
│ fruits                                 ┆ idxs      │
│ ---                                    ┆ ---       │
│ list[str]                              ┆ list[u32] │
╞════════════════════════════════════════╪═══════════╡
│ ["apple", "banana", "apple", "orange"] ┆ [0, 1, 3] │
│ ["grape", "apple", "grape"]            ┆ [0, 1]    │
│ ["kiwi", "mango", "kiwi"]              ┆ [0, 1]    │
└────────────────────────────────────────┴───────────┘