Have a polars dataframe with the following data.
df = pl.DataFrame({"small": ["Apple Inc.", "Baidu Inc.", "Chevron Global", "Chevron Global", "Apple Inc."], "person": [10, 20, 30, 10, 10], "comp_big": ["Apple", "Baidu", "Chevron", "Chevron", "Apple"]})
Able to groupby on person, but this returns a dataframe of 2 lists.
(
df
.group_by("person")
.agg(
pl.col("small", "comp_big").unique()
)
)
Returns
This is close, but want to merge the result between small and comp_big into a single list.
(
df
.group_by("person")
.agg(
pl.col("small") + "|" + pl.col("comp_big")
)
)
This combines both into a single list, but I would want to split on the pipe and get the unique values.
How can this be achieved?
You can use pl.concat_list()
and then .flatten()
(
df
.group_by("person")
.agg(
pl.concat_list("small", "comp_big")
.flatten()
.unique()
)
)
shape: (3, 2)
┌────────┬───────────────────────────────────┐
│ person ┆ small │
│ --- ┆ --- │
│ i64 ┆ list[str] │
╞════════╪═══════════════════════════════════╡
│ 20 ┆ ["Baidu", "Baidu Inc."] │
│ 30 ┆ ["Chevron", "Chevron Global"] │
│ 10 ┆ ["Chevron", "Apple Inc.", … "Che… │
└────────┴───────────────────────────────────┘