I am trying to apply value_counts() to multiple columns, but I am getting an error.
import polars as pl

df = pl.from_repr("""
┌──────────────┬─────────────┐
│ sub-category ┆ category    │
│ ---          ┆ ---         │
│ str          ┆ str         │
╞══════════════╪═════════════╡
│ tv           ┆ electronics │
│ mobile       ┆ mobile      │
│ tv           ┆ electronics │
│ wm           ┆ electronics │
│ micro        ┆ kitchen     │
│ wm           ┆ electronics │
└──────────────┴─────────────┘
""")
If I convert it to pandas, I can use .apply():
pl.from_pandas(
df.to_pandas().apply(lambda x: x.value_counts()).reset_index()
)
shape: (6, 3)
┌─────────────┬──────────────┬──────────┐
│ index       ┆ sub-category ┆ category │
│ ---         ┆ ---          ┆ ---      │
│ str         ┆ f64          ┆ f64      │
╞═════════════╪══════════════╪══════════╡
│ electronics ┆ null         ┆ 4.0      │
│ kitchen     ┆ null         ┆ 1.0      │
│ micro       ┆ 1.0          ┆ null     │
│ mobile      ┆ 1.0          ┆ 1.0      │
│ tv          ┆ 2.0          ┆ null     │
│ wm          ┆ 2.0          ┆ null     │
└─────────────┴──────────────┴──────────┘
How do I get the same result in Polars?
.value_counts() is implemented as .group_by().len(), so it is generally easier to just do the group_by manually.
If you first reshape into long format with .unpivot():
df.unpivot()
shape: (12, 2)
┌──────────────┬─────────────┐
│ variable     ┆ value       │
│ ---          ┆ ---         │
│ str          ┆ str         │
╞══════════════╪═════════════╡
│ sub-category ┆ tv          │
│ sub-category ┆ mobile      │
│ sub-category ┆ tv          │
│ sub-category ┆ wm          │
│ sub-category ┆ micro       │
│ …            ┆ …           │
│ category     ┆ mobile      │
│ category     ┆ electronics │
│ category     ┆ electronics │
│ category     ┆ kitchen     │
│ category     ┆ electronics │
└──────────────┴─────────────┘
Then the length of each group is the count:
df.unpivot().group_by(pl.all()).len()
shape: (7, 3)
┌──────────────┬─────────────┬─────┐
│ variable     ┆ value       ┆ len │
│ ---          ┆ ---         ┆ --- │
│ str          ┆ str         ┆ u32 │
╞══════════════╪═════════════╪═════╡
│ category     ┆ kitchen     ┆ 1   │
│ sub-category ┆ tv          ┆ 2   │
│ sub-category ┆ mobile      ┆ 1   │
│ category     ┆ mobile      ┆ 1   │
│ sub-category ┆ wm          ┆ 2   │
│ sub-category ┆ micro       ┆ 1   │
│ category     ┆ electronics ┆ 4   │
└──────────────┴─────────────┴─────┘
.pivot() can be used if the "wide" shape is needed:
(
    df.unpivot()
    .pivot(
        on="variable",
        index="value",
        values="value",
        aggregate_function=pl.len(),
    )
)
shape: (6, 3)
┌─────────────┬──────────────┬──────────┐
│ value       ┆ sub-category ┆ category │
│ ---         ┆ ---          ┆ ---      │
│ str         ┆ u32          ┆ u32      │
╞═════════════╪══════════════╪══════════╡
│ tv          ┆ 2            ┆ null     │
│ mobile      ┆ 1            ┆ 1        │
│ wm          ┆ 2            ┆ null     │
│ micro       ┆ 1            ┆ null     │
│ electronics ┆ null         ┆ 4        │
│ kitchen     ┆ null         ┆ 1        │
└─────────────┴──────────────┴──────────┘