I'm struggling to change the custom ordering of a Polars DataFrame column after it has been created.
If I first create a DataFrame and set a custom ordering for it, the output is as expected:
import polars as pl

df = pl.DataFrame(
    {"cats": ["z", "z", "k", "a", "b"], "vals": [3, 1, 2, 2, 3]}
)
myorder = ["k", "z", "b", "a"]
with pl.StringCache():
    # Seed the string cache so categories are registered in the desired order
    pl.Series(myorder).cast(pl.Categorical)
    df = df.with_columns(pl.col("cats").cast(pl.Categorical))
print(df.sort(["cats"]))
shape: (5, 2)
┌──────┬──────┐
│ cats ┆ vals │
│ ---  ┆ ---  │
│ cat  ┆ i64  │
╞══════╪══════╡
│ k    ┆ 2    │
│ z    ┆ 3    │
│ z    ┆ 1    │
│ b    ┆ 3    │
│ a    ┆ 2    │
└──────┴──────┘
However, if I want to change that custom ordering and I run the same logic again:
myorder = ["b", "z", "k", "a"]
with pl.StringCache():
    pl.Series(myorder).cast(pl.Categorical)
    # "cats" is already Categorical here, so this cast keeps the old encoding
    df = df.with_columns(pl.col("cats").cast(pl.Categorical))
print(df.sort(["cats"]))
shape: (5, 2)
┌──────┬──────┐
│ cats ┆ vals │
│ ---  ┆ ---  │
│ cat  ┆ i64  │
╞══════╪══════╡
│ k    ┆ 2    │
│ z    ┆ 3    │
│ z    ┆ 1    │
│ b    ┆ 3    │
│ a    ┆ 2    │
└──────┴──────┘
...the custom ordering does not change. I believe this is because the cached categorical ordering does not change.
So my question is: how can I change the custom ordering of a Categorical column after it has been created?
Cast the column to pl.Utf8 first. This discards the existing categorical index, so casting back to pl.Categorical rebuilds the column with the new ordering:
myorder = ["b", "z", "k", "a"]
with pl.StringCache():
    pl.Series(myorder).cast(pl.Categorical)
    # Round-trip through Utf8 so the column is re-encoded against the new cache
    df = df.with_columns(pl.col("cats").cast(pl.Utf8).cast(pl.Categorical))