python, python-polars

How do I change the custom (non-alphabetical) ordering of a Categorical column in a Polars DataFrame?


I'm struggling to change the custom ordering of a Polars DataFrame column after it has been created.

If I first create a DataFrame and set a custom ordering for it, the output is as expected:

import polars as pl

df = pl.DataFrame(
{"cats": ["z", "z", "k", "a", "b"], "vals": [3, 1, 2, 2, 3]}
)

myorder=["k", "z", "b", "a"]

with pl.StringCache():
    pl.Series(myorder).cast(pl.Categorical)
    df=df.with_columns(pl.col('cats').cast(pl.Categorical))

print(df.sort(["cats"]))

shape: (5, 2)
┌──────┬──────┐
│ cats ┆ vals │
│ ---  ┆ ---  │
│ cat  ┆ i64  │
╞══════╪══════╡
│ k    ┆ 2    │
│ z    ┆ 3    │
│ z    ┆ 1    │
│ b    ┆ 3    │
│ a    ┆ 2    │
└──────┴──────┘

However, if I then try to change that custom ordering by running the same logic again with a different order:

myorder=["b", "z", "k", "a"]

with pl.StringCache():
    pl.Series(myorder).cast(pl.Categorical)
    df=df.with_columns(pl.col('cats').cast(pl.Categorical))

print(df.sort(["cats"]))


shape: (5, 2)
┌──────┬──────┐
│ cats ┆ vals │
│ ---  ┆ ---  │
│ cat  ┆ i64  │
╞══════╪══════╡
│ k    ┆ 2    │
│ z    ┆ 3    │
│ z    ┆ 1    │
│ b    ┆ 3    │
│ a    ┆ 2    │
└──────┴──────┘

...the custom ordering does not change. I believe this is because the column's cached categorical ordering is not updated.

So the question is: how can I change the custom ordering of a Categorical column after it has been created?


Solution

  • Casting the column to pl.Utf8 first discards the existing category indices, so casting it back to pl.Categorical inside the new StringCache rebuilds them from myorder.

    myorder=["b", "z", "k", "a"]
    
    with pl.StringCache():
        pl.Series(myorder).cast(pl.Categorical)
        df=df.with_columns(pl.col('cats').cast(pl.Utf8).cast(pl.Categorical))