dataframe, sorting, python-polars

Sort a Polars data frame with multiple category columns


I would like to do a multi-column sort on a polars data frame that has two categorical columns, but I do not get the expected result. The data frame should be sorted by the first column, then by the second column, and finally by the third column.

The first two columns are categorical, and I want to keep their category order (i.e. the physical order). When I run the code below, I expect the sorted data frame to keep the same order in the first two columns, with only the order of the values in the third column switched.

I'm not sure what I'm missing and am hoping someone can help.

Here is the code:

import polars as pl

df = pl.DataFrame(
    [
        ["ss", "lr", 2.1],
        ["ss", "lr", 1.1],
        ["ss", "hr", 2.1],
        ["ss", "hr", 1.1],
        ["ff", "lr", 2.1],
        ["ff", "lr", 1.1],
        ["ff", "hr", 2.1],
        ["ff", "hr", 1.1],
    ],
    schema=["proc", "res", "vdd"],
    orient="row",
)

with pl.StringCache():

    df = (
        df
        .with_columns(pl.col("proc").cast(pl.Categorical))
        .with_columns(pl.col("res").cast(pl.Categorical))
    )

    df1 = df.sort(df.columns)

print(df)
print(df1)

The output below shows that in the sorted data frame the second categorical column is not sorted in the physical order: hr now comes before lr, although lr appears first in the original data.

I'm using polars version 0.15.14 with Python 3.10.8 on a MacBook Pro with an M1 Pro chip.

shape: (8, 3)
┌──────┬─────┬─────┐
│ proc ┆ res ┆ vdd │
│ ---  ┆ --- ┆ --- │
│ cat  ┆ cat ┆ f64 │
╞══════╪═════╪═════╡
│ ss   ┆ lr  ┆ 2.1 │
│ ss   ┆ lr  ┆ 1.1 │
│ ss   ┆ hr  ┆ 2.1 │
│ ss   ┆ hr  ┆ 1.1 │
│ ff   ┆ lr  ┆ 2.1 │
│ ff   ┆ lr  ┆ 1.1 │
│ ff   ┆ hr  ┆ 2.1 │
│ ff   ┆ hr  ┆ 1.1 │
└──────┴─────┴─────┘
shape: (8, 3)
┌──────┬─────┬─────┐
│ proc ┆ res ┆ vdd │
│ ---  ┆ --- ┆ --- │
│ cat  ┆ cat ┆ f64 │
╞══════╪═════╪═════╡
│ ss   ┆ hr  ┆ 1.1 │
│ ss   ┆ hr  ┆ 2.1 │
│ ss   ┆ lr  ┆ 1.1 │
│ ss   ┆ lr  ┆ 2.1 │
│ ff   ┆ hr  ┆ 1.1 │
│ ff   ┆ hr  ┆ 2.1 │
│ ff   ┆ lr  ┆ 1.1 │
│ ff   ┆ lr  ┆ 2.1 │
└──────┴─────┴─────┘
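
For comparison, the ordering the question expects can be sketched in plain Python, without polars. This is only an illustration of "physical order" as order of first appearance; the helper name first_seen_rank is hypothetical and not part of any library.

```python
# Same rows as in the question, as plain tuples (proc, res, vdd).
rows = [
    ("ss", "lr", 2.1),
    ("ss", "lr", 1.1),
    ("ss", "hr", 2.1),
    ("ss", "hr", 1.1),
    ("ff", "lr", 2.1),
    ("ff", "lr", 1.1),
    ("ff", "hr", 2.1),
    ("ff", "hr", 1.1),
]


def first_seen_rank(values):
    """Map each distinct value to the index of its first appearance."""
    rank = {}
    for value in values:
        rank.setdefault(value, len(rank))
    return rank


# "Physical" order here means order of first appearance: ss < ff and lr < hr.
proc_rank = first_seen_rank(row[0] for row in rows)
res_rank = first_seen_rank(row[1] for row in rows)

# Sort by proc rank, then res rank, then vdd ascending.
expected = sorted(rows, key=lambda r: (proc_rank[r[0]], res_rank[r[1]], r[2]))

for row in expected:
    print(row)
```

With this key, the first two columns keep their original order and only vdd is reordered within each (proc, res) group.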

Solution

  • This sorting behavior was a bug, as was pointed out; see https://github.com/pola-rs/polars/issues/7343. It is fixed in polars version 0.16.12, the latest version at the time of writing.