
polars: n_unique(), but as a window function


I need a way to find out how many unique pairs of values from two columns are in a certain context. Basically like n_unique, but as a window function.

To illustrate with a toy example:

import polars as pl

dataframe = pl.DataFrame({
    'context': [1, 1, 1,  2, 2, 2,  3, 3, 3],
    'column1': [1, 1, 0,  1, 0, 0,  1, 0, 1],
    'column2': [1, 0, 0,  0, 1, 1,  1, 0, 1]
    # unique:   1  2  3   1  2  -   1  2  -
    # n_unique: -- 3 --   -- 2 --   -- 2 --
})

I would like to write:

dataframe = (
    dataframe
    .with_columns(
        pl.n_unique('column1', 'column2').over('context').alias('n_unique')
    )
)

to get the number of unique (column1, column2) value pairs within each window defined by the 'context' column. But that does not work.

One attempt I made was this:

(dataframe
    .with_columns(
        pl.concat_list('column1', 'column2').alias('pair')
    )
    .with_columns(
        pl.n_unique('pair').over('context')
    )
)

This works, but is there a better way?


Solution

  • Every expression is a function of the form Fn(Series) -> Series. This means that if you want to compute something over multiple columns, you must first combine those columns into a single input Series.

    We can easily do this by packing them into a Struct data type.

    import polars as pl
    
    df = pl.DataFrame({
        'context': [1, 1, 1,  2, 2, 2,  3, 3, 3],
        'column1': [1, 1, 0,  1, 0, 0,  1, 0, 1],
        'column2': [1, 0, 0,  0, 1, 1,  1, 0, 1]
    })
    
    df.with_columns(
        pl.struct("column1", "column2").n_unique().over("context").alias("n_unique")
    )
    
    shape: (9, 4)
    ┌─────────┬─────────┬─────────┬──────────┐
    │ context ┆ column1 ┆ column2 ┆ n_unique │
    │ ---     ┆ ---     ┆ ---     ┆ ---      │
    │ i64     ┆ i64     ┆ i64     ┆ u32      │
    ╞═════════╪═════════╪═════════╪══════════╡
    │ 1       ┆ 1       ┆ 1       ┆ 3        │
    │ 1       ┆ 1       ┆ 0       ┆ 3        │
    │ 1       ┆ 0       ┆ 0       ┆ 3        │
    │ 2       ┆ 1       ┆ 0       ┆ 2        │
    │ 2       ┆ 0       ┆ 1       ┆ 2        │
    │ 2       ┆ 0       ┆ 1       ┆ 2        │
    │ 3       ┆ 1       ┆ 1       ┆ 2        │
    │ 3       ┆ 0       ┆ 0       ┆ 2        │
    │ 3       ┆ 1       ┆ 1       ┆ 2        │
    └─────────┴─────────┴─────────┴──────────┘