
How to calculate horizontal median

How to calculate horizontal median for numerical columns?

import polars as pl

df = pl.DataFrame({"ABC": ["foo", "bar", "foo"], "A": [1, 2, 3], "B": [2, 1, None], "C": [1, 2, 3]})

shape: (3, 4)
│ ABC ┆ A   ┆ B    ┆ C   │
│ --- ┆ --- ┆ ---  ┆ --- │
│ str ┆ i64 ┆ i64  ┆ i64 │
│ foo ┆ 1   ┆ 2    ┆ 1   │
│ bar ┆ 2   ┆ 1    ┆ 2   │
│ foo ┆ 3   ┆ null ┆ 3   │

I want the same result as with pl.mean_horizontal below, but with the median instead of the mean. I could not find an existing expression for this.

print(df.with_columns(pl.mean_horizontal(pl.col(pl.Int64)).alias("Horizontal Mean")))

shape: (3, 5)
│ ABC ┆ A   ┆ B    ┆ C   ┆ Horizontal Mean │
│ --- ┆ --- ┆ ---  ┆ --- ┆ ---             │
│ str ┆ i64 ┆ i64  ┆ i64 ┆ f64             │
│ foo ┆ 1   ┆ 2    ┆ 1   ┆ 1.333333        │
│ bar ┆ 2   ┆ 1    ┆ 2   ┆ 1.666667        │
│ foo ┆ 3   ┆ null ┆ 3   ┆ 3.0             │


  • There's no median_horizontal() at the moment, but you could build one from concat_list() and list.median():

        df.with_columns(
            pl.concat_list(pl.col(pl.Int64)).list.median().alias("Horizontal Median")
        )
    shape: (3, 5)
    │ ABC ┆ A   ┆ B    ┆ C   ┆ Horizontal Median │
    │ --- ┆ --- ┆ ---  ┆ --- ┆ ---               │
    │ str ┆ i64 ┆ i64  ┆ i64 ┆ f64               │
    │ foo ┆ 1   ┆ 2    ┆ 1   ┆ 1.0               │
    │ bar ┆ 2   ┆ 1    ┆ 2   ┆ 2.0               │
    │ foo ┆ 3   ┆ null ┆ 3   ┆ 3.0               │

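    Or you can use the NumPy integration (but this will probably be slower):

        import numpy as np

        # The argument to nanmedian was missing in the original snippet;
        # the Int64 columns converted to a NumPy array are assumed here.
        df.with_columns(
            pl.Series("Horizontal Median", np.nanmedian(df.select(pl.col(pl.Int64)).to_numpy(), axis=1))
        )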
    shape: (3, 5)
    │ ABC ┆ A   ┆ B    ┆ C   ┆ Horizontal Median │
    │ --- ┆ --- ┆ ---  ┆ --- ┆ ---               │
    │ str ┆ i64 ┆ i64  ┆ i64 ┆ f64               │
    │ foo ┆ 1   ┆ 2    ┆ 1   ┆ 1.0               │
    │ bar ┆ 2   ┆ 1    ┆ 2   ┆ 2.0               │
    │ foo ┆ 3   ┆ null ┆ 3   ┆ 3.0               │