python-polars, scipy.stats

Using non-polars operations on a polars column


I'd like to apply trim_mean from scipy.stats to some polars column.

I tried the following, but I'm not sure why the second option fails with

IndexError: tuple index out of range

import polars as pl
from scipy.stats import trim_mean

df = pl.DataFrame({
    "x": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]
})

# compute regular mean
df.select(
    pl.col("x").mean().alias("mean")
)

# trimmed mean (this call raises the IndexError)
df.select(
    trim_mean(pl.col("x"),0.05).alias("trim_mean")
)

What data type is a polars column in this case? Or is there some other method to compute this in polars?


Solution

  • For completeness, a working implementation of trim_mean using only native polars expressions could look as follows.

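    import polars as pl

    def trim_mean_pl(expr: pl.Expr, proportiontocut: float) -> pl.Expr:
        # number of values to drop from each end, rounded down as scipy does
        m = (proportiontocut * pl.len()).floor().cast(pl.UInt32)
        # sort, drop m values at both ends, and average the remaining ones
        return expr.sort().slice(m, pl.len() - 2 * m).mean()

    df.with_columns(
        trim_mean_pl(pl.col("x"), 0.05).alias("trim_mean_pl")
    )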