[SOLVED] Using non-polars operations on polars column

I'd like to apply trim_mean from scipy.stats to some polars column.

I tried the following, but am unsure why the second option fails with:

IndexError: tuple index out of range

import polars as pl
from scipy.stats import trim_mean

df = pl.DataFrame({
    "x": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]
})

# compute regular mean
df.select(
    pl.col("x").mean().alias("mean")
)

# trim mean
df.select(
    trim_mean(pl.col("x"),0.05).alias("trim_mean")
)

What data type is a polars column in this case? Or is there some other method to compute this in polars?

Solution

For completeness, a working implementation of trim_mean using native polars expressions could look as follows.

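def trim_mean_pl(expr: pl.Expr, proportiontocut: float) -> pl.Expr:
    # number of observations to drop from each end of the sorted column
    m = (proportiontocut * pl.len()).floor().cast(pl.UInt32)
    # keep the middle len - 2*m values and average them
    return expr.sort().slice(m, pl.len() - 2 * m).mean()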

df.with_columns(
    trim_mean_pl(pl.col("x"), 0.05).alias("trim_mean_pl")
)
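
The arithmetic behind trim_mean_pl can be checked against a plain-Python sketch of the same steps: sort, drop floor(proportiontocut * n) values from each end, and average the rest. The helper name trim_mean_reference is hypothetical and exists only for this illustration. It does not guard against a proportion so large that nothing is left to average.

```python
import math


def trim_mean_reference(values, proportiontocut):
    """Trimmed mean with the same steps as trim_mean_pl, in plain Python."""
    n = len(values)
    # number of observations dropped from each end
    m = math.floor(proportiontocut * n)
    kept = sorted(values)[m:n - m]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]

# 0.05 * 12 = 0.6 floors to 0, so nothing is trimmed.
no_trim = trim_mean_reference(data, 0.05)

# 0.10 * 12 = 1.2 floors to 1, so the smallest and largest values are dropped.
one_each = trim_mean_reference(data, 0.10)
```

With only 12 observations, a proportion of 0.05 trims nothing, so the result matches the regular mean. A difference appears once proportiontocut * n reaches at least 1.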