I'd like to apply trim_mean from scipy.stats to a polars column.
I tried the following, but I'm unsure why the second option fails with:
IndexError: tuple index out of range
import polars as pl
from scipy.stats import trim_mean

df = pl.DataFrame({
    "x": [1, 2, 3, 4, 6, 8, 5, 9, 12, 15, 4, 6]
})

# compute regular mean (works)
df.select(
    pl.col("x").mean().alias("mean")
)

# trimmed mean (raises IndexError: tuple index out of range)
df.select(
    trim_mean(pl.col("x"), 0.05).alias("trim_mean")
)
What data type is a polars column in this case? Or is there some other method to compute this in polars?
For completeness, a working implementation of trim_mean using native polars expressions could look as follows.
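def trim_mean_pl(expr: pl.Expr, proportiontocut: float) -> pl.Expr:
    # number of values to cut from each end, rounded down (as scipy does)
    m = (proportiontocut * pl.len()).floor().cast(pl.UInt32)
    # sort, drop m values from both ends, and average what is left
    return expr.sort().slice(m, pl.len() - 2 * m).mean()

df.with_columns(
    trim_mean_pl(pl.col("x"), 0.05).alias("trim_mean_pl")
)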