I am using Python Polars and am confused about writing custom functions.
I can calculate the Pearson correlation by group using the code below.
df.group_by(["UNIVERSE", "datetime"]).agg(corr_xy = pl.corr("ratio_wsm", "y1d_nn_r"))
However, I want to replace pl.corr with a custom correlation function that does not demean the data.
The calculation logic is shown below.
import numpy as np

def rcor(x, y, w=None) -> float:
    # Correlation without demeaning, optionally weighted by w
    if w is not None:
        sxx = np.sum(w*x*x)
        syy = np.sum(w*y*y)
        sxy = np.sum(w*x*y)
        _rcor = sxy / np.sqrt(sxx * syy)
    else:
        sxx = np.sum(x*x)
        syy = np.sum(y*y)
        sxy = np.sum(x*y)
        _rcor = sxy / np.sqrt(sxx * syy)
    return _rcor
How can I implement this in Polars? I am really confused about map_batches, map_elements, and polars.api.register_expr_namespace.
You just need to rewrite your function a bit to use Polars expressions:
def rcor(x, y, w=None):
    # x, y and w are Polars expressions, so the result is an expression too
    if w is not None:
        sxx = (w*x*x).sum()
        syy = (w*y*y).sum()
        sxy = (w*x*y).sum()
        r = sxy / (sxx * syy).sqrt()
    else:
        sxx = (x*x).sum()
        syy = (y*y).sum()
        sxy = (x*y).sum()
        r = sxy / (sxx * syy).sqrt()
    return r
Then it's simple to use:
out = (
    df.group_by(["UNIVERSE", "datetime"])
    .agg(corr_xy = rcor(pl.col.ratio_wsm, pl.col.y1d_nn_r))
)