
How to calculate correlations without demeaning, by group, using Python Polars?


I am using Python Polars and am confused about writing custom functions.

I can calculate the Pearson correlation by group using the code below:

df.group_by(["UNIVERSE", "datetime"]).agg(corr_xy = pl.corr("ratio_wsm", "y1d_nn_r"))

However, I want to replace pl.corr with a custom correlation function that does not demean the inputs.

The calculation logic is as follows:

import numpy as np

def rcor(x, y, w=None) -> float:
    # Correlation without demeaning: x and y are NOT centered before summing
    if w is not None:
        sxx = np.sum(w*x*x)
        syy = np.sum(w*y*y)
        sxy = np.sum(w*x*y)
        _rcor = sxy / np.sqrt(sxx * syy)
    else:
        sxx = np.sum(x*x)
        syy = np.sum(y*y)
        sxy = np.sum(x*y)
        _rcor = sxy / np.sqrt(sxx * syy)
    return _rcor
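In formula form, the function above computes the uncentered (non-demeaned) correlation below; the unweighted branch is the special case where every weight w_i equals 1. Unlike Pearson's correlation, x and y are not shifted by their means before the sums are taken.

```latex
r_w(x, y) = \frac{\sum_i w_i \, x_i \, y_i}
                 {\sqrt{\left(\sum_i w_i \, x_i^2\right)\left(\sum_i w_i \, y_i^2\right)}}
```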

How can I implement this in Python Polars? I am really confused about map_batches, map_elements, and polars.api.register_expr_namespace.


Solution

  • You just need to rewrite your function slightly so that it works on Polars expressions instead of NumPy arrays:

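    def rcor(x, y, w=None):
        # x, y (and optionally w) are Polars expressions, e.g. pl.col("ratio_wsm");
        # the result is itself an expression that .agg() evaluates once per group.
        if w is not None:
            sxx = (w*x*x).sum()
            syy = (w*y*y).sum()
            sxy = (w*x*y).sum()
        else:
            sxx = (x*x).sum()
            syy = (y*y).sum()
            sxy = (x*y).sum()
        # No demeaning: the raw (weighted) sums are used directly
        return sxy / (sxx * syy).sqrt()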
    

    Then it's simple to use:

    out = (
        df.group_by(["UNIVERSE", "datetime"])
          .agg(corr_xy = rcor(pl.col.ratio_wsm, pl.col.y1d_nn_r))
    )