[SOLVED] How to Calculate Z-Scores for a List of Values in Polars DataFrame

I'm working with a Polars DataFrame in Python, where I have a column containing lists of values. I need to calculate the Z-scores for each list using pre-computed mean and standard deviation values. Here’s a sample of my DataFrame:

import polars as pl

data = {
    "transcript_id": ["ENST00000711184.1"],
    "OE": [[3.933402, 1.057907, None, 3.116513]],
    "mean_OE": [11.882091],
    "std_OE": [3.889974],
}

df_human = pl.DataFrame(data)

For each list in the OE column, I want to subtract the mean (mean_OE) and divide by the standard deviation (std_OE) to obtain the Z-scores. I also want to handle None values in the lists by leaving them as None in the Z-scores list.

How can I correctly apply the Z-score calculation to each list while keeping None values intact?

Thanks in advance for any guidance!

Solution

Over the last few releases, and especially since Polars 1.10.0, arithmetic between list columns and non-list columns has become much simpler.

If you are interested in the usual definition of the Z-score (using the mean and standard deviation computed from each list itself), you can use the following.

df_human.select(
    (pl.col("OE") - pl.col("OE").list.mean()) / pl.col("OE").list.std()
)

shape: (1, 1)
┌───────────────────────────────────────┐
│ OE                                    │
│ ---                                   │
│ list[f64]                             │
╞═══════════════════════════════════════╡
│ [0.830631, -1.109966, null, 0.279334] │
└───────────────────────────────────────┘

If you want to compute the Z-score explicitly using the mean_OE and std_OE columns, you can now use them directly in the arithmetic. Null entries in the list stay null in the result.

df_human.select((pl.col("OE") - pl.col("mean_OE")) / pl.col("std_OE"))

shape: (1, 1)
┌─────────────────────────────────────────┐
│ OE                                      │
│ ---                                     │
│ list[f64]                               │
╞═════════════════════════════════════════╡
│ [-2.043378, -2.782585, null, -2.253377] │
└─────────────────────────────────────────┘
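
To cross-check these numbers without Polars, here is a minimal plain-Python sketch of the same element-wise calculation. The helper name z_scores is hypothetical and not part of any library; it just applies (value - mean) / std to each entry and passes None through unchanged.

```python
def z_scores(values, mean, std):
    """Return (v - mean) / std for each value, keeping None as None."""
    return [None if v is None else (v - mean) / std for v in values]


oe = [3.933402, 1.057907, None, 3.116513]
result = z_scores(oe, mean=11.882091, std=3.889974)

# Round only for display, so the values are easy to compare by eye.
print([None if z is None else round(z, 6) for z in result])
```

The printed values should agree with the Polars output above up to rounding, with None in the position where Polars shows null.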