Tags: dataframe, rust, statistics, mean, rust-polars

How to append a row with column-wise means in a Polars DataFrame in Rust?


I'm trying to calculate the mean of each numeric column in a Polars v0.46.0 DataFrame using Rust, and append the result as a new row at the bottom of the DataFrame.

Here's a simplified example of the DataFrame I'm working with:

let df: DataFrame = df! [
    "a" => &["a", "b", "c"],
    "b" => &[1, 2, 3],
    "c" => &["d", "e", "f"],
    "d" => &[4, 5, 6],
].unwrap();

The expected final result would be:

┌─────┬─────┬──────┬─────┐
│ a   │ b   │ c    │ d   │
│ --- │ --- │ ---  │ --- │
│ str │ f64 │ str  │ f64 │
├─────┼─────┼──────┼─────┤
│ a   │ 1.0 │ d    │ 4.0 │
│ b   │ 2.0 │ e    │ 5.0 │
│ c   │ 3.0 │ f    │ 6.0 │
│ avg │ 2.0 │ null │ 5.0 │
└─────┴─────┴──────┴─────┘

An older version of my project, built on an older Polars release, computed the average with column.mean_as_series(), but that method no longer seems to be available. I then tried column.mean(), which fails with the following error (Helix with rust-analyzer):

the method `mean` exists for reference `&Column`, but its trait bounds were not
satisfied
the following trait bounds were not satisfied:
`Column: Ord`
which is required by `&Column: Ord`
`&Column: Ord`
which is required by `&&Column: Ord`
`&Column: Ord`
which is required by `&mut &Column: Ord`
`&Column: Iterator`
which is required by `&mut &Column: Iterator`
`Column: Ord`
which is required by `&mut Column: Ord`
`Column: Iterator`
which is required by `&mut Column: Iterator`

What is the cleanest way to do this in Polars 0.46.0?


Solution

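  • In terms of the Python API, this looks like a concat with the diagonal_relaxed strategy: the frames are aligned by column name, columns missing from one frame (here c in the summary row) are filled with null, and differing dtypes are cast to a common supertype (here the integer columns become f64).

    In Rust, you can enable it together with the Lazy DSL in your Cargo.toml: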

    [dependencies]
    polars = { version = "0.46.0", features = ["lazy", "diagonal_concat"] }
    
    use polars::prelude::*;
    
    fn main() -> PolarsResult<()> {
        let df: DataFrame = df!(
            "a" => &["a", "b", "c"],
            "b" => &[1, 2, 3],
            "c" => &["d", "e", "f"],
            "d" => &[4, 5, 6],
        )?;
    
        let result = concat_lf_diagonal(
            [
                df.clone().lazy(),
                df.clone()
                    .lazy()
                    .select([lit("avg").alias("a"), cols(["b", "d"]).mean()]),
            ],
            UnionArgs {
                to_supertypes: true,
                ..Default::default()
            },
        )?
        .collect()?;
    
        dbg!(result);
    
        Ok(())
    }
    
    [src/main.rs:25:5] result = shape: (4, 4)
    ┌─────┬─────┬──────┬─────┐
    │ a   ┆ b   ┆ c    ┆ d   │
    │ --- ┆ --- ┆ ---  ┆ --- │
    │ str ┆ f64 ┆ str  ┆ f64 │
    ╞═════╪═════╪══════╪═════╡
    │ a   ┆ 1.0 ┆ d    ┆ 4.0 │
    │ b   ┆ 2.0 ┆ e    ┆ 5.0 │
    │ c   ┆ 3.0 ┆ f    ┆ 6.0 │
    │ avg ┆ 2.0 ┆ null ┆ 5.0 │
    └─────┴─────┴──────┴─────┘
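
    For intuition about what the appended row contains, here is a minimal Polars-free sketch in plain Rust. It is not how Polars implements the concat; the Col enum and the append_mean_row and example functions are hypothetical names made up for illustration, and the integer columns are assumed to be already cast to f64. Numeric columns get their mean, the label column gets "avg", and every other string column gets a null (None).

```rust
/// Hypothetical stand-in for a DataFrame column; not a Polars type.
#[derive(Debug, Clone, PartialEq)]
enum Col {
    Str(Vec<Option<String>>),
    F64(Vec<Option<f64>>),
}

/// Appends one summary row: the mean for numeric columns, `label` for the
/// column named `label_col`, and a null (`None`) for every other string column.
fn append_mean_row(cols: &mut [(String, Col)], label_col: &str, label: &str) {
    for (name, col) in cols.iter_mut() {
        match col {
            Col::F64(values) => {
                // Mean over the non-null values; an empty column yields a null mean.
                let present: Vec<f64> = values.iter().flatten().copied().collect();
                let mean = if present.is_empty() {
                    None
                } else {
                    Some(present.iter().sum::<f64>() / present.len() as f64)
                };
                values.push(mean);
            }
            Col::Str(values) => {
                if name.as_str() == label_col {
                    values.push(Some(label.to_string()));
                } else {
                    values.push(None);
                }
            }
        }
    }
}

/// Helper: builds a string column with no nulls.
fn str_col(v: &[&str]) -> Col {
    Col::Str(v.iter().map(|x| Some(x.to_string())).collect())
}

/// Helper: builds a float column with no nulls.
fn f64_col(v: &[f64]) -> Col {
    Col::F64(v.iter().map(|x| Some(*x)).collect())
}

/// Builds the frame from the question (integers written as f64) and
/// appends the "avg" row.
fn example() -> Vec<(String, Col)> {
    let mut cols = vec![
        ("a".to_string(), str_col(&["a", "b", "c"])),
        ("b".to_string(), f64_col(&[1.0, 2.0, 3.0])),
        ("c".to_string(), str_col(&["d", "e", "f"])),
        ("d".to_string(), f64_col(&[4.0, 5.0, 6.0])),
    ];
    append_mean_row(&mut cols, "a", "avg");
    cols
}
```

    Calling example() returns the four columns with a fourth row appended, matching the shape of the Polars output above. In real code the diagonal concat is preferable, since Polars handles the alignment, null filling, and casting for you.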