
Rust-polars: unable to filter a dataframe after renaming the filtered column


The following code runs:

use polars::prelude::*;

fn main() {
    let df = df! [
        "names" => ["a", "b", "c", "d"],
        "values" => [1, 2, 3, 4],
        "floats" => [1.25, 2., 1., 0.5]
    ].unwrap();
    // println!("{:?}", df);

    let names_to_remove = Series::new("bad names".into(), ["c", "d"]);

    let df1 = df
        .clone()
        .lazy()
        .filter(col("names").is_in(lit(names_to_remove)).not())
        .collect()
        .unwrap();
    println!("{:?}", df1);
}

Now I try the same thing, but rename the filtered column first:

use polars::prelude::*;

fn main() {
    let mut df = df! [
        "names" => ["a", "b", "c", "d"],
        "values" => [1, 2, 3, 4],
        "floats" => [1.25, 2., 1., 0.5]
    ].unwrap();
    // println!("{:?}", df);

    let old_name = &df.get_column_names_owned()[0]; // rename first column
    let _ = df.rename(old_name, "all_names".into());
    // println!("{:?}", df);
    // println!("{:?}", df.column("all_names").unwrap());

    let cols_to_remove = Series::new("bad names".into(), ["c", "d"]);

    let df1 = df
        .clone()
        .lazy()
        .filter(col("all_names").is_in(lit(cols_to_remove)).not())
        .collect()
        .unwrap();
    println!("{:?}", df1);
}

This results in an error message:

thread 'main' panicked at src/main.rs:42:10:
called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("unable to find column \"all_names\"; valid columns: [\"names\", \"values\", \"floats\"]\n\nResolved plan until failure:\n\n\t---> FAILED HERE RESOLVING 'filter' <---\nDF [\"names\", \"values\", \"floats\"]; PROJECT */3 COLUMNS"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The message seems to suggest that the first column name hasn't changed, although printing the dataframe, or even the first column extracted by its new name, shows that the name has indeed changed.

However, using the old column name doesn't work either (col("all_names") --> col("names")):

thread 'main' panicked at src/main.rs:43:10:
called `Result::unwrap()` on an `Err` value: ColumnNotFound(ErrString("\"names\" not found"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Solution

  • This is a bug. The same code works with version 0.45.1 but breaks in 0.46, and I don't see anything about it documented under breaking changes on the release page. I suggest raising an issue in the GitHub repository.

    A workaround is to run rename on the LazyFrame instead of on the DataFrame:

    let old_name = &df.get_column_names_owned()[0];
    let cols_to_remove = Series::new("bad names".into(), ["c", "d"]);

    let df1 = df
        .clone()
        .lazy()
        // rename on the LazyFrame rather than on the DataFrame
        .rename([old_name], ["all_names"], true)
        .filter(col("all_names").is_in(lit(cols_to_remove)).not())
        .collect()
        .unwrap();
    println!("{:?}", df1);
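
    Since the code reportedly works on 0.45.1, pinning that release until the bug is fixed may be another option. A minimal sketch of the Cargo.toml entry, assuming the `lazy` and `is_in` features are the ones these snippets need (the question does not show its manifest, so adjust the feature list to match yours):

    ```toml
    [dependencies]
    # "=0.45.1" pins the exact release reported to work above.
    # The feature list is an assumption: `lazy` for `.lazy()`, `is_in` for `Expr::is_in`.
    polars = { version = "=0.45.1", features = ["lazy", "is_in"] }
    ```

    With the version pinned this way, `cargo update` will not move the dependency to 0.46.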