
How to use the "is_in" function correctly?


In Polars 0.46.0 it works normally:

let df = df!(
    "id" => [0, 1, 2, 3, 4],
    "col_1" => [1, 2, 3, 4, 5],
    "col_2" => [3, 4, 5, 6, 7],
)
.unwrap();
dbg!(&df);

let s = df.column("col_2").unwrap().as_materialized_series();

let combo = df
    .clone()
    .lazy()
    .filter(col("id").is_in(lit(s.clone()), false))
    .collect()
    .unwrap();
dbg!(&combo);

The same code in Polars 0.50.0 triggers a deprecation warning:

Deprecation: is_in with a collection of the same datatype is ambiguous and deprecated. Please use implode to return to previous behavior. See https://github.com/pola-rs/polars/issues/22149 for more information.

How should I write it in Polars 0.50.0, to not get a deprecation warning?


Solution

  • Since the column is already part of the df, you can just use

    .filter(col("id").is_in(col("col_2").implode(), false))
    

    If you wanted to do this with an arbitrary series, you could use

    .filter(col("id").is_in(lit(s.clone()).implode(), false))
    

    What implode does is remove the ambiguity that comes from asking whether, say, [1,2,3] is in [1,3,5]. That question can be read row-wise, asking if 1 is in [1], 2 is in [3], and 3 is in [5] (true, false, false), or column-wise, asking if each of 1, 2, and 3 is in [1,3,5] (true, false, true). implode turns the column into a single list, so is_in operates column-wise (which is what you'd expect, but now it's no longer ambiguous).
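
To make the two readings concrete, here is a minimal plain-Rust sketch that does not use Polars at all. The function names `is_in_row_wise` and `is_in_column_wise` are hypothetical and exist only for this illustration; they model the semantics described above, not how Polars implements `is_in`.

```rust
/// Row-wise reading: element i of `values` is checked only against
/// the i-th list in `lists` (illustrative helper, not a Polars API).
fn is_in_row_wise(values: &[i32], lists: &[Vec<i32>]) -> Vec<bool> {
    values
        .iter()
        .zip(lists)
        .map(|(v, list)| list.contains(v))
        .collect()
}

/// Column-wise reading: every element of `values` is checked against
/// the same shared set, which is the behavior implode selects
/// (illustrative helper, not a Polars API).
fn is_in_column_wise(values: &[i32], set: &[i32]) -> Vec<bool> {
    values.iter().map(|v| set.contains(v)).collect()
}
```

Calling `is_in_row_wise(&[1, 2, 3], &[vec![1], vec![3], vec![5]])` gives `[true, false, false]`, while `is_in_column_wise(&[1, 2, 3], &[1, 3, 5])` gives `[true, false, true]`. Applied to the data in the question, the column-wise reading of `id` against `col_2` is true only for ids 3 and 4, so those are the rows the filter should keep.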