
How to filter a datetime column in rust polars dataframe by a list of years?


I need to filter a Polars DataFrame in Rust, keeping only the rows whose datetime column falls in one of a given list of years. I tried the following, but it does not compile.

fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
    let years_vec = years.to_vec();

    let mut df = df.lazy()
        .filter(
            col("time").dt().year()
                .cast(DataType::Int64)
                .is_in(years_vec)
        )
        .collect()?;

    Ok(df)
}

It fails with this compile error:

error[E0599]: no method named `is_in` found for enum `Expr` in the current scope
   --> src/main.rs:13:18
    |
11  | /             col("time").dt().year()
12  | |                 .cast(DataType::Int64)
13  | |                 .is_in(years_vec)
    | |_________________-^^^^^

Claude suggested the following, which seems to work but is too complicated.

fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
    let years_vec = years.to_vec();

    let mut df = df.lazy()
        .filter(
            col("time").dt().year()
                .cast(DataType::Int64)
                .apply(move |s| {
                    let s = s.i64()?;
                    Ok(Some(s.into_iter().map(|opt_v| {
                        opt_v.map(|v| years_vec.contains(&(v as i32)))
                    }).collect()))
                }, GetOutput::from_type(DataType::Boolean))
        )
        .collect()?;

    Ok(df)
}

What's wrong with is_in?

I am new to Rust and can't figure out what the issue is.


Solution

  • There are two unrelated issues with your current code.

    1. is_in is most likely not found because the is_in feature flag is not enabled, as noted in the documentation for the is_in function. Your Cargo.toml should therefore include something like the following:
           polars = { version = "0.44.0", features = ["is_in", "lazy"] }
    2. is_in takes an expression as its argument, not a Vec. There are several ways to fix this. I converted the slice to a Series and wrapped it in lit to get a suitable expression.

    Find an adjusted version below:

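    use polars::prelude::*;

    // Note: `threshold` is unused here, exactly as in the question's snippet.
    fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
        // Wrap the years in a Series so they can be passed to is_in as a literal expression.
        let years_series = Series::new("years".into(), years);

        // No `mut` binding is needed: the filtered frame is returned as-is.
        df.lazy()
            .filter(
                col("time")
                    .dt()
                    .year()
                    .cast(DataType::Int64)
                    .is_in(lit(years_series)),
            )
            .collect()
    }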
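
For intuition, the closure-based workaround in the question boils down to a per-row membership test. Below is a minimal std-only sketch of that logic, not Polars code: the helper name `year_mask` is hypothetical, and a plain slice of `Option<i32>` stands in for the extracted year column, with `None` playing the role of a null row.

```rust
/// Hypothetical helper: for each optional year, report whether it is in `wanted`.
/// A `None` input (a null row) stays `None`, mirroring `opt_v.map(...)`
/// in the closure from the question.
fn year_mask(years_col: &[Option<i32>], wanted: &[i32]) -> Vec<Option<bool>> {
    years_col
        .iter()
        .map(|opt_year| opt_year.map(|year| wanted.contains(&year)))
        .collect()
}
```

is_in expresses the same membership test as a single expression that the lazy engine evaluates, so the hand-written closure is not needed. How is_in treats null rows may differ from this sketch, so check the documentation for the Polars version you use.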