I need to filter a Polars DataFrame in Rust, keeping only the rows whose datetime column falls in one of a list of years. I tried this, but it doesn't compile.
fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
let years_vec = years.to_vec();
let mut df = df.lazy()
.filter(
col("time").dt().year()
.cast(DataType::Int64)
.is_in(years_vec)
)
.collect()?;
Ok(df)
}
I get this compile error:
error[E0599]: no method named `is_in` found for enum `Expr` in the current scope
--> src/main.rs:13:18
|
11 | / col("time").dt().year()
12 | | .cast(DataType::Int64)
13 | | .is_in(years_vec)
| |_________________-^^^^^
Claude gave the following, which seems to work but is too complicated.
fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
let years_vec = years.to_vec();
let mut df = df.lazy()
.filter(
col("time").dt().year()
.cast(DataType::Int64)
.apply(move |s| {
let s = s.i64()?;
Ok(Some(s.into_iter().map(|opt_v| {
opt_v.map(|v| years_vec.contains(&(v as i32)))
}).collect()))
}, GetOutput::from_type(DataType::Boolean))
)
.collect()?;
Ok(df)
}
What's wrong with is_in?
I am new to Rust and I can't figure out what the issue is.
There are actually two unrelated issues with your current code.
First, the most likely reason is_in is not found is that you forgot to activate the is_in feature flag, as mentioned in the documentation for the is_in function. Your Cargo.toml should thus include something like the following:
polars = { version = "0.44.0", features = ["is_in", "lazy"] }
Second, the is_in function requires an expression as its argument, not a Vec. There are multiple ways to fix this. I decided to convert the slice to a Series and then use the lit function to create a fitting expression. Find an adjusted version below:
fn reduce_df(df: DataFrame, threshold: f64, years: &[i32]) -> Result<DataFrame, PolarsError> {
let years_series = Series::new("years".into(), years);
let df = df
.lazy()
.filter(
col("time")
.dt()
.year()
.cast(DataType::Int64)
.is_in(lit(years_series)),
)
.collect()?;
Ok(df)
}
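To see that both versions do the same thing, it may help to look at what the filter computes row by row. The following is only a plain-Rust sketch using the standard library, not Polars code: year_mask and filter_rows are hypothetical helper names, and a null timestamp is modelled as None on the assumption that such a row is dropped by the filter.

```rust
/// Plain-Rust model of the boolean mask the filter expression produces.
/// `row_years` stands in for the result of `col("time").dt().year()`;
/// `None` models a null timestamp (an assumption of this sketch).
fn year_mask(row_years: &[Option<i32>], wanted: &[i32]) -> Vec<bool> {
    row_years
        .iter()
        // A missing year never matches, so that row is not kept.
        .map(|year| year.map_or(false, |y| wanted.contains(&y)))
        .collect()
}

/// Keeps only the rows whose mask entry is true, the way a filter applies a mask.
fn filter_rows<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(row, _)| row.clone())
        .collect()
}
```

Calling year_mask with the years of each row and the slice of wanted years gives one boolean per row, which is the same per-row check the apply closure in the question spells out by hand. is_in expresses that check as a single expression.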