I have a function that creates a ChunkedArray<ListType> from two chunked arrays, but I'm having a hard time converting the column function into a "Function Expression".
The goal is something similar to the Pearson or Spearman correlation implementation in Rust Polars, which takes two expressions as parameters instead of self; see this.
The output should be a new column of type List.
Here is my code where I'm trying to create synthetic_data_expr; however, I hit a dead end when evaluating the expressions.
use polars::prelude::*;
use rand::thread_rng;
use rand_distr::{Distribution, Normal};

// Draws 39 samples per row from Normal(mean, sqrt(variance)) and returns them
// as a List[f64] column. A null in either input panics on unwrap().
fn synthetic_data(
    mean_series: &ChunkedArray<Float64Type>,
    variance_series: &ChunkedArray<Float64Type>,
) -> PolarsResult<ChunkedArray<ListType>> {
    let mut rng = thread_rng();
    let random_values: Vec<Vec<f64>> = mean_series
        .iter()
        .zip(variance_series.iter())
        .map(|(mean, variance)| {
            let std_dev = variance.unwrap().sqrt();
            let normal_dist = Normal::new(mean.unwrap(), std_dev).unwrap();
            (0..39).map(|_| normal_dist.sample(&mut rng)).collect()
        })
        .collect();

    let mut list_chunk = ListPrimitiveChunkedBuilder::<Float64Type>::new(
        "intraday".into(),
        mean_series.len(), // one list per input row (was hard-coded to 5)
        39,                // values per list
        DataType::Float64,
    );
    for row in random_values {
        list_chunk.append_slice(&row);
    }
    Ok(list_chunk.finish())
}

fn synthetic_data_column(s: &[Column]) -> PolarsResult<Column> {
    let mean = &s[0];
    let variance = &s[1];
    let calc = synthetic_data(mean.f64()?, variance.f64()?)?;
    Ok(calc.into_column())
}

fn synthetic_data_expr(mean_column: Expr, variance_column: Expr) -> Expr {
    mean_column.apply_many(
        // Pass a closure that calls synthetic_data_column; the original called a
        // non-existent synthetic_intraday_data_column() instead of passing a function.
        |s| synthetic_data_column(s).map(Some),
        &[variance_column],
        // The output is List[f64], not the input's Float64 type.
        GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
    )
}
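A side issue with synthetic_data is that mean.unwrap() and variance.unwrap() panic on any null row. The std-only sketch below models just the per-row logic, under the assumption that you would rather emit a null list for such rows (and for negative variances, which have no real square root). synthetic_rows and its sample parameter are hypothetical names; sample stands in for drawing from Normal(mean, std_dev), so the logic can be tested without rand or Polars.

```rust
/// Hypothetical std-only model of the per-row logic in `synthetic_data`.
/// Each (mean, variance) row becomes a list of `n` samples; a null in either
/// input, or a negative variance, yields a null list instead of panicking.
/// `sample(mean, std_dev)` stands in for drawing from Normal(mean, std_dev).
fn synthetic_rows<F>(
    means: &[Option<f64>],
    variances: &[Option<f64>],
    n: usize,
    mut sample: F,
) -> Vec<Option<Vec<f64>>>
where
    F: FnMut(f64, f64) -> f64,
{
    means
        .iter()
        .zip(variances)
        .map(|(mean, variance)| match (mean, variance) {
            (Some(m), Some(v)) if *v >= 0.0 => {
                // Normal is parameterized by standard deviation, not variance.
                let std_dev = v.sqrt();
                Some((0..n).map(|_| sample(*m, std_dev)).collect::<Vec<f64>>())
            }
            // Null input or negative variance: propagate a null list.
            _ => None,
        })
        .collect()
}
```

In the Polars version, the None case would presumably map to appending a null entry on the list builder rather than a slice.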
Here is an example of what I'm trying to accomplish with synthetic_data_expr:
/// Compute the pearson correlation between two columns.
pub fn pearson_corr(a: Expr, b: Expr) -> Expr {
let input = vec![a, b];
let function = FunctionExpr::Correlation {
method: CorrelationMethod::Pearson,
};
Expr::Function {
input,
function,
options: FunctionOptions {
collect_groups: ApplyOptions::GroupWise,
cast_options: Some(CastingRules::cast_to_supertypes()),
flags: FunctionFlags::default() | FunctionFlags::RETURNS_SCALAR,
..Default::default()
},
}
}
You can't mimic the way most of the expressions in the Polars source are built, because they all go through enums (such as FunctionExpr above) whose variants are hard-coded in the library. Instead, you'd use map, as seen here.
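To illustrate why map is the extension point, here is a toy model. It is not Polars' actual type hierarchy: ToyExpr, BuiltinFn, and UserFn are hypothetical names. Built-in operations are variants of a closed enum that only the library author can extend, while a Map variant stores a user-supplied closure, which is roughly the role map plays for Expr.

```rust
use std::sync::Arc;

/// Closed set of built-ins: downstream code cannot add variants here.
enum BuiltinFn {
    Sum,
    Max,
}

/// A user-supplied column function, stored behind a trait object.
type UserFn = Arc<dyn Fn(&[f64]) -> Vec<f64> + Send + Sync>;

/// Hypothetical toy expression tree.
enum ToyExpr {
    Col(Vec<f64>),
    Builtin(BuiltinFn, Box<ToyExpr>),
    Map(Box<ToyExpr>, UserFn),
}

impl ToyExpr {
    /// The open extension point: wrap any closure without touching the enum.
    fn map<F>(self, f: F) -> ToyExpr
    where
        F: Fn(&[f64]) -> Vec<f64> + Send + Sync + 'static,
    {
        ToyExpr::Map(Box::new(self), Arc::new(f))
    }

    /// Evaluate the tree recursively.
    fn eval(&self) -> Vec<f64> {
        match self {
            ToyExpr::Col(values) => values.clone(),
            ToyExpr::Builtin(BuiltinFn::Sum, e) => vec![e.eval().iter().sum()],
            ToyExpr::Builtin(BuiltinFn::Max, e) => {
                vec![e.eval().iter().cloned().fold(f64::NEG_INFINITY, f64::max)]
            }
            ToyExpr::Map(e, f) => f(e.eval().as_slice()),
        }
    }
}
```

Adding a new built-in means editing BuiltinFn and eval inside the library, whereas map accepts arbitrary logic from outside it.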
My rust-analyzer doesn't like your synthetic_data function, but that notwithstanding, here's how you'd use it:
fn use_syn(df: DataFrame) {
let res = df
.clone()
.lazy()
.select([as_struct(vec![col("mean_col"), col("var_col")]).map(
|s| {
let ca = s.struct_().unwrap();
let seriess = ca.fields_as_series();
let mean_series = &seriess[0];
let mean_ca = mean_series.f64().unwrap();
let variance_series = &seriess[1];
let variance_ca = variance_series.f64().unwrap();
let out = synthetic_data(mean_ca, variance_ca).unwrap();
Ok(Some(out.into_series().into()))
},
GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
)])
.collect()
.unwrap();
}
One thing you can do to make that more convenient is to define your own trait and implement it for Expr. Your custom function is then wrapped in an Expr method that you can use the same way as the native methods. That would look like this:
trait CustomExprs {
fn syn_data(self, other: Expr) -> Expr;
}
impl CustomExprs for Expr {
fn syn_data(self, other: Expr) -> Expr {
as_struct(vec![self, other]).map(
|s| {
let ca = s.struct_().unwrap();
let seriess = ca.fields_as_series();
let mean_series = &seriess[0];
let mean_ca = mean_series.f64().unwrap();
let variance_series = &seriess[1];
let variance_ca = variance_series.f64().unwrap();
let out = synthetic_data(mean_ca, variance_ca).unwrap();
Ok(Some(out.into_series().into()))
},
GetOutput::from_type(DataType::List(Box::new(DataType::Float64))),
)
}
}
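The pattern works because Rust's orphan rule allows implementing a trait you own for a type you don't own, such as Expr. Here is a minimal std-only sketch of the same idea on a standard-library type; MeanStdExt and with_std_dev are hypothetical names chosen for illustration.

```rust
/// Hypothetical extension trait: a local trait, implemented for a foreign type.
trait MeanStdExt {
    /// Pairs each mean in `self` with the standard deviation
    /// derived from the variance at the same position.
    fn with_std_dev(&self, variances: &[f64]) -> Vec<(f64, f64)>;
}

// [f64] is defined in core; implementing our own trait for it is allowed.
impl MeanStdExt for [f64] {
    fn with_std_dev(&self, variances: &[f64]) -> Vec<(f64, f64)> {
        self.iter()
            .zip(variances)
            .map(|(&mean, &variance)| (mean, variance.sqrt()))
            .collect()
    }
}
```

Once the trait is in scope, the method is callable directly on a Vec<f64> through auto-deref, just as syn_data becomes callable on any Expr once CustomExprs is imported.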
Then you can call it like any other Expr method:
fn use_syn2(df: DataFrame) {
let res = df
.clone()
.lazy()
.select([col("mean_col").syn_data(col("var_col")).alias("syn_data")]);
}