I have a 2-dimensional ndarray (Array2<f64>) of dimension N x K in Rust.
I would like to convert this into a single polars Series of dtype List[f64], i.e. every row becomes a list of length K.
How can one accomplish this efficiently in polars (Rust)? I could convert the ndarray into, e.g., a Vec<Vec<f64>> by iterating across rows, but I am unsure how to proceed from there.
Here is some sample code showing what I have tried so far, in case it helps:
use ndarray::prelude::*;
use polars::prelude::*;

fn main() {
    let array: Array2<f64> = array![[1., 2.], [3., 4.], [5., 6.]];

    // Convert each row into its own Vec<f64>.
    let vec_of_vecs: Vec<Vec<f64>> = array
        .axis_iter(ndarray::Axis(0))
        .map(|row| row.to_vec())
        .collect();
    // what to do now?

    // *edit* this works, and does not require converting to vec of vecs first, but it is relatively slow
    let chunked_list: ListChunked = array
        .axis_iter(ndarray::Axis(0))
        .map(|row| Series::from_vec("", row.to_vec()))
        .collect();
    let series: Series = chunked_list.into_series();
}
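Apart from going through a polars DataFrame and then using polars expressions to build lists from the columns, what is the most efficient path forward?
Thanks,

One efficient approach is to use a ListPrimitiveChunkedBuilder and append each row as a slice: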
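use ndarray::prelude::*;
use polars::prelude::*;

fn main() {
    let array: Array2<f64> = array![[1., 2.], [3., 4.], [5., 6.]];

    // Reserve room for N lists holding N * K values in total.
    let mut chunked_builder = ListPrimitiveChunkedBuilder::<Float64Type>::new(
        "",
        array.len_of(Axis(0)), // number of lists (rows)
        array.len(),           // total number of inner values (N * K)
        DataType::Float64,
    );
    for row in array.axis_iter(Axis(0)) {
        match row.as_slice() {
            // Fast path: the row is contiguous, so it is copied straight into the builder.
            Some(row) => chunked_builder.append_slice(row),
            // Slow path: the row is strided, so it is first gathered into a temporary Vec.
            None => chunked_builder.append_slice(&row.to_vec()),
        }
    }
    let series = chunked_builder.finish().into_series();
}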
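Conceptually, polars stores a List[f64] column in the Apache Arrow list layout: one contiguous buffer holding every inner value, plus an offsets array marking where each list starts and ends. The builder fills those buffers row by row. The std-only sketch below mimics that layout; the FlatList type and from_row_major helper are hypothetical names (not polars API), and it assumes a row-major buffer with a non-zero row length K.

```rust
/// Hypothetical stand-in for an Arrow-style list column: all values live in
/// one contiguous buffer, and `offsets[i]..offsets[i + 1]` delimits list `i`.
/// (i64 offsets, as in Arrow's large-list layout.)
#[derive(Debug, PartialEq)]
pub struct FlatList {
    pub values: Vec<f64>,
    pub offsets: Vec<i64>,
}

impl FlatList {
    /// Pre-allocate for `n_lists` lists holding `n_values` values in total,
    /// mirroring the two capacities passed to the polars builder.
    pub fn with_capacity(n_lists: usize, n_values: usize) -> Self {
        let mut offsets = Vec::with_capacity(n_lists + 1);
        offsets.push(0); // the first list always starts at offset 0
        FlatList {
            values: Vec::with_capacity(n_values),
            offsets,
        }
    }

    /// Append one list by copying `row` into the shared values buffer.
    pub fn append_slice(&mut self, row: &[f64]) {
        self.values.extend_from_slice(row);
        self.offsets.push(self.values.len() as i64);
    }

    /// Borrow list `i` back out as a slice of the values buffer.
    pub fn get(&self, i: usize) -> &[f64] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Build a `FlatList` from a row-major `n_rows x n_cols` buffer, one row per list.
pub fn from_row_major(data: &[f64], n_rows: usize, n_cols: usize) -> FlatList {
    assert!(n_cols > 0, "this sketch assumes a non-zero row length");
    assert_eq!(data.len(), n_rows * n_cols, "buffer does not match shape");
    let mut list = FlatList::with_capacity(n_rows, n_rows * n_cols);
    // In row-major order every row is a contiguous chunk of `n_cols` values.
    for row in data.chunks_exact(n_cols) {
        list.append_slice(row);
    }
    list
}
```

For the 3 x 2 sample array this yields offsets [0, 2, 4, 6] and a single six-element values buffer, which is why the natural capacity for the inner values is the total element count N * K rather than K.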
In the best case (the data is contiguous in memory), the builder approach allocates and copies the data only once. In the worst case it does so twice (like your code), but in my benchmarks it was always faster than your code. The fast path is always taken if the array is an owned array (Array2) allocated with the default row-major layout.
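The fast/slow path distinction can be sketched without polars or ndarray. The std-only code below assumes a dense buffer stored either row-major (C order, ndarray's default) or column-major (Fortran order); the Layout enum, row and flatten_rows are hypothetical helpers that mimic what `as_slice()` returning Some or None means for each row, not ndarray's actual implementation. A row-major row is borrowed directly, while a column-major row is strided and must be gathered into a temporary Vec first, just like `row.to_vec()`.

```rust
use std::borrow::Cow;

/// Memory order of a hypothetical dense 2-D buffer.
#[derive(Clone, Copy)]
pub enum Layout {
    RowMajor,    // C order: rows are contiguous
    ColumnMajor, // Fortran order: columns are contiguous, rows are strided
}

/// Return row `i` of an `n_rows x n_cols` buffer.
/// Row-major rows are borrowed without copying (the `as_slice()` fast path);
/// column-major rows are gathered into a temporary Vec (the `to_vec()` fallback).
pub fn row(data: &[f64], layout: Layout, n_rows: usize, n_cols: usize, i: usize) -> Cow<'_, [f64]> {
    match layout {
        Layout::RowMajor => Cow::Borrowed(&data[i * n_cols..(i + 1) * n_cols]),
        Layout::ColumnMajor => Cow::Owned((0..n_cols).map(|j| data[j * n_rows + i]).collect()),
    }
}

/// Copy every row into one flat values buffer and count how many rows
/// needed an extra temporary copy before that.
pub fn flatten_rows(data: &[f64], layout: Layout, n_rows: usize, n_cols: usize) -> (Vec<f64>, usize) {
    let mut values = Vec::with_capacity(n_rows * n_cols);
    let mut temporaries = 0;
    for i in 0..n_rows {
        let r = row(data, layout, n_rows, n_cols, i);
        if matches!(r, Cow::Owned(_)) {
            temporaries += 1; // this row was already copied once
        }
        values.extend_from_slice(&r); // the copy into the list's values buffer
    }
    (values, temporaries)
}
```

For the 3 x 2 sample matrix, the row-major buffer [1, 2, 3, 4, 5, 6] needs no temporaries, while the same matrix stored column-major as [1, 3, 5, 2, 4, 6] needs one per row, which corresponds to the worst case of copying the data twice.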