
How to convert a DataFrame with strings into an ndarray in Rust


I'm having trouble converting a Polars DataFrame that contains string values into an ndarray in Rust without using one-hot encoding.

Here is the code I used:

println!("{:?}", _df.to_ndarray::<Float64Type>(Default::default()).unwrap());

Is there any solution for that?


Solution

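  • One approach is to convert every string column to a numeric representation first, for example by giving each distinct string an integer code (ordinal or label encoding, which keeps one column per original column instead of one column per category as one-hot encoding does), and then call to_ndarray on the all-numeric DataFrame. to_ndarray needs every column to be numeric, so the string columns have to be encoded before the call. Encoding into f64 also handles missing values: for floating-point output, to_ndarray turns nulls into NaN, whereas integer output requires columns without nulls. The result is a plain ndarray::Array2<f64>, not an array of Option values.

    // Assumes polars 0.3x with the "ndarray" feature (newer releases rename utf8() to str())
    use std::collections::HashMap;

    use polars::prelude::*;

    fn main() -> PolarsResult<()> {
        // Make a Polars DataFrame with string values
        let df = DataFrame::new(vec![
            Series::new("col1", &["a", "b", "c"]),
            Series::new("col2", &["x", "y", "z"]),
        ])?;

        // Label-encode every column (all columns are assumed to be strings):
        // each distinct string gets a code in order of first appearance; nulls stay null
        let columns = df
            .get_columns()
            .iter()
            .map(|s| -> PolarsResult<Series> {
                let mut codes: HashMap<&str, f64> = HashMap::new();
                let encoded: Vec<Option<f64>> = s
                    .utf8()?
                    .into_iter()
                    .map(|v| {
                        v.map(|v| {
                            let next = codes.len() as f64;
                            *codes.entry(v).or_insert(next)
                        })
                    })
                    .collect();
                Ok(Series::new(s.name(), encoded))
            })
            .collect::<PolarsResult<Vec<_>>>()?;
        let df_numeric = DataFrame::new(columns)?;

        // Convert the all-numeric DataFrame to an ndarray::Array2<f64>; nulls become NaN
        let arr = df_numeric.to_ndarray::<Float64Type>(Default::default())?;
        println!("{:?}", arr);
        Ok(())
    }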
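To see the encoding step on its own, independent of which Polars version you have, here is a minimal std-only sketch. label_encode is a hypothetical helper (not part of Polars or ndarray): it assigns codes in order of first appearance, writes them into a row-major f64 buffer with NaN for missing values, and returns each column's vocabulary so codes can be mapped back to strings. It assumes all columns have the same length.

```rust
use std::collections::HashMap;

/// Hypothetical helper: label-encodes string columns into a row-major buffer
/// of f64 codes, plus the per-column vocabulary (index = code).
/// `None` entries become NaN, mirroring how to_ndarray treats nulls in float columns.
fn label_encode(columns: &[Vec<Option<&str>>]) -> (Vec<f64>, Vec<Vec<String>>, (usize, usize)) {
    let nrows = columns.first().map_or(0, |c| c.len());
    let ncols = columns.len();
    // Start with NaN everywhere so missing values need no extra handling
    let mut data = vec![f64::NAN; nrows * ncols];
    let mut vocabs = Vec::with_capacity(ncols);

    for (j, col) in columns.iter().enumerate() {
        assert_eq!(col.len(), nrows, "all columns must have the same length");
        let mut codes: HashMap<&str, usize> = HashMap::new();
        let mut vocab: Vec<String> = Vec::new();
        for (i, value) in col.iter().copied().enumerate() {
            if let Some(v) = value {
                // First time we see `v`, give it the next free code
                let code = *codes.entry(v).or_insert_with(|| {
                    vocab.push(v.to_string());
                    vocab.len() - 1
                });
                // Row-major layout: row i, column j
                data[i * ncols + j] = code as f64;
            }
        }
        vocabs.push(vocab);
    }
    (data, vocabs, (nrows, ncols))
}

fn main() {
    let col1 = vec![Some("a"), Some("b"), Some("a")];
    let col2 = vec![Some("x"), None, Some("z")];
    let (data, vocabs, shape) = label_encode(&[col1, col2]);
    // With the ndarray crate, ndarray::Array2::from_shape_vec(shape, data)
    // would turn this row-major buffer into a 2-D array.
    println!("shape = {:?}", shape);
    println!("data = {:?}", data);
    println!("vocabs = {:?}", vocabs);
}
```

Keeping the vocabulary makes the encoding reversible. Keep in mind that label encoding imposes an arbitrary order on the codes, which matters if the array is fed to something that treats the values as magnitudes; one-hot encoding avoids that at the cost of one column per distinct value.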