
Polars Rust API: creating a DataFrame from a string variable / reading CSV with options from a string


Using the Polars Rust API, is it possible to create a DataFrame directly from a CSV string / reader while specifying options such as the separator?

Currently, I'm working around this by saving the string to a temporary path and reading it using LazyCsvReader::new("tmp_path.csv").

In the eventual use case, the (potentially large) CSV data is received via, e.g., a request, so it is already in memory rather than on disk.

use anyhow::Result;
use polars::prelude::*;

fn main() -> Result<()> {
    let csv_str = "name|age|city
Alice|30|New York
Bob|25|London";

    // Writing it to a file, but I'd prefer to read the CSV data directly.
    std::fs::write("tmp.csv", csv_str)?;
    let df = LazyCsvReader::new("tmp.csv").with_separator(b'|').finish()?.collect()?;

    // Also tried `CsvReader`, though I couldn't figure out how to make it work with a custom delimiter.
    /* let cursor = std::io::Cursor::new(csv_str);
    let df = CsvReader::new(cursor).finish()?; */

    println!("{df}");

    Ok(())
}

Solution

  • You can use the into_reader_with_file_handle method of the CsvReadOptions struct to create a CSV reader from a file handle; a std::io::Cursor wrapping the string works as that handle. Use map_parse_options to set the custom separator.

    use polars::prelude::*;
    
    fn main() {
        let csv_str = "name|age|city
    Alice|30|New York
    Bob|25|London";
        // Wrap the string in an in-memory reader, so no temporary file is needed.
        let handle = std::io::Cursor::new(csv_str);
    
        let df = CsvReadOptions::default()
            .with_has_header(true)
            .map_parse_options(|parse_options| parse_options.with_separator(b'|'))
            .into_reader_with_file_handle(handle)
            .finish()
            .unwrap();
        println!("{:?}", df);
    }