
Creating a DataFusion DataFrame from Vec<Struct> in Rust?


I am trying to do something similar to this question here, but instead of using the polars library, I would like to use the DataFusion library.

The idea is to go from a Vec of structs like this:

#[derive(Serialize)]
struct Test {
    id: u32,
    amount: u32,
}

and save to Parquet files, just like in the question I referenced.

With polars this was possible, as the accepted answer there shows: serialise the structs to JSON and then build the DataFrame from that. I could not find a similar approach using DataFusion.

Any suggestions would be appreciated.


Solution

  • I think the parquet_derive crate is designed exactly for the use case of writing Rust structs to, and reading them from, Parquet files. DataFusion would be useful if you wanted to process the resulting data, for example by filtering or aggregating it with SQL.

    Here is an example in the docs: https://docs.rs/parquet_derive/30.0.1/parquet_derive/derive.ParquetRecordWriter.html
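
If you do want the data in DataFusion itself, the first step is usually to turn the row-oriented structs into one column per field, because Arrow (which DataFusion is built on) stores data column by column. Below is a minimal std-only sketch of that transposition, assuming the Test struct from the question. `TestColumns` and `to_columns` are hypothetical names used for illustration and are not part of any crate.

```rust
// Row-oriented input, mirroring the struct from the question.
#[derive(Debug, Clone, PartialEq)]
struct Test {
    id: u32,
    amount: u32,
}

// Column-oriented form (hypothetical helper type): one Vec per field,
// all of the same length.
#[derive(Debug, Default, PartialEq)]
struct TestColumns {
    id: Vec<u32>,
    amount: Vec<u32>,
}

// Transpose a slice of rows into per-field columns (hypothetical helper).
fn to_columns(rows: &[Test]) -> TestColumns {
    let mut cols = TestColumns {
        id: Vec::with_capacity(rows.len()),
        amount: Vec::with_capacity(rows.len()),
    };
    for row in rows {
        cols.id.push(row.id);
        cols.amount.push(row.amount);
    }
    cols
}

fn main() {
    let rows = vec![
        Test { id: 1, amount: 10 },
        Test { id: 2, amount: 20 },
    ];
    let cols = to_columns(&rows);
    println!("{:?}", cols);
}
```

Each resulting Vec<u32> has the shape that Arrow's UInt32Array::from accepts. The arrays can then be combined into a RecordBatch and, as far as I know, handed to DataFusion through SessionContext::read_batch. Check the docs for the crate versions you use, since these APIs have changed between releases.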