rust, rust-polars

Create a Polars DataFrame from a Flattened JSON File


I am trying to read a flattened JSON file into a Polars DataFrame in Rust.

Here is an example of the flattened JSON format. How can this structure be read into a DataFrame without declaring each column's dtype in a struct?

{
  "data": [
    {
      "requestId": "IBM",
      "date": "2024-03-19",
      "sales": 61860,
      "company": "International Business Machines",
      "price": 193.34,
      "score": 7
    },
    {
      "requestId": "AAPL",
      "date": "2024-03-19",
      "sales": 383285,
      "company": "Apple Inc.",
      "price": 176.08,
      "score": 9
    },
    {
      "requestId": "MSFT",
      "date": "2024-03-19",
      "sales": 211915,
      "company": "Microsoft Corporation",
      "price": 421.41,
      "score": 7
    } 
  ]
}

The data contains only integers, floats, and strings.

Here is the struct I tried. With 200+ columns that can change, would it be better to store the columns dynamically in a HashMap?

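use serde::{Deserialize, Serialize};
use std::collections::HashMap;

#[derive(Debug, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
struct Row {
    // Maps to the "requestId" key through rename_all.
    request_id: String,
    date: String,
    // Collects every remaining key, whatever its name or value type.
    #[serde(flatten)]
    company_data: HashMap<String, serde_json::Value>,
}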

This is a follow-up to my question about the non-flattened JSON data: Transform JSON Key into a Polars DataFrame


Solution

  • This format is almost what polars' JsonReader expects; the only obstacle is the top-level object that wraps the array under "data". We can strip that wrapper with string manipulation and hand the remaining array to the reader:

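    use polars::prelude::*; // JsonReader needs polars' "json" feature
    use std::error::Error;

    pub fn flattened(json: &str) -> Result<DataFrame, Box<dyn Error>> {
        // Strip the outer `{ ... }` of the top-level object.
        let json = json.trim();
        let json = json
            .strip_prefix("{")
            .ok_or("invalid JSON")?
            .strip_suffix("}")
            .ok_or("invalid JSON")?;
        // Strip the `"data":` key so that only the array of rows remains.
        let json = json.trim_start();
        let json = json.strip_prefix(r#""data""#).ok_or("invalid JSON")?;
        let json = json.trim_start();
        let json = json.strip_prefix(":").ok_or("invalid JSON")?;

        // Let the reader infer the columns from the rows.
        let json_reader = JsonReader::new(std::io::Cursor::new(json));
        let mut df = json_reader.finish()?;
        // Cast the "date" column to a Date dtype explicitly.
        let date = df.column("date")?.cast(&DataType::Date)?;
        df.replace("date", date)?;

        Ok(df)
    }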
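
The unwrapping step can also be pulled out into a small helper that depends only on the standard library, which makes it easy to check separately from polars. The following is a minimal sketch, not part of the answer above: the name `unwrap_data_array` is hypothetical, and it assumes that "data" is the only top-level key and that its value is an array.

```rust
/// Returns the JSON array stored under the top-level "data" key, or `None`
/// if the input does not have the shape `{ "data": [ ... ] }`.
/// Assumption: "data" is the only key in the top-level object.
fn unwrap_data_array(json: &str) -> Option<&str> {
    let inner = json
        .trim()
        .strip_prefix('{')? // opening brace of the wrapper object
        .strip_suffix('}')? // closing brace of the wrapper object
        .trim_start()
        .strip_prefix(r#""data""#)? // the key itself
        .trim_start()
        .strip_prefix(':')? // key/value separator
        .trim();
    // Reject anything that no longer looks like a single JSON array,
    // for example when further top-level keys follow "data".
    (inner.starts_with('[') && inner.ends_with(']')).then_some(inner)
}
```

The returned slice borrows from the input, so it can be wrapped in a `std::io::Cursor` and passed to `JsonReader::new` exactly as in `flattened`. This is still plain string matching, not JSON parsing; if the wrapper can vary, parsing the document with a JSON library and extracting the "data" value is the more robust route.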