json rust serde robustness

Safely and efficiently processing a JSON web service response in Rust


I have already searched a lot but have so far not found the solution I was hoping for:

Q: What is a proper and efficient way in Rust to process a JSON response from a web service that, by definition, cannot be fully trusted or relied on?

  1. The JSON fields and structure are known.
  2. Values can be empty (or have unexpected content).
  3. Processing should be graceful and robust, i.e. it should forgive an empty or unexpected value and continue.

The elegant serde approach of parsing a string directly into an object, such as let p: Person = serde_json::from_str(data)?;, does not seem to be an option, because it fails ungracefully if a single value does not match expectations. Or is there an option to make it more robust and graceful?

Extracting and validating every single value with let v: Value = serde_json::from_str(data)?; and x = v["text"].to_string(), and handling every error individually, is on the other hand not very efficient.

Is there a better way to do this efficiently and in a graceful, robust manner that would at least continue when values are missing and could handle empty values or other minor issues gracefully?

PS: Examples from https://docs.rs/serde_json/latest/serde_json/


Solution

  • You can't have deserialization fail and also not fail. You need to pick one or the other. If the API sometimes returns "unexpected content" then that kind of content is part of the effective schema and you could use e.g. an untagged enum to represent it.

    For example, what if a field is usually a number, but is sometimes a non-numeric string, or is omitted entirely?

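    use serde::Deserialize;

    // Untagged: serde tries each variant in order and keeps the first that matches.
    #[derive(Deserialize)]
    #[serde(untagged)]
    enum NumberOrString {
        Number(f64),
        String(String),
    }

    #[derive(Deserialize)]
    struct ApiResponse {
        // None when the field is null or missing.
        pub some_value: Option<NumberOrString>,
    }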
    

    Now you can handle any of the three cases:

    match response.some_value {
        None => { /* Value was null or missing */ }
        Some(NumberOrString::Number(n)) => { /* Value was numeric */ }
        Some(NumberOrString::String(s)) => { /* Value was a string */ }
    }
    

    The point is that if the API sometimes responds with a structure that contradicts the schema in its documentation, then the documentation is wrong. Model your schema on what the API actually returns instead. You may not know all of the possibilities ahead of time, so your program will sometimes fail to deserialize a response, and that's okay! When you learn of a new idiosyncrasy, encode it into the schema your program accepts. Because Rust is strongly typed and requires match arms to be exhaustive, the compiler will then point out every place in your program that needs to change to account for the new possibility.
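
To make the exhaustiveness point concrete, here is a minimal sketch of what happens when the schema grows. It assumes, purely for illustration, that the API was later found to send booleans as well, so a hypothetical `Bool` variant is added to the enum. The `as_number` helper is also hypothetical, and the serde derive and `#[serde(untagged)]` attribute are left out so the snippet compiles with the standard library alone.

```rust
// Same shape as the enum above, plus a hypothetical `Bool` variant added
// after learning that the API sometimes sends true/false.
// (serde derive and #[serde(untagged)] omitted to keep this dependency-free.)
#[derive(Debug, Clone, PartialEq)]
enum NumberOrString {
    Number(f64),
    String(String),
    Bool(bool), // hypothetical new case
}

// Hypothetical helper: reduce whatever the API sent to a usable number, if any.
fn as_number(value: Option<&NumberOrString>) -> Option<f64> {
    match value {
        // Value was null or missing.
        None => None,
        Some(NumberOrString::Number(n)) => Some(*n),
        // Strings are accepted only if they parse as a number.
        Some(NumberOrString::String(s)) => s.trim().parse::<f64>().ok(),
        // Deleting this arm is a compile error (non-exhaustive patterns),
        // which is how the compiler flags code that was not updated.
        // Mapping true/false to 1.0/0.0 is an arbitrary choice for this sketch.
        Some(NumberOrString::Bool(b)) => Some(if *b { 1.0 } else { 0.0 }),
    }
}
```

Every `match` on `NumberOrString` that lacks a wildcard arm stops compiling once `Bool` is added, until it handles the new variant. A catch-all `_ =>` arm would silence that check, so it may be worth avoiding wherever you want the compiler to remind you.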