I want to deserialize JSON values in parallel using rayon. A valid JSON document from the serde-json example fails to deserialize inside par_iter, despite being parsed correctly without parallelization. This is the code:
use rayon::prelude::*; // 1.7.0
use serde_json::{Result, Value};

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Value = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());

    let _values = batch.par_iter()
        .for_each(|json: &String| {
            serde_json::from_str(json.as_str()).unwrap()
        });
    Ok(())
}
and this is the error:
thread 'thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: map, expected unit", line: 2, column: 8)', src/main.rs:23:49
IIRC, I've seen other par_iter examples that use unwrap inside. Is this not recommended? In my case, I want to do it because I need the program to panic if an invalid input comes in.
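serde_json::from_str is generic over its return type, and the compiler infers that type from the context in which the result is used. In your case, the closure passed to for_each must return (), so from_str attempts to deserialize each JSON document into a (). A JSON object is not a valid unit value, hence the error "invalid type: map, expected unit".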
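To make the inference rule concrete, here is a minimal sketch that uses only the standard library. The Decode trait, the local from_str function, and the helpers count_unit_failures and decode_ints are hypothetical stand-ins invented for this illustration; they are not part of serde or serde_json. They only mimic the shape of a function that is generic over its return type, on the assumption that a unit value accepts nothing but null.

```rust
// Toy stand-in for serde's `Deserialize` (hypothetical, for illustration only):
// each output type decides how to parse itself from a string.
trait Decode: Sized {
    fn decode(input: &str) -> Result<Self, String>;
}

// Assumption mirroring serde_json: `()` accepts only the literal `null`.
impl Decode for () {
    fn decode(input: &str) -> Result<Self, String> {
        if input.trim() == "null" {
            Ok(())
        } else {
            Err(format!("invalid input {input:?}, expected unit"))
        }
    }
}

impl Decode for i64 {
    fn decode(input: &str) -> Result<Self, String> {
        input.trim().parse::<i64>().map_err(|e| e.to_string())
    }
}

// Generic over its return type; the caller's context chooses `T`.
fn from_str<T: Decode>(input: &str) -> Result<T, String> {
    T::decode(input)
}

// Counts how many inputs fail when the result is consumed inside `for_each`.
fn count_unit_failures(batch: &[&str]) -> usize {
    let mut failures = 0;
    batch.iter().for_each(|&s| {
        // The closure's value must be `()`, so `T` is inferred as `()`.
        match from_str(s) {
            Ok(unit) => unit,
            Err(_) => failures += 1,
        }
    });
    failures
}

// The `Vec<i64>` return type pins `T` to `i64` through `collect`.
fn decode_ints(batch: &[&str]) -> Vec<i64> {
    batch.iter().map(|&s| from_str(s).unwrap()).collect()
}

fn main() {
    let batch = ["1", "2"];
    println!("failures as (): {}", count_unit_failures(&batch));
    println!("decoded as i64: {:?}", decode_ints(&batch));
}
```

The same two inputs fail in the for_each version and succeed in the map().collect() version. The only difference is the type the surrounding code asks for.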
Use map().collect() together with a : Vec<Value> annotation to make this work:
use rayon::prelude::*; // 1.7.0
use serde_json::{Result, Value};

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Value = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());

    // The Vec<Value> annotation makes from_str deserialize into Value.
    let values: Vec<Value> = batch
        .par_iter()
        .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
        .collect();

    println!("Values:\n{:#?}", values);
    Ok(())
}
Please call "John Doe" at the number "+44 1234567"
Values:
[
    Object {
        "age": Number(43),
        "name": String("John Doe"),
        "phones": Array [
            String("+44 1234567"),
            String("+44 2345678"),
        ],
    },
    Object {
        "age": Number(43),
        "name": String("John Doe"),
        "phones": Array [
            String("+44 1234567"),
            String("+44 2345678"),
        ],
    },
]
Although honestly, it's a little unusual to use serde_json::Value here; usually people deserialize directly into a struct:
use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Debug, Serialize, Deserialize)]
struct Entry {
    name: String,
    age: u32,
    phones: Vec<String>,
}

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Entry = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v.name, v.phones[0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());

    let values: Vec<Entry> = batch
        .par_iter()
        .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
        .collect();

    println!("Values:\n{:#?}", values);
    Ok(())
}
Please call John Doe at the number +44 1234567
Values:
[
    Entry {
        name: "John Doe",
        age: 43,
        phones: [
            "+44 1234567",
            "+44 2345678",
        ],
    },
    Entry {
        name: "John Doe",
        age: 43,
        phones: [
            "+44 1234567",
            "+44 2345678",
        ],
    },
]