rust base64 serde reqwest

Is Base64 encoding a good way to save space when using serde_json for a Vec<u8>?


I have a Vec<u8> that I want to send as JSON using reqwest. content.len() gives around 400, where content is the Vec<u8>. I expected serde_json to use about the same number of bytes, but the size triples when I use this code:

serde_json::to_value(&content).unwrap().to_string().as_bytes().len()

This gives around 1200. I also tried switching to serde_bytes, but that gives the same result. When I encode the data as Base64, the output is 500 bytes, which is a small overhead but acceptable. Decoding a Value back to a Vec<u8> is a pain, though. It looks like the overhead comes from serde_json encoding everything as u64.

Context: The endpoint I call has a strict limit on the number of bytes I can send, so I want to count the bytes it will receive beforehand to avoid errors. If my way of counting bytes isn't correct, I would like to know that as well.


Solution

  • JSON is required to be Unicode text, so it's not possible in general to send arbitrary byte strings without encoding them. By default, serde_json encodes a Vec<u8> as a JSON array of decimal numbers, one per byte. Each byte then costs one to three digits plus a comma, which, as you've noted, is not very efficient.

    You can certainly use Base64, in which case N bytes of input will take floor((N + 2) / 3) * 4 bytes with padding. If your data is likely to be mostly ASCII, such as a Unix filename, you could use something like percent-encoding (that is, % becomes %25, control characters and bytes larger than 127 become %NN, where NN is the byte value in hexadecimal, and other ASCII characters stay the same). How much that expands the text depends on the byte sequences you're using.

    If you control the destination endpoint, you may consider using CBOR instead. It's much like JSON in many ways, but it's binary and it can encode both byte and text strings efficiently, as well as all the other data types JSON supports (and more). It will definitely expand your data substantially less, since byte strings are serialized as-is with only a small prefix.
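
To check a payload against the size limit before sending, the encoded sizes discussed above can be estimated without serializing anything. The following is a minimal sketch using only the standard library; the function names and the 400-byte sample payload are made up for illustration. It assumes serde_json's compact output for the array form (no whitespace) and padded Base64, and the percent-encoding count covers only the scheme described above, not the surrounding quotes or JSON escaping of `"` and `\`.

```rust
/// Size of the compact JSON array form of a byte slice, e.g. `[1,20,255]`:
/// two brackets, one comma between elements, and 1 to 3 decimal digits per byte.
fn json_array_len(data: &[u8]) -> usize {
    let digits: usize = data
        .iter()
        .map(|&b| if b >= 100 { 3 } else if b >= 10 { 2 } else { 1 })
        .sum();
    let commas = data.len().saturating_sub(1);
    2 + digits + commas
}

/// Size of padded Base64 output: every group of up to 3 input bytes becomes 4 characters.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Size after percent-encoding: `%`, control bytes, and bytes above 127
/// take 3 characters (`%NN`); every other ASCII byte takes 1.
fn percent_encoded_len(data: &[u8]) -> usize {
    data.iter()
        .map(|&b| if b == b'%' || b < 0x20 || b >= 0x7f { 3 } else { 1 })
        .sum()
}

fn main() {
    // Hypothetical 400-byte payload that cycles through all byte values.
    let content: Vec<u8> = (0..400u32).map(|i| (i % 256) as u8).collect();

    println!("raw bytes:        {}", content.len());
    println!("JSON array:       {}", json_array_len(&content));
    println!("Base64 (padded):  {}", base64_len(content.len()));
    println!("percent-encoded:  {}", percent_encoded_len(&content));
}
```

When the Base64 or percent-encoded text is embedded as a JSON string, add two bytes for the quotes, plus whatever the rest of the JSON document takes. The most reliable check is still to serialize the exact request body and take the length of the resulting bytes.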