javascript c serialization data-formats

JavaScript-friendly binary-safe data format design (not JSON or XML)


First and foremost: JSON and XML are not an option in this specific case, so please don't suggest them. If it makes that easier to accept, imagine that I intend to reinvent the wheel for self-education.

Back to the point:

I need to design a binary-safe data format to encode the datagrams I send to a particular dumb server that I am writing (in C, if that matters).

To simplify the question, let's say that I'm sending only numbers, strings and arrays.

Important fact: the server does not (and should not) know anything about Unicode and the like. It treats all strings as binary blobs and never looks inside them.

The format that I originally devised is as follows:

Example:

[ 1, "foo", [] ]

Serializes as follows:

1   ; number of items in datagram
A   ; -- array --
3   ; number of items in array
N   ; -- number --
1   ; number value
S   ; -- string --
3   ; string size in bytes
foo ; string bytes
A   ; -- array --
0   ; number of items in array

The problem is that I cannot reliably get a string's size in bytes in JavaScript (a string's length property counts UTF-16 code units, not the bytes that end up on the wire).
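One way to get a byte count is to compute the size of the encoding that will actually be sent. The following is only a minimal sketch, assuming the strings go over the wire as UTF-8 (the question itself does not fix an encoding); `utf8ByteLength` is a hypothetical helper name, not part of any library.

```javascript
// Count the bytes a JavaScript string occupies once encoded as UTF-8.
// Assumption: the wire encoding is UTF-8. The function name is illustrative.
function utf8ByteLength(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2; // U+0080..U+07FF
    } else if (
      unit >= 0xd800 && unit <= 0xdbff &&
      i + 1 < str.length &&
      str.charCodeAt(i + 1) >= 0xdc00 && str.charCodeAt(i + 1) <= 0xdfff
    ) {
      bytes += 4; // valid surrogate pair: one code point above U+FFFF
      i++;        // skip the low surrogate
    } else {
      // Rest of the BMP, or a lone surrogate (which TextEncoder
      // replaces with U+FFFD, also 3 bytes).
      bytes += 3;
    }
  }
  return bytes;
}
```

In environments that provide `TextEncoder`, `new TextEncoder().encode(str).length` should give the same number without the manual loop.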

So the question is: how can I change the format so that a string can be both written in JS and read in C neatly?

I do not want to add Unicode support to the server.

I would also rather not decode strings on the server (say, from base64, or by unescaping \xNN sequences): that would require working with dynamic string buffers, which is not so desirable given how dumb the server is.

Any clues?


Solution

  • It seems that reading UTF-8 in plain C is not that scary after all, so I'm extending the protocol to handle UTF-8 strings natively. (I would still appreciate an answer to the question as it stands.)
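For illustration, the JavaScript side of the original format with UTF-8 byte counts might look roughly like the sketch below. It rests on assumptions not stated in the question: fields are separated by "\n", numbers are written as decimal text, and `TextEncoder` is available (modern browsers, Node.js). The names `encodeValue` and `encodeDatagram` are made up for this example.

```javascript
// Sketch of an encoder for the line-based format from the question,
// with string sizes counted in UTF-8 bytes.
// Assumptions: "\n" field separator, decimal numbers, TextEncoder present.
const encoder = new TextEncoder();

// Append the fields for one value (number, string, or array) to `out`.
function encodeValue(value, out) {
  if (typeof value === "number") {
    out.push("N", String(value));
  } else if (typeof value === "string") {
    // Size is the UTF-8 byte count, not value.length.
    out.push("S", String(encoder.encode(value).length), value);
  } else if (Array.isArray(value)) {
    out.push("A", String(value.length));
    for (const item of value) encodeValue(item, out);
  } else {
    throw new TypeError("unsupported value: " + typeof value);
  }
}

// Encode a datagram holding the given top-level items as a Uint8Array.
function encodeDatagram(items) {
  const out = [String(items.length)];
  for (const item of items) encodeValue(item, out);
  return encoder.encode(out.join("\n") + "\n");
}

// The example from the question: one top-level item, [ 1, "foo", [] ].
const bytes = encodeDatagram([[1, "foo", []]]);
```

With this layout the server can read the declared number of bytes for each string and keep treating it as an opaque blob.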