I'm having a hard time choosing the format my server and my endpoints will communicate with.
I am considering:
- JSON
- YAML (too hard to parse)
- CSV
- Google Protobufs
- Binary packing/unpacking (without casting/memset/memcpy, for portability)
- Some form of DSL
- Any other suggestion you might have
My criteria are ordered from most important to least important:
- Which is the easiest to parse?
- Which is the fastest to parse?
- Which has the smallest size in bytes?
- Which has the potential to have the most readable messages?
- Which has the potential to be encrypted more easily?
- Which has the potential to be compressed more easily?
EDIT to clarify:
- Are the data transfers bi-directional? Yes.
- What is the physical transport? Ethernet.
- Is the data formatted as packets or streams? Both but usually packets.
- How much RAM do the end-points have? The smallest amount possible; it depends on the format I choose.
- How big are your data? As big as it needs to be. I won't receive huge datasets though.
- Does the end-point have an RTOS? No.
Key factors are:
- What capabilities do your clients have?
(e.g. Can you pick an XML parser off the shelf, without ruling out most of them for performance reasons? Can you compress the packets on the fly?)
- What is the complexity of your data ("flat" or deeply structured?)
- Do you need high-frequency updates? Partial updates?
In my experience:
A simple text protocol (which would categorize itself as DSL) with an interface of
    std::string RunCommand(const std::string& commandAndParams);
    // e.g. RunCommand("version") returns "1.23"
makes many aspects easier: debugging, logging and tracing, extending the protocol, etc. Having a simple terminal/console for the device is invaluable for tracking down problems, running tests, etc.
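To make that concrete, here's a minimal sketch of what the device side of such an interface can look like (the command names and the dispatch table are made up for illustration):

    #include <cstddef>
    #include <map>
    #include <string>

    // Hypothetical handlers -- illustrative only.
    static std::string CmdVersion(const std::string&)     { return "1.23"; }
    static std::string CmdEcho(const std::string& params) { return params; }

    std::string RunCommand(const std::string& commandAndParams)
    {
        // Split "command params..." at the first space.
        const std::size_t sep = commandAndParams.find(' ');
        const std::string cmd = commandAndParams.substr(0, sep);
        const std::string params =
            (sep == std::string::npos) ? "" : commandAndParams.substr(sep + 1);

        // Dispatch table: extending the protocol means adding one entry.
        static const std::map<std::string, std::string (*)(const std::string&)> table = {
            { "version", CmdVersion },
            { "echo",    CmdEcho },
        };

        const auto it = table.find(cmd);
        return (it != table.end()) ? it->second(params) : "ERROR unknown command";
    }

Every request and reply is a printable string, so you can drive the whole protocol by hand from a serial terminal or telnet session.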
Let's discuss the limitations in detail, as a point of reference for the other formats:
- The client needs to run a micro parser. That's not as complex as it might sound (the core of my "micro parser library" is 10 functions with about 200 lines of code total), but basic string processing should be possible; a minimal sketch of such a parser follows this list.
- A badly written parser is a big attack surface. If the devices are critical/sensitive, or are expected to run in a hostile environment, implementation requires utmost care. (that's true for other protocols, too, but a quickly hacked text parser is easy to get wrong)
- Overhead. It can be limited by a mixed text/binary protocol, or by base64 (which adds roughly 33% overhead, or about 37% with MIME line breaks).
- Latency. With typical network latency, you will not want to issue many small commands; some way of batching requests and their replies helps.
- Encoding. If you have to transfer strings that aren't representable in ASCII, and can't use something like UTF-8 for that on both ends, the advantage of a text-based protocol drops rapidly.
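To give a feel for the "micro parser" mentioned above, here's a sketch of its smallest building block: a tokenizer that splits a command line into words, honoring double quotes so parameters may contain spaces (simplified; no escape sequences, no error reporting):

    #include <string>
    #include <vector>

    std::vector<std::string> Tokenize(const std::string& line)
    {
        std::vector<std::string> tokens;
        std::string current;
        bool inQuotes = false;
        for (char c : line) {
            if (c == '"') {
                inQuotes = !inQuotes;  // toggle quoted mode, drop the quote
            } else if ((c == ' ' || c == '\t') && !inQuotes) {
                if (!current.empty()) { tokens.push_back(current); current.clear(); }
            } else {
                current += c;
            }
        }
        if (!current.empty()) tokens.push_back(current);
        return tokens;
    }
    // e.g. Tokenize("set name \"John Doe\"") -> {"set", "name", "John Doe"}

The rest of the library is in the same spirit: small, boring functions that are easy to review, which is exactly what you want given the attack-surface concern above.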
I'd use a binary protocol only if the device requires it, the device's processing capabilities are insanely low (say, USB controllers with 256 bytes of RAM), or your bandwidth is severely limited. Most of the protocols I've worked with do use binary, and it's a pain.
Google protobuf is an approach that makes a binary protocol somewhat easier. It's a good choice if you can run the libraries on both ends and have enough freedom to define the format.
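As a sketch of what that looks like in practice, assume a hypothetical proto3 schema status.proto containing message StatusReply { string version = 1; }. protoc then generates a StatusReply class, and (de)serialization is a few lines (the schema and file names are invented here, but SerializeToString/ParseFromString are the standard generated C++ API):

    #include "status.pb.h"  // header generated by protoc (assumed)
    #include <string>

    std::string EncodeStatus()
    {
        StatusReply reply;
        reply.set_version("1.23");
        std::string wire;
        reply.SerializeToString(&wire);  // compact binary encoding
        return wire;
    }

    bool DecodeStatus(const std::string& wire, std::string* versionOut)
    {
        StatusReply reply;
        if (!reply.ParseFromString(wire))  // malformed input is rejected for you
            return false;
        *versionOut = reply.version();
        return true;
    }

The big win over a hand-rolled binary format is that parsing and the portability concerns (byte order, alignment) are the library's problem, not yours.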
CSV is a way to pack a lot of data into an easily parsed format, so it's an extension of the text format. It's very limited in structure, though. I'd use it only if you know your data fits.
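If your data does fit flat rows, the parser is about as small as it gets. A naive row splitter, ignoring the quoting and escaping that real CSV would need, is just:

    #include <sstream>
    #include <string>
    #include <vector>

    // Naive CSV row parser -- fine for flat, known data without quoted commas.
    std::vector<std::string> ParseCsvRow(const std::string& row)
    {
        std::vector<std::string> fields;
        std::istringstream stream(row);
        std::string field;
        while (std::getline(stream, field, ','))
            fields.push_back(field);
        return fields;
    }
    // e.g. ParseCsvRow("23.5,48.2,ok") -> {"23.5", "48.2", "ok"}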
XML/YAML/... I'd use only if processing power isn't an issue, bandwidth either isn't an issue or you can compress on the fly, and the data has a very complex structure. JSON seems to be a little lighter in overhead and parser requirements, and might be a good compromise.