encoding

How to encode data to remove any 0x00 bytes


I am streaming some data of fixed length. The data consists of some 64-bit ints as well as some 32-bit floats. Because the format of the data is fixed and known, I am just sending an array of bytes with a known endian-ness. The data can then be easily reconstructed at the other end.

However, my transport protocol will not allow any 0x00 bytes. Is there a way I can encode my data differently to avoid this? Losing some range in the data is fine (e.g. ints having a maximum of 2^60 is totally fine). Increasing the full size of the message is totally fine too, as long as the full length of the data is fixed no matter what the values of the ints and floats are (e.g. if ints now take 9 bytes to store).

I don't know much about encoding formats, but I learned about CRCs a long time ago and I'm wondering if there's something like that, which will add some fixed length block to the end of the bytestream, but which will prevent the bytestream from containing any 0x00 bytes?


Solution

  • Let's take the case of 64-bit numbers:

    1. Reduce your value range to 2^56, so that the value fits in 7 bytes, and use the last (or first) 7 bytes to encode that value.
    2. For the 7 value bytes, replace all the 0x00 bytes with 0xff bytes. Record the positions of the bytes that have been flipped.
    3. Use the remaining byte as a bit mask to encode the positions of the bytes that have been flipped. This will take up 7 bits of that remaining byte. The first (or last) bit of that byte must always be set to 1 to prevent the mask byte from becoming 0x00 itself.

    For example:

    1. Take the 7 byte value b2 00 c3 d4 e5 ff 00.
    2. Flip the 0x00 bytes to get b2 ff c3 d4 e5 ff ff. Bytes 2 and 7 have been flipped.
    3. Create the bit mask 0100001 and prefix with a 1 bit to get a binary value of 10100001, or a hex value of 0xa1.
      Your encoded 64-bit value will then be a1 b2 ff c3 d4 e5 ff ff.

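    The approach for 32-bit numbers is the same. Use 28 bits for the value, 3 bits to encode which of the 3 full value bytes have been flipped, and the leftover bit always set to 1. The always-set bit, the 3 mask bits, and 4 of the value bits share one byte, which can never be 0x00 because of the always-set bit.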
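The steps above could be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions that the description leaves open: the function names (`encode56`, `decode56`, `encode28`, `decode28`) are made up for illustration, values are treated as unsigned and big-endian, the mask byte comes first, and in the 32-bit case the always-set bit is bit 7, the mask sits in bits 6-4, and the top 4 value bits sit in bits 3-0 of the first byte.

```python
def encode56(value: int) -> bytes:
    """Encode an unsigned integer below 2**56 as 8 bytes, none of them 0x00."""
    if not 0 <= value < 1 << 56:
        raise ValueError("value must fit in 56 bits")
    payload = bytearray(value.to_bytes(7, "big"))
    mask = 0x80  # top bit always set, so the mask byte is never 0x00
    for i, b in enumerate(payload):
        if b == 0x00:
            payload[i] = 0xFF
            mask |= 1 << (6 - i)  # byte 1 maps to bit 6, byte 7 to bit 0
    return bytes([mask]) + bytes(payload)


def decode56(data: bytes) -> int:
    """Reverse encode56: restore the flipped bytes and rebuild the integer."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    mask = data[0]
    payload = bytearray(data[1:])
    for i in range(7):
        if mask & (1 << (6 - i)):
            payload[i] = 0x00
    return int.from_bytes(payload, "big")


def encode28(value: int) -> bytes:
    """Encode an unsigned integer below 2**28 as 4 bytes, none of them 0x00."""
    if not 0 <= value < 1 << 28:
        raise ValueError("value must fit in 28 bits")
    # Assumed layout of the first byte: 1 | 3 mask bits | top 4 value bits.
    head = 0x80 | (value >> 24)
    low = bytearray((value & 0xFFFFFF).to_bytes(3, "big"))
    for i, b in enumerate(low):
        if b == 0x00:
            low[i] = 0xFF
            head |= 1 << (6 - i)  # mask lives in bits 6..4
    return bytes([head]) + bytes(low)


def decode28(data: bytes) -> int:
    """Reverse encode28 under the same assumed bit layout."""
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    head = data[0]
    low = bytearray(data[1:])
    for i in range(3):
        if head & (1 << (6 - i)):
            low[i] = 0x00
    return ((head & 0x0F) << 24) | int.from_bytes(low, "big")


# The worked example from above: b2 00 c3 d4 e5 ff 00 -> a1 b2 ff c3 d4 e5 ff ff
print(encode56(0xB200C3D4E5FF00).hex(" "))
```

Because the mask records exactly which bytes were flipped, a genuine 0xff byte (such as byte 6 in the example) is left alone on decoding, so the round trip is lossless within the reduced range and the encoded length stays fixed.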