c++cassemblyssevarint

Optimizing variable-length encoding


I've got a case where I need to compress a lot of often small values. Thus I compress them with a variable-length byte encoding (ULEB128, to be specific):

size_t
compress_unsigned_int(unsigned int n, char* data)
{
  size_t size = 0;
  while (n  > 127)
  {
    ++size;
    *data++ = (n & 127)|128;
    n >>= 7;
  }
  *data++ = n;
  return ++size;
}

Is there a more efficient way to do this (maybe using SSE)?

Edit: After this compression, the result is stored into data, taking size bytes. Then, the compression function is called on the next unsigned int.


Solution

  • If your unsigned int values are limited to a specific range - say, 32 bits - you can unroll the loop:

    size_t
    compress_unsigned_int(unsigned int n, char* data)
    {
      size_t size;
    
      if (n < 0x00000080U) {
        size = 1;
        goto b1;
      }
      if (n < 0x00004000U) {
        size = 2;
        goto b2;
      }
      if (n < 0x00200000U) {
        size = 3;
        goto b3;
      }
      if (n < 0x10000000U) {
        size = 4;
        goto b4;
      }
      size = 5;
    
      *data++ = (n & 0x7f) | 0x80;
      n >>= 7;
    b4:
      *data++ = (n & 0x7f) | 0x80;
      n >>= 7;
    b3:
      *data++ = (n & 0x7f) | 0x80;
      n >>= 7;
    b2:
      *data++ = (n & 0x7f) | 0x80;
      n >>= 7;
    b1:
      *data = n;
      return size;
    }