compressionbrotli

Explanation of Brotli's encoder algorithm


The Brotli compression format is excellently documented in RFC 7932. You can just read this RFC top to bottom, and it tells you how the format works.

However, while you could probably implement a decoder (decompressor) based on the RFC alone, the RFC doesn't describe the encoder algorithm that is part of Google's reference C implementation (the brotli command line tool). In other words, it doesn't tell us what strategies the encoder uses at different quality levels to find an efficient compressed representation for a given input stream.

Of course I can always read the encoder source, but I was wondering if there was an accessible high-level description of how the encoder works?


Solution

  • All I am aware of is a very brief description in this article:

    The higher data density is achieved by a 2nd order context modeling, re-use of entropy codes, larger memory window of past data and joint distribution codes.

    More importantly, from the same article:

    the new algorithm is named after Swiss bakery products. Brötli means ‘small bread’ in Swiss German.

    Update:

    AardvarkSoup added a much better answer with this comprehensive paper on how Brotli works, by its authors.