dynamic-programming · huffman-code · prefix-tree

Dynamic Programming for prefix-free coding


Is there a way of computing a prefix-free code for a given dictionary of letters and their frequencies, similar to Huffman coding but computed dynamically? What would the optimization function look like?

The problem with building the tree only up to position i of the dictionary is that the least frequent letters could change, and with them the whole structure of the tree.


Solution

  • Yes, there are several ways to generate prefix-free codes dynamically.

    As you suggested, it would be conceptually simple to start with some default frequency, track the frequencies of the letters used so far, and after every letter decoded, increment that letter's count and re-build a Huffman tree from all the counts (potentially changing the tree completely after each letter). That would require a lot of work per letter and be very slow. There are, however, a couple of adaptive Huffman coding algorithms that effectively do the same thing, using clever incremental updates that do much less work and so run much faster. (A sketch of the naive rebuild-every-letter approach is given after this answer.)

    Many other data compression algorithms also generate prefix-free codes dynamically, much faster than any adaptive Huffman algorithm, at a small cost in compression -- such as Polar codes, Engel coding, or universal codes such as Elias delta coding (a short Elias delta sketch is also shown below).

    The arithmetic coding data compression algorithm is technically not a prefix-free code, but it typically gives slightly better compression (while running slower) than either static Huffman coding or adaptive Huffman coding. Arithmetic coding is generally implemented adaptively, tracking the frequencies of all the letters seen so far. Many arithmetic coding implementations track even more context: if the previous letter was a "t", the model remembers that the most frequent letter in that context is "h" and exactly how frequent it was, which gives even better compression. (A sketch of such a context-tracking model follows the other examples below.)
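
For concreteness, here is a minimal sketch (in Python, with illustrative function names that are not from any real library) of the naive "re-build the tree after every letter" idea described above. It assumes a fixed alphabet and a default starting count of 1 per letter; real adaptive Huffman coders (FGK, Vitter) update the tree incrementally instead of rebuilding it each time.

```python
import heapq
from itertools import count

def build_huffman_codes(freqs):
    """Build a Huffman code (symbol -> bitstring) from a dict of counts."""
    tie = count()  # tie-breaker so heapq never has to compare tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the symbol's code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def adaptive_encode(text, alphabet):
    """Encode text, rebuilding the whole Huffman tree after every symbol."""
    freqs = {sym: 1 for sym in alphabet}    # default frequency: 1 each
    out = []
    for ch in text:
        codes = build_huffman_codes(freqs)  # O(n log n) work per symbol -- slow
        out.append(codes[ch])
        freqs[ch] += 1                      # the decoder makes the same update
    return "".join(out)

print(adaptive_encode("abracadabra", alphabet="abcdr"))
```

The decoder mirrors this exactly: it rebuilds the same tree from the same counts before reading each codeword, so encoder and decoder stay in sync without transmitting the tree.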
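A small sketch of Elias delta coding, one of the universal codes mentioned above: it maps every positive integer to a prefix-free bitstring without needing any frequency table at all.

```python
def elias_delta(n):
    """Elias delta code for a positive integer n, returned as a bitstring."""
    assert n >= 1
    nbits = n.bit_length()            # number of bits in n
    lbits = nbits.bit_length()        # number of bits in that length
    return ("0" * (lbits - 1)         # gamma-code the length of n:
            + format(nbits, "b")      #   zeros, then the length in binary,
            + format(n, "b")[1:])     # then n itself with its leading 1 dropped

for n in (1, 2, 3, 4, 10, 17):
    print(n, elias_delta(n))          # e.g. 1 -> "1", 2 -> "0100", 10 -> "00100010"
```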
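Finally, a rough sketch of the adaptive, context-tracking model an arithmetic coder would consult; the coder itself (interval narrowing, renormalization) is omitted, and the class and method names here are made up for illustration.

```python
from collections import defaultdict

class Order1Model:
    """Adaptive order-1 model: counts of each symbol, per previous symbol."""
    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        # Start every count at 1 so no symbol ever has probability zero
        # (the same "default frequency" trick as above).
        self.counts = defaultdict(lambda: {s: 1 for s in self.alphabet})

    def probability(self, context, symbol):
        ctx = self.counts[context]
        return ctx[symbol] / sum(ctx.values())

    def update(self, context, symbol):
        self.counts[context][symbol] += 1   # encoder and decoder both do this

model = Order1Model("aehst ")
prev = " "
for ch in "the hat that the":
    # an arithmetic coder would narrow its interval by probability(prev, ch) here
    model.update(prev, ch)
    prev = ch
print(round(model.probability("t", "h"), 3))   # "h" is now likely after "t"
```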