algorithmtheoryhuffman-codegreedyproof-of-correctness

Greedy resolution to constructing Huffman codes: question on greedy-choice property proof


I am reading The Introduction of Algorithm by Thomas and Charles, 3rd Edition.

when coming to the greedy algo section for Huffman codes - Correctness - greedy-choice property - Lemma 16.2:

Let C be an alphabet in which each character c belonging to C has frequency c.freq. Let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Then in the following Proof, idea is to replacing two sibling leaves of max depth, with x and y in an optimal tree then in the new tree with x, y sibling leaves of max length, prove its cost(c.freq * c.depth) is not more than the original tree.

However one extra assumption that x, y are in the max depth in the new tree appears in the substituting proving idea, which seems not mentioned and unnecessary in the Lemma 16.2. Does it cause the loss of generality?


Solution

  • Nope, there’s nothing lost here. The lemma just requires that there’s an optimal coding tree where

    1. the codes for x and y have the same length, and
    2. that they differ only in the last bit.

    The proof shows that there’s an optimal coding tree where

    1. the codes for x and y have the same length,
    2. that they differ only in the last bit, and
    3. that they’re as deep in the tree as any other leaves.

    Since the proof shows something stronger than the original lemma asks for, it’s still a valid proof.