I'm reading Algorithms by Jeff Erickson, and I have trouble in solving the exercise in Greedy Algorithms:
(b) Suppose the total length N of the unencoded message is bounded by a polynomial in the alphabet size n. Prove that any Huffman tree for the frequencies f[1...n] has depth O(log n).
I don't know how to prove this, because I think the depth could reach n-1, like this: f[1...n] = {1, 1, 1, 3, 4, 7, 11, 18, 29, ...} (the Lucas number).
Maybe it's because of "N is bounded by a polynomial in the alphabet size n", but what does it mean? (for example, N = k n, and k is a constant?) If it's like this, I would say it's because when I merge two vertices u1, u2 (u1 <= u2) in the Huffman Tree, the new vertex v = u1 + u2, which is at least once bigger than u1, and I know that the root of the tree is n, so I will only merge vertices for no more than O(log n) times, so the depth is about O(log n). But I don't know if I'm right.
Could you please help me? Thanks!
Yes, you are correct that without any constraints the depth could indeed reach n-1 in the extremely unbalanced case where you effectively have a linked list.
However, the added constraint of "... the total length N of the unencoded message is bounded by a polynomial in the alphabet size n." implies that N=O(n^k). Therefore, the average frequency of each character is N/n=O(n^(k−1)). This implies that no character’s frequency is disproportionately large or small compared to the others. The frequencies are distributed more evenly, preventing extreme imbalances during the tree construction.
If we consider a binary tree, the height is logarithmic in the number of leaves if the tree is balanced. For a perfectly balanced tree, the height is O(log n).
While the Huffman tree with this constraint is not always perfectly balanced, the balanced nature of the merges due to the tend of evenly distributed frequencies ensures that the height is close to O(log n).