All sources i've read quote the below process to get Huffman codes:
Let's try this for the following Frequency Map:
A -> 1 B -> 2 C -> 3 D -> 4
Making the Tree
(ABCD, 10)
/ \
(ABC, 6) D:4
/ \
(AB, 3) C:3
/ \
A:1 B:2
Following this tree to get the codes:
A -> 000 B -> 001 C -> 01 D -> 1
Obviously, the Max depth and the max code length does not follow the logical rule of being log2(nElement), which in this case should be two.
Optimally the codes would be:
A -> 00 B -> 01 C -> 10 D -> 11
So what's goin on here ? what am i doing wrong ?
Imagine encoding String "ABBCCCDDDD" - sample string where each symbol occurs with the given frequency.
Huffman encoding you came up with encodes it as 0000010010101011111
- 19 characters.
Your encoding where you optimized for max code length gives 00010110101011111111
- 20 characters.
Your mistake is thinking that max depth is what's being optimized (or what needs to be optimized) - but Huffman encoding optimizes for encoded text length.
And as a side note, as I mentioned in a comment, Huffman produces "an" optimal solution, not "the" optimal solution, it's possible I think for a lower-depth solution to be just as good, but it won't be better.