
Why does Huffman's coding algorithm produce more bits than the original size?


My string is "Today_is_Monday", and I applied Huffman's coding algorithm to it. Without encoding, the string takes (15 * 8) = 120 bits. After encoding, the size is (10 * 8 + 15 + 49) = 144 bits.
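A minimal Python sketch for reproducing the message part of this count. The helper names `huffman_code_lengths` and `encoded_bits` are hypothetical, not from the question. It builds Huffman code lengths with `heapq` and counts only the bits of the encoded message, leaving out any table.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) of each distinct character."""
    freq = Counter(text)
    if len(freq) == 1:
        # Assumption: a single-symbol alphabet is given a 1-bit code.
        return {ch: 1 for ch in freq}
    # Heap entries: (weight, tie_breaker, {char: depth_so_far}).
    # The unique integer tie-breaker means dicts are never compared.
    heap = [(w, i, {ch: 0}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encoded_bits(text: str) -> int:
    """Length of the Huffman-encoded message body, excluding any table."""
    lengths = huffman_code_lengths(text)
    return sum(lengths[ch] for ch in text)


text = "Today_is_Monday"
print(len(text) * 8)       # raw size with 8 bits per character
print(encoded_bits(text))  # encoded message only, no table
```

Running it prints 120 and 49, matching the raw size and the message part of the calculation above.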

As far as I know, Huffman's algorithm is used to reduce size. So why is the encoded size larger than the original?

More details of my work were given in an image.

Thank you.


Solution

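  • The text is too short, and its character distribution is close to uniform: ten distinct characters, each appearing only once or twice. When all symbols occur with (more or less) the same frequency, the input looks very much like random noise, and random noise cannot be compressed in general. The compressed output will most likely be longer than the input, because it also has to carry metadata such as the encoding table. In your own numbers, the encoded message is only 49 bits, well under 120; it is the extra 10 * 8 + 15 = 95 bits of overhead that push the total to 144.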

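    In contrast, consider encoding the string aaaaaaaaaaaaaaa. With only one distinct character, each a can be given a one-bit code, so the message shrinks from 8 bits per character to 1.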

    If one encodes a longer, general English text, one will notice that at some point the encoded size becomes much smaller than the original. Two effects combine here. First, the table overhead depends only on the set of distinct characters, so it stays roughly constant while the message grows. Second, the uneven character frequencies of real text start to matter: the most frequent characters get the shortest codes, and because they are repeated so often, their short codes account for most of the encoded length.
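The effect of length can be sketched by repeating the question's string k times, which keeps the same ten characters and the same relative frequencies. This is a minimal illustration, not a model of real English text. It assumes, as a hypothetical simplification, that the table overhead stays at the question's 10 * 8 + 15 = 95 bits. `huffman_message_bits` is a hypothetical helper that relies on the fact that the total encoded length equals the sum of the weights of all nodes created while merging.

```python
import heapq
from collections import Counter


def huffman_message_bits(text: str) -> int:
    """Bits in the Huffman-encoded message body (no table).

    Uses the identity sum(freq * code_length) == sum of the weights of
    every internal node created during the Huffman merges.
    """
    weights = list(Counter(text).values())
    if len(weights) == 1:
        return len(text)  # assumption: one symbol gets a 1-bit code
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged
        heapq.heappush(weights, merged)
    return total


# Assumption: fixed table overhead taken from the question's calculation,
# valid here only because repetition keeps the same character set and codes.
TABLE_BITS = 10 * 8 + 15

for k in (1, 2, 4, 8):
    text = "Today_is_Monday" * k
    raw = 8 * len(text)
    total = TABLE_BITS + huffman_message_bits(text)
    print(f"k={k}: raw={raw} bits, Huffman total={total} bits")
```

For k = 1 this reproduces 144 bits against 120. From k = 2 onward (193 against 240 bits) the fixed table cost is already outweighed, because the message grows by 49 bits per copy while the raw text grows by 120. With the same one-bit convention, `huffman_message_bits("a" * 15)` returns 15.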