algorithm unicode compression text-compression

Text Compression Algorithm


I am wondering whether someone could point me to an algorithm that compresses Unicode text to 10-20 percent of its original size. I have read about the Lempel-Ziv compression algorithm, which reduces text to about 60% of its original size, but I have heard that there are algorithms that reach the 10-20 percent range.


Solution

  • If you are considering only text compression, then the first algorithm to look at is Huffman coding, an entropy-based encoding (not encryption) that assigns shorter codes to more frequent symbols.

    Huffman Coding

    Then there is LZW compression, which uses dictionary encoding: it assigns codes to previously seen sequences of letters and reuses those codes to reduce the size of the file.

    LZW compression

    I think the above two are sufficient for encoding text data efficiently, and both are easy to implement.

    Note: Do not expect good compression on all files. If the data is random, with no pattern, then no compression algorithm can give you any compression at all. The compression percentage depends on the symbols that appear in the file, not only on the algorithm used.
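To make the Huffman idea from the answer concrete, here is a minimal sketch in Python. It is not a reference implementation: the function names (`huffman_code`, `huffman_encode`, `huffman_decode`) are made up for illustration, and the output is a string of "0"/"1" characters rather than packed bytes, so it shows the code lengths but does not produce a real compressed file.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in `text` to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def huffman_encode(text, code):
    """Concatenate the bit strings of all symbols in `text`."""
    return "".join(code[ch] for ch in text)


def huffman_decode(bits, code):
    """Invert `huffman_encode`; works because the code is prefix-free."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, `huffman_encode("abracadabra", huffman_code("abracadabra"))` is 23 bits long, against 88 bits at one byte per character. A real encoder would also have to store the code table (or the symbol frequencies) alongside the bits, which eats into the gain on short inputs.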
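The dictionary approach behind LZW can be sketched in the same way. This is a simplified version under a few assumptions of my own: the input is bytes (Unicode text would be encoded first, for example as UTF-8), the dictionary grows without limit, and the output is a plain list of integer codes rather than a packed bit stream. The names `lzw_compress` and `lzw_decompress` are illustrative only.

```python
def lzw_compress(data):
    """Encode a bytes object as a list of integer dictionary codes."""
    # Start with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while it is a known sequence.
            current = candidate
        else:
            # Emit the longest known sequence and learn the new one.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Mirror the entry the compressor added at this step.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

With this sketch, the 24-byte input `b"TOBEORNOTTOBEORTOBEORNOT"` becomes 16 codes, because repeated sequences such as `TO`, `BE` and `OR` are replaced by single dictionary entries. The decompressor never needs the dictionary to be transmitted; it rebuilds the same table from the codes as it goes.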
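The note about random data is easy to check with the compressors in Python's standard library (`zlib`, `bz2`, `lzma`). The helper `compression_ratio` below is a hypothetical name, and the sample inputs are made up for the demonstration; the exact ratios depend on the input and on the library version, so none are quoted here.

```python
import bz2
import lzma
import os
import zlib


def compression_ratio(data, compress):
    """Return compressed size divided by original size (lower is better)."""
    return len(compress(data)) / len(data)


# Highly repetitive text: lots of pattern for a compressor to exploit.
repetitive = ("the quick brown fox jumps over the lazy dog. " * 2000).encode("utf-8")

# Random bytes from the OS: no pattern to exploit.
random_data = os.urandom(len(repetitive))

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    r_text = compression_ratio(repetitive, compress)
    r_rand = compression_ratio(random_data, compress)
    print(f"{name}: repetitive text {r_text:.3f}, random bytes {r_rand:.3f}")
```

On the repetitive text every compressor gets far below the 10-20 percent mark, while on the random bytes the ratio stays at about 1.0 or slightly above, because the container format adds a few bytes of overhead. Ordinary prose falls somewhere in between, which is why the achievable percentage depends on the file as much as on the algorithm.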