javahuffman-codeburrows-wheeler-transform

Distance Coding (DC) BWT


i am trying to write BWT with Huffman compression program with Java. In BWT i want to implement Distance Coding (DC). I am looking for some examples, but there isn't so much of them.

I found this example:

http://www.cs.ucr.edu/~stelo/cpm/cpm07/move_to_front_gagie.pdf

DC is starting with 29 page. But it really hard to understand because there is no comments.

Maybe someone had implemented DC or know theory how to implement it in real code ? :)

I understood that part that first off all need to write what char was. But then with distance i didn't get it.

I red that for each character , DC finds its next occurrence in the sequence and write it to S and outputs the distance to it. If there is no occurrence then write 0.

Thanks.


Solution

  • I have written an implementation in Java: http://code.google.com/p/kanzi/source/browse/java/src/kanzi/function/DistanceCodec.java

    You can see the explanation of the algorithm at the beginning of the code (complete example). Also, take a look at the Block coder (it includes BWT + MoveToFront + Zero Length Transform + Entropy coding): http://code.google.com/p/kanzi/source/browse/java/src/kanzi/function/BlockCodec.java

    I have tried to use Distance Coding instead of Move To Front. The output is smaller (better compression) with DC compared to MTFT. However, after entropy coding, the result is the opposite: MTFT looks more amenable to entropy compression.