algorithmradix-sortsuffix-arrayburrows-wheeler-transform

Is radix sort used for suffix sorting?


I'm trying to implement block sorting. This is from the Burrows Wheeler paper.

(Before this step, you create a V suffix array of S)

Q4. [radix sort]
Sort the elements of V , using the first two characters of each suffix as the sort key. This can be done efficiently using radix sort.

So I understand you are sorting the suffixes with radix sort.
How is this supposed to update the array V? Only after the radix sort finished I can know the sorted position of a suffix. Suppose that the 4th suffix end up being the first after sorted. So V[0] = i. In this case, we know (because I told you) that i = 4. But how does the algorithm know that since we are not keeping track of their position. Should I make a Class that contains both the suffix and its suffix number?


Solution

  • After a quick read; I think Burrows-Wheeler have an error and meant to say to sort the elements of W using the array V to track and map the final locations of the elements of W. ie. Such that W is unchanged and V contains a sorted list of indices.

    The paper appears to treat V as an array of pointers to elements in W from that point forward.

    Check out http://michael.dipperstein.com/bwt/ There is a great description as well as source code for the algorithm at the bottom of the page.