I'm writing Burrows-Wheeler Transform and its reverse in Python. It works fine for small strings, but it fell apart when I tested a bigger string. At certain points, the string seems to loop over. I'm sure it must have to do with the final loop of the decoder, but I'm following the steps found on more than one website. My implementation is as follows:
class BurrowsWheelerTransform:
def __init__(self, data):
self.data = data
def transform(self):
# get data size
size = len(self.data)
# get order (by index) of rotations
order = sorted(range(size), key=lambda i: self.data[i:])
# get index of original rotation
index = order.index(0)
# return size appended with last column of (imaginary) rotation table
return chr(255) * (index // 255) + chr(index % 255) + ''.join(self.data[(i - 1 + size) % size] for i in order)
def restore(self):
# get index of end of index
eoi = next(i for i in range(len(self.data)) if ord(self.data[i]) < 255)
# get index
index = 255 * eoi + ord(self.data[eoi])
# get tranformed content
content = self.data[eoi + 1:]
# get lshift array
lshift = [i - 1 for symbol in sorted(set(content)) for i, x in enumerate(self.data) if x == symbol]
# restore
restored = ''
for i in range(len(content)):
index = lshift[index]
restored += content[index]
# return restored
return restored
Original string:
Unwilling to obtrude himself on the princess, Rostv did not go back to the house but remained in the village awaiting her departure. When her carriage drove out of the house, he mounted and accompanied her eight miles from Boguchrovo to where the road was occupied by our troops. At the inn at Yankvo he respectfully took leave of her, for the first time permitting himself to kiss her hand.
How can you speak so! he blushingly replied to Princess Marys expressions of gratitude for her deliverance, as she termed what had occurred. Any police officer would have done as much! If we had had only peasants to fight, we should not have let the enemy come so far, said he with a sense of shame and wishing to change the subject. I am only happy to have had the opportunity of making your acquaintance. Good-by, Princess. I wish you happiness and consolation and hope to meet you again in happier circumstances. If you dont want to make me blush, please dont thank me!
Decoded string:
Unwilling to obtrude himself on the princess, Rostv did not go back to the house but remained in the village awaiting her departure. When her carriage drove out of the house, he mounted and accompanied her eight miles from Boguchrovo to where the road was occupied by our troops. At the inn at Yankvo he respectfully took leave of her, for the first time permitting himself to kiss her hand.
How can you speak so!Unwilling to obtrude himself on the princess, Rostv did not go back to the house but remained in the village awaiting her departure. When her carriage drove out of the house, he mounted and accompanied her eight miles from Boguchrovo to where the road was occupied by our troops. At the inn at Yankvo he respectfully took leave of her, for the first time permitting himself to kiss her hand.
How can you speak so!Unwilling to obtrude himself on the princess, Rostv did not go back to the house but remained in the village awaiting her departure. When
Bizarrely enough, it seems that this happens for other implementations I found on the web and tested, like this one and this one. What is going on? Have I misunderstood how the transform works? Or is this implementation incorrect?
Found the answer! This algorithm is based on the assumption that the string is concatenated to one final character, which is unique and lexicographically smaller than any other character in the string. However, as I intend to utilize any value in the 0-255 range for any given character, no extra symbols are at my disposal. Luckily, thanks to John Kurlak and some extra bug fixing, I've been able to slightly modify my initial implementation to account for this fact. Here it is:
class BurrowsWheelerTransform:
def __init__(self, data):
self.data = data
def transform(self):
# get data size
size = len(self.data)
# get doubled string
self.data *= 2
# get order (by index) of rotations
order = sorted(range(size), key=lambda i: self.data[i:])
# get index of original rotation
index = order.index(0)
# return index appended with last column of (imaginary) rotation table
return chr(255) * (index // 255) + chr(index % 255) + ''.join(self.data[(i - 1 + size) % size] for i in order)
def restore(self):
# get index of end of index
eoi = next(i for i in range(len(self.data)) if ord(self.data[i]) < 255)
# get index
index = 255 * eoi + ord(self.data[eoi])
# get tranformed content
content = self.data[eoi + 1:]
size = len(content)
# get lshift array
lshift = [i for symbol in sorted(set(content)) for i, x in enumerate(content) if x == symbol]
# restore
restored = ''
for i in range(size):
index = lshift[index]
if index >= size: break
restored += content[index]
# return restored
return restored
Thanks, John!