pythonalgorithmoff-by-one

Avoiding Python Off-by-One Error in RLE Algorithm


EDIT: there's more wrong with this than just an off-by-one error, it seems.

I've got an off-by-one error in the following simple algorithm which is supposed to display the count of letters in a string, along the lines of run-length encoding.

I can see why the last character is not added to the result string, but if I increase the range of i I get index out of range for obvious reasons.

I want to know what the conceptual issue is here from an algorithm design perspective, as well as just getting my code to work.

Do I need some special case code to handle the last item in the original string? Or maybe it makes more sense to be comparing the current character with the previous character, although that poses a problem at the beginning of the algorithm?

Is there a general approach to this kind of algorithm, where current elements are compared to previous/next elements, which avoids index out of range issues?

def encode(text):
    # stores output string
    encoding = ""
    i = 0

    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1

        # append current character and its count to the result
        encoding += text[i] + str(count) 
        i += 1

    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1

Solution

  • You're right, you should have while i < len(text) for the external loop to process the last character if it is different for the previous one (d in your case).

    Your algorithm is then globally fine, but it will crash when looking for occurrences of the last character. At this point, text[i+1] becomes illegal.

    To solve this, just add a safety check in the internal loop: while i+1 < len(text)

    def encode(text):
        # stores output string
        encoding = ""
        i = 0
    
        while i < len(text):
            # count occurrences of character at index i
            count = 1
            # FIX: check that we did not reach the end of the string 
            # while looking for occurences
            while i+1 < len(text) and text[i] == text[i + 1]:
                count += 1
                i += 1
    
            # append current character and its count to the result
            encoding += text[i] + str(count) 
            i += 1
    
        return encoding
    
    text = "Hello World"
    print(encode(text))
    # Gives H1e1l2o1 1W1o1r1l1d1