pythonlistalgorithm

Optimizing brute force approach to Longest Substring Without Repeating Characters


I'm trying to improve my Python by solving some problems on LeetCode.

I'm currently working on the Longest Substring Without Repeating Characters problem:

Given a string, find the length of the longest substring without repeating characters.

Example 1:

Input: "abcabcbb"
Output: 3 
Explanation: The answer is "abc", with the length of 3. 

Example 2:

Input: "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3. 
             Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

I am trying to do brute forcing, by finding all substrings and then finding the longest one:

def lengthOfLongestSubstring(self, s: str) -> int:
        list_of_values = [0]
        if (not s):
            return 0

        new_list = list(s[0])
        i = 1
        size_s = len(s)
        while i < size_s:
            if s[i] not in new_list:
                new_list.append(s[i])
                i += 1
            else:
                list_of_values.append(len(new_list))
                new_list = []
                s = s[1:]
                i = 0
                size_s = len(s)
                if size_s < max(list_of_values):
                    break
        return max(max(list_of_values), len(new_list))

The solution works, but it times out for the last test case on LeetCode (a very long string). Does anyone have suggestions on how to make this faster?


Solution

  • These are just some hints for the algorithm.

    You can set two "pointers" to the first letter,

    abcabcbb
    ^start
    ^end
    

    then you advance the end pointer and put the letters in a hashmap or something, so you can efficiently check for repeated letters when you advance. As soon as you get a repetition,

    abcabcbb
    ^start
       ^end
    

    you save the two positions and the length in list or tuple, and then you advance start until there are no more repeated letters

    abcabcbb
     ^start
       ^end
    

    Then you start advancing end again, repeating the process, by overriding the two saved positions and the length only if the lenght increases.

    You should end up with the start and end positions of the last longest substring you met, in this case bcb

    abcabcbb
        ^start
          ^end
    

    Note that it is the last, and other valid candidates are abc, bca, and so on.