rubysubstringruby-hash

Finding Longest Substring No Duplicates - Help Optimizing Code [Ruby]


So I've been trying to solve a Leetcode Question, "Given a string, find the length of the longest substring without repeating characters."

For example

Input: "abcabcbb"
Output: 3 
Explanation: The answer is "abc", with the length of 3. 

Currently I optimized my algorithm when it comes to figuring out if the substring is unique by using a hash table. However my code still runs in O(n^2) runtime, and as a result exceeds the time limit during submissions.

What i try to do is to essentially go through every single possible substring and check if it has any duplicate values. Am I as efficient as it gets when it comes to the brute force method here? I know there's other methods such as a sliding window method but I'm trying to get the brute force method down first.

# @param {String} s
# @return {Integer}
def length_of_longest_substring(s)
    max_length = 0
    max_string = ""
    n = s.length
    for i in (0..n-1)
        for j in (i..n-1)
            substring = s[i..j]
            #puts substring
            if unique(substring)
                if substring.length > max_length
                    max_length = substring.length
                    max_string = substring
                end
            end
        end
    end
    return max_length
end

def unique(string)
    hash = Hash.new(false)
    array = string.split('')
    array.each do |char|
        if hash[char] == true
            return false
        else
            hash[char] = true
        end
    end
    return true
end

Solution

  • Approach

    Here is a way of doing that with a hash that maps characters to indices. For a string s, suppose the characters in the substring s[j..j+n-1] are unique, and therefore the substring is a candidate for the longest unique substring. The next element is therefore e = s[j+n] We wish to determine if s[j..j+n-1] includes e. If it does not we can append e to the substring, keeping it unique.

    If s[j..j+n-1] includes e, we determine if n (the size of the substring) is greater than the length of the previously-known substring, and update our records if it is. To determine if s[j..j+n-1] includes e, we could perform a linear search of the substring, but it is faster to maintain a hash c_to_i whose key-value pairs are s[i]=>i, i = j..j_n-1. That is, c_to_i maps the characters in the substring to their indices in full string s. That way we can merely evaluate c_to_i.key?(e) to see if the substring contains e. If the substring includes e, we use c_to_i to determine its index in s and add one: j = c_to_i[e] + 1. The new substring is therefore s[j..j+n-1] with the new value of j. Note that several characters of s may be skipped in this step.

    Regardless of whether the substring contained e, we must now append e to the (possibly-updated) substring, so that it becomes s[j..j+n].

    Code

    def longest_no_repeats(str)
      c_to_i = {}
      longest = { length: 0, end: nil }
      str.each_char.with_index do |c,i|
        j = c_to_i[c]
        if j
          longest = { length: c_to_i.size, end: i-1 } if
            c_to_i.size > longest[:length]
          c_to_i.reject! { |_,k| k <= j }
        end
        c_to_i[c] = i
      end
      c_to_i.size > longest[:length] ? { length: c_to_i.size, end: str.size-1 } :
        longest
    end
    

    Example

    a = ('a'..'z').to_a
      #=> ["a", "b",..., "z"]
    
    str = 60.times.map { a.sample }.join
      #=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexssbuuawxmhprkfms"
    
    longest = longest_no_repeats(str)
      #=> {:length=>14, :end=>44} 
    str[0..longest[:end]]
      #=> "ekgdaxxzlwbxixhlfbpziswcoelplhobivoygmupdaexs" 
    str[longest[:end]-longest[:length]+1,longest[:length]]
      #=>                                "bivoygmupdaexs" 
    

    Efficiency

    Here is a benchmark comparison to @mechnicov's code:

    require 'benchmark/ips'
    
    a = ('a'..'z').to_a
    arr = 50.times.map { 1000.times.map { a.sample }.join }
    
    Benchmark.ips do |x|
      x.report("mechnicov") { arr.sum { |s| max_non_repeated(s)[:length]   } }
      x.report("cary")      { arr.sum { |s| longest_no_repeats(s)[:length] } }
      x.compare!
    end
    

    displays:

    Comparison:
                cary:       35.8 i/s
           mechnicov:        0.0 i/s - 1198.21x  slower