swiftstringalgorithmsubstring

Partition a string into substrings that all have majority characters


I'm trying to solve this problem:

A string has a majority character if the majority of the positions in the string are the same character. For example, ababa has the majority character a (three of five positions), but abab and abcd do not.

Every non-empty string can be partitioned into one or more substrings that have majority characters, because if nothing else, it can be partitioned into single-character substrings.

Given a non-empty string with length ≤ 5000, find the minimum number of substrings that it can be partitioned into such that each of the substrings has a majority character.

For example, if the string is abca then the answer is 4 (a, b, c, a); if the string is ababa then the answer is 1 (ababa); and if the string is aaaabbbcd then the answer is 2 (aaaa, bbbcd).

I've written a solution using dynamic programming, and I believe that the basic idea is correct, but it's too slow: it requires O(length3) time, and the length can be as high as 5000. How can I speed this up? Do I need to fundamentally change my approach?

func minSubstrings(_ s: String) -> Int {
  let chars = Array(s)
  let length = chars.count
  guard length > 0 else { return 0 }

  var dp = Array(repeating: Int.max, count: length + 1)
  dp[0] = 0

  func isValidSubstring(start: Int, end: Int) -> Bool {
    var freq: [Character: Int] = [:]
    for i in start..<end {
      let char = chars[i]
      freq[char, default: 0] += 1
    }
    let substringLength = end - start
    for count in freq.values {
      if count > substringLength / 2 {
        return true
      }
    }
    return false
  }

  for end in 1...length {
    for start in 0..<end {
      if isValidSubstring(start: start, end: end) {
        dp[end] = min(dp[end], dp[start] + 1)
      }
    }
  }

  return dp[length]
}

For anyone who may have a similar problem, I put the modified code below. The complexity is quadratic.

func minSubstrings(_ s: String) -> Int {
  let chars = Array(s)
  let length = chars.count
  guard length > 0 else { return 0 }
  
  var dp = Array(repeating: Int.max, count: length + 1)
  dp[0] = 0
  
  for start in 0..<length {
    var freq: [Character: Int] = [:]
    var maxFreq = 0
    var majorityChar: Character? = nil
    
    for end in start..<length {
      let char = chars[end]
      freq[char, default: 0] += 1
      if freq[char]! > maxFreq {
        maxFreq = freq[char]!
        majorityChar = char
      }
      
      let substringLength = end - start + 1
      
      if maxFreq > substringLength / 2 {
        dp[end + 1] = min(dp[end + 1], dp[start] + 1)
      }
    }
  }
  
  return dp[length]
}

Solution

  • The dynamic-programming part of your solution is fine, but the problem is that the isValidSubstring function is too slow — and there's no way to make it fast in its current form.

    The reason is that by treating isValidSubstring(a, b) and isValidSubstring(a, b + 1) as completely unrelated computations, you force the latter to start over without any of the information you gathered in the former.

    Instead, for each value of start, you can do a single O(length) pass to determine which substrings starting at start are "valid" ones. You can do that by keeping a single freq throughout this pass (so that each new end only has to update the one element), and by using the fact that the only characters that can possibly be a majority character in s[start..<end+1] are s[end] and the majority character in s[start..<end] (so that each new end only needs to check those two elements in freq). That gives you an O(length2) algorithm.