I'm trying to solve this problem:
A string has a majority character if the majority of the positions in the string are the same character. For example,
ababa
has the majority charactera
(three of five positions), butabab
andabcd
do not.Every non-empty string can be partitioned into one or more substrings that have majority characters, because if nothing else, it can be partitioned into single-character substrings.
Given a non-empty string with length ≤ 5000, find the minimum number of substrings that it can be partitioned into such that each of the substrings has a majority character.
For example, if the string is
abca
then the answer is 4 (a
,b
,c
,a
); if the string isababa
then the answer is 1 (ababa
); and if the string isaaaabbbcd
then the answer is 2 (aaaa
,bbbcd
).
I've written a solution using dynamic programming, and I believe that the basic idea is correct, but it's too slow: it requires O(length3) time, and the length can be as high as 5000. How can I speed this up? Do I need to fundamentally change my approach?
func minSubstrings(_ s: String) -> Int {
let chars = Array(s)
let length = chars.count
guard length > 0 else { return 0 }
var dp = Array(repeating: Int.max, count: length + 1)
dp[0] = 0
func isValidSubstring(start: Int, end: Int) -> Bool {
var freq: [Character: Int] = [:]
for i in start..<end {
let char = chars[i]
freq[char, default: 0] += 1
}
let substringLength = end - start
for count in freq.values {
if count > substringLength / 2 {
return true
}
}
return false
}
for end in 1...length {
for start in 0..<end {
if isValidSubstring(start: start, end: end) {
dp[end] = min(dp[end], dp[start] + 1)
}
}
}
return dp[length]
}
For anyone who may have a similar problem, I put the modified code below. The complexity is quadratic.
func minSubstrings(_ s: String) -> Int {
let chars = Array(s)
let length = chars.count
guard length > 0 else { return 0 }
var dp = Array(repeating: Int.max, count: length + 1)
dp[0] = 0
for start in 0..<length {
var freq: [Character: Int] = [:]
var maxFreq = 0
var majorityChar: Character? = nil
for end in start..<length {
let char = chars[end]
freq[char, default: 0] += 1
if freq[char]! > maxFreq {
maxFreq = freq[char]!
majorityChar = char
}
let substringLength = end - start + 1
if maxFreq > substringLength / 2 {
dp[end + 1] = min(dp[end + 1], dp[start] + 1)
}
}
}
return dp[length]
}
The dynamic-programming part of your solution is fine, but the problem is that the isValidSubstring
function is too slow — and there's no way to make it fast in its current form.
The reason is that by treating isValidSubstring(a, b)
and isValidSubstring(a, b + 1)
as completely unrelated computations, you force the latter to start over without any of the information you gathered in the former.
Instead, for each value of start
, you can do a single O(length) pass to determine which substrings starting at start
are "valid" ones. You can do that by keeping a single freq
throughout this pass (so that each new end
only has to update the one element), and by using the fact that the only characters that can possibly be a majority character in s[start..<end+1]
are s[end]
and the majority character in s[start..<end]
(so that each new end
only needs to check those two elements in freq
). That gives you an O(length2) algorithm.