Hey i have a question about optimization palindromes count algorithm
Task: Find count of palindromes in string.
in my func i use "in the forehead" method its like O(n^2) can you guys help make it in O(n) or O(nlogn)
func isPalindrome(string: String) -> Bool {
let str = (string.lowercased())
let strWithoutSpace = str.components(separatedBy: .whitespaces).joined(separator: "")
let strArray = Array(strWithoutSpace.characters)
var i = 0
var j = strArray.count-1
while i <= j {
if strArray[i] != strArray[j] { return false }
i+=1
j-=1
}
return true
}
func palindromsInString(string: String) -> Int {
var arrayOfChars = Array(string.characters)
var count = 0
for i in 0..<arrayOfChars.count-1 {
for x in i+1..<arrayOfChars.count {
if isPalindrome(string: String(arrayOfChars[i...x])) {
count+=1
}
}
}
return count
}
and yes in my instance one letter can't be a palindrome
I'm not familiar with Manacher's algorithm, but I've always enjoyed figuring out efficient algorithms, so I thought I'd have a stab at this.
Your algorithm for determining if a string is a palindrome looks like the kinds I've come up with, so I decided to just use your isPalindrome
function, though I changed it to be a function in an extension of String
instead, and I removed the whitespace removal logic, as I felt that needed to be in the invoking call rather than in the palindrome determining function itself.
extension String {
func isPalindrome() -> Bool {
if length < 2 { return false }
let str = lowercased()
let strArray = Array(str.characters)
var i = 0
var j = strArray.count-1
while i <= j {
if strArray[i] != strArray[j] { return false }
i+=1
j-=1
}
return true
}
}
After looking at your palindromsInString
solution, it looks like a standard brute force, but simple and readable solution.
My first thought for a different algorithm was also pretty brute force, but was a totally different approach, so I'm calling it the Naive solution.
The idea of the Naive solution is to create arrays of substrings of the original string and check if each substring is a palindrome or not. The way I determine the substrings is to start with the biggest substring possible (the original string) and then get the 2 substrings of length string.length-1
, and then string.length-2
and so on until lastly I get all the substrings of length 2 (I'm ignoring substrings of length 1 because you said a string of length 1 can't be a palindrome).
ie: substrings of "test" greater than length 1 would be:
["test"] ["tes", "est"] ["te", "es", "st"]
So you just loop through each of those arrays and check if any are palindromes, and increasing the count if they are:
extension String {
var length: Int { return characters.count }
func substringsOfLength(length: Int) -> [String] {
if self.length == 0 || length > self.length { return [] }
let differenceInLengths = self.length - length
var firstIndex = startIndex
var lastIndex = index(startIndex, offsetBy: length)
var substrings: [String] = []
for _ in 0..<differenceInLengths {
substrings.append(substring(with: Range(uncheckedBounds: (firstIndex, lastIndex))))
firstIndex = index(after: firstIndex)
lastIndex = index(after: lastIndex)
}
substrings.append(substring(with: Range(uncheckedBounds: (firstIndex, lastIndex))))
return substrings
}
}
extension String {
func containsAPalindromeNaive(ignoringWhitespace: Bool = true) -> Int {
let numChars = length
if numChars < 2 { return 0 }
let stringToCheck = (ignoringWhitespace ? self.components(separatedBy: .whitespaces).joined(separator: "") : self).lowercased()
var outerLoop = numChars
var count: Int = 0
while outerLoop > 0 {
let substrings = stringToCheck.substringsOfLength(length: outerLoop)
for substring in substrings {
if substring.isPalindrome() {
count += 1
}
}
outerLoop -= 1
}
return count
}
}
I knew full well that this algorithm would be slow, but I wanted to implement it as a second baseline for my real solution.
I call this solution the Smart solution. It is a multipass solution that takes advantage of the number of, and positions of, the characters within a string.
In the first pass, I generate what I call a character mapping. The character mapping is a dictionary that maps a Character
to an array of indices. So you go over the string and add each character's index to an array stored under it's character value as the key.
The idea is that palindromes can only exist within a string that is bookended by the same letter. Therefore, you only need to check the substrings within a string at the indices of a particular letter. In the word "tattoo", you have 3 distinct letters: "t", "a", "o". The character mappings would look like this:
t: [0,2,3]
a: [1]
o: [4,5]
I now know that palindromes can only exist in this word between (0,2), (2,3), and (4,5). So I only need to check 3 substrings(0,2), (0,3), (2,3), and (4,5). So I only need to check 4 substrings. That's the idea. And once you've checked out all possible substrings bookended by a particular letter, you can ignore any other substrings you come across that start with that letter, because you've already checked them.
In the second pass, I go through each character in the string and if I haven't already checked that letter, I go through the pairs of permutation indices generated by generateOrderedPairIndexPermutations
for the indices in the character mapping and check the substrings to see if they are a palindrome. I then do 2 optimizations here. First, if the distance between the starting character index and the ending character index is less 3, it must be a palindrome (a distance of 1 means they are sequential, and a distance of 2 means there's a single letter in between them, thus also guaranteed to be a palindrome). Second, because I already know that the first and last characters are the same, I don't need to check the whole substring, just from the second letter up until the second to last letter. So if the substring would be "test", and I'm always guaranteeing myself that the substring is bookended by the same letter, I don't actually need to check "test", and I can instead just check "es". It's a smaller optimization, but a nice one nonetheless.
extension Collection {
func generateOrderedPairIndexPermutations() -> [(Index,Index)] {
if count < 2 {
return []
}
var perms: [(Index,Index)] = []
var firstIndex = startIndex
while firstIndex != endIndex {
var secondIndex = index(firstIndex, offsetBy: 1)
while secondIndex != endIndex {
perms.append((firstIndex,secondIndex))
secondIndex = index(secondIndex, offsetBy: 1)
}
firstIndex = index(firstIndex, offsetBy: 1)
}
return perms
}
}
extension String {
func generateCharacterMapping() -> [Character : [Int]] {
var characterMapping: [Character : [Int]] = [:]
for (index, char) in characters.enumerated() {
if let indicesOfChar = characterMapping[char] {
characterMapping[char] = indicesOfChar + [index]
} else {
characterMapping[char] = [index]
}
}
return characterMapping
}
func containsAPalindromeSmart(ignoringWhitespace: Bool = true) -> Int {
let numChars = length
if numChars < 2 { return 0 }
let stringToCheck = (ignoringWhitespace ? self.components(separatedBy: .whitespaces).joined(separator: "") : self).lowercased()
let characterMapping = stringToCheck.generateCharacterMapping()
var count: Int = 0
var checkedChars: Set<Character> = Set()
for char in stringToCheck.characters {
if checkedChars.contains(char) == false {
if let characterIndices = characterMapping[char], characterIndices.count > 1 {
let perms = characterIndices.generateOrderedPairIndexPermutations()
for (i,j) in perms {
let startCharIndex = characterIndices[i]
let endCharIndex = characterIndices[j]
if endCharIndex - startCharIndex < 3 {
count += 1
} else {
let substring = stringToCheck.substring(with: Range(uncheckedBounds: (stringToCheck.index(stringToCheck.startIndex, offsetBy: startCharIndex+1), stringToCheck.index(stringToCheck.startIndex, offsetBy: endCharIndex))))
if substring.isPalindrome() {
count += 1
}
}
}
checkedChars.insert(char)
}
}
}
return count
}
}
I felt pretty good about this solution. But I had no clue just how fast it really was.It was really fast
Using XCTest to measure performance, I ran each algorithm through some performance tests. Using this string as the base source: "There are multiple palindromes in here""Was it a car or a cat I saw", updated based on suggestions to use a more rigorous input string, which is 3319 characters long when whitespace is removed and it is lowercased ("therearemultiplepalindromesinhere""wasitacaroracatisaw"), I also created copies that were this string times 2 ("therearemultiplepalindromesinheretherearemultiplepalindromesinhere"wasitacaroracatisawwasitacaroracatisaw), times 4, times 8, and times 10. Since we're trying to determine the O() of the algorithm, scaling the number of letters up and measuring the scale factor is the way to go.
In order to get more accurate data, I ran each test through 10 iterations (I would've liked more iterations, but the original solution and my Naive solution both don't finish in any timely manner on the tests above times 4). Here's the timings data I collected (screenshot of the spreadsheet was easier than entering it again here):
UPDATED
UPDATED Green is Author; Red is Naive Solution; Orange is Smart Solution
As you can see, both your original solution and my Naive Solution operate in quadratic time (your solution has a quadratic correlation coefficient of r=0.9997, and my Naive Solution has a coefficient of r=0.9999; so, very clearly quadratic!). My Naive solution seems to be under your solution, but they are both increasing quadratically, and are therefore O(n^2), as we already knew.
The interesting part about my Smart Solution is it seems linear! My small, 5-point data set through a regression calculator, and it has a correlation coefficient of r=0.9917! So if it isn't linear, it's so close that I wouldn't care.
All of the solutions operate in quadratic time now. Le sigh. But at least the bug was discovered, addressed, and science prevailed this day. Bummer that my "Smart Solution" didn't actually end up linear. However, I will note that if the input string isn't already a giant palindrome (like the one I ended up changing it to be), then the "Smart Solution"'s optimizations make it perform much faster, albeit still in quadratic time.
I don't know if my algorithm is any easier to understand than the Manacher's algorithm, but I hope I explained it well. And the results are pretty promising, so I hope you find a good use out of it. This is actually still true. I think it's a promising algorithm. Perhaps my code for generateOrderedPairIndexPermutations
isn't the best.