arrays string algorithm complexity-theory counting

Algorithm to find the number of specific continuous substrings

I am trying to solve the following algorithmic problem:

Every day, Bob goes to work and does one of 26 possible tasks. The tasks are coded with the letters of the English alphabet from a to z. Sometimes Bob performs tasks assigned by his boss, and sometimes he can choose which of the 26 possible tasks he will do himself. Such days are marked with the symbol !. The boss has defined Bob's work plan for the upcoming N days. Bob really dislikes doing the same task for several days in a row, and to show his boss how bored he is, he decided to count the number of ways to choose a continuous segment of at least two days during which he will perform the same task every day.

That is, Bob considers all possible segments from day L to day R, where L < R, and if he can choose his work such that the tasks on all days in this segment are the same, then he considers this segment boring.

Help Bob determine the number of boring segments in the work plan for the next N days while he is bored.

Input Format

A single line is entered, which is a sequence of characters consisting of uppercase English letters and the symbol ! — the work plan for the next N (1 ≤ N ≤ 1000000) days.

Output Format

Output a single number — the number of boring segments in the plan.

Example 1

Input: a!b!c

Output: 5

Example 2

Input: a!

Output: 1

Notes

All possible segments in the first example are between | and |
|a!|b!c
a|!b|!c
a|!b!|c
a!|b!|c
a!b|!c|

I formalized the task as follows:

Problem

Given a work plan for N days as a string consisting of uppercase English letters ('a' through 'z') and exclamation marks ('!'), count the number of continuous segments of at least two days (i.e., from day L to day R, where 1≤L<R≤N) such that it is possible for Bob to perform the same task on each day within the segment. A segment is considered "boring" if either:

All characters within the segment are the same non-'!' character.
The segment contains at least one exclamation mark ('!').

Input

A single string representing the work plan for N days.

The string will consist only of uppercase English letters ('a' through 'z') and exclamation marks ('!').

The length of the string N will be between 1 and 1,000,000, inclusive (1≤N≤10⁶).

Output

A single integer representing the total number of "boring" segments in the given work plan.

My Code

func solve() {
  guard let plan = readLine() else { return }
  let n = plan.count
  let planArray = Array(plan)
  var boringSegmentCount = 0
  for i in 0..<n {
    var nonWildcards = Set<Character>()
    for j in i..<n {
      if planArray[j] != "!" {
        nonWildcards.insert(planArray[j])
      }
      if j > i && nonWildcards.count <= 1 {
        boringSegmentCount += 1
      }
    }
  }
  print(boringSegmentCount)
  
}

The provided Swift code aims to solve the problem by iterating through all possible continuous segments of the input string plan.

It uses nested loops to iterate through all possible continuous segments of the string:

The outer loop (i) determines the starting index of the segment.
The inner loop (j) determines the ending index of the segment.

For each segment (from index i to j):

It creates a Set called nonWildcards to store unique characters that are not '!'.
It iterates through the characters within the current segment.
If a character is not '!', it adds it to the nonWildcards set.
It checks if the length of the segment is greater than 1 (j > i) and if the number of unique non-wildcard characters in the segment is at most 1 (nonWildcards.count <= 1).
If both conditions are true, it increments the boringSegmentCount.

The main issue with this algorithm is its quadratic time complexity, O(n²) so for large values of n (up to 1,000,000 as stated in the problem), this algorithm can be very slow and lead to a time limit exceeded error.

My question

Is it possible to propose an algorithm with lower asymptotics for this task? I suppose it's possible to solve this problem in linear time, but to my shame, I still can't come up with a suitable logic.

Solution

You can solve this in O(n) with the classic "how many qualifying segments end here" approach:

function solve(plan) {
    let total = 0,
        exclamCount = 0,
        letterCount = 0,
        letter
    for (let i = 0; i < plan.length; i++) {
        if (plan[i] == '!') {
            if (letter)
                total += letterCount++
            else
                total += exclamCount
            exclamCount++
        } else if (plan[i] == letter) {
            exclamCount = 0
            total += letterCount++
        } else {
            letter = plan[i]
            letterCount = exclamCount + 1
            total += exclamCount
            exclamCount = 0
        }
    }
    return total 
}

console.log(solve('a!b!c'))

Explanation: If you are on a ! then the longest sequence ending here is the one only using ! and a single letter (possibly multiple times). If you are on a letter then the longest sequence ending here is the one only using ! and this letter. In both cases the total number of sequences ending here is the longest sequence - 1 (because minimal sequence length is 2). So by keeping track of maximal exclamation and single letter spans we can easily calculate the number of sequences ending at position i going from left to right.