stringalgorithmstring-algorithm

How to find number of 010 in a certain range of a binary string


Given a binary string. How to find occurances of "010" within a certain range of the string.
For example, I have the string "0100110" . If the given range is 3 7 ( 1 based indexing ) then the output will be 4. I could not find any faster way to solve it.

While trying this I can solve it in O(N) complexity. The approach is - firstly I point out the position of all '1' within the certain range and using those position I will figure out the number of '0' in both back and forth. And then multiply number of '0' found in back for a single '1' with number of '0' in found in forth. Then sum up the multiplied result for each of the '1' within the certain range.

For the given example the position of '1' within the range is {5, 6}. Now for index 5 I have number of '0' in both back and forth is 2 and 1 respectively. So we can make subsequence "010" is 2. Similarly for index 6 we also get the answer is 2. In total we can make the subsequence "010" is 4 times in total.

But when we have a number of Q queries of certain ranges for a given string then my approach easily reaches into the time complexity O(N2). I tried a lot but failed to find a way to optimize it. Can anybody help me with an approach that is less than O(N2) complexity? Just to mention the time limit should be 1 Second. It will be a plus if you provide a pseudo code.

~Thanks in Advance.


Solution

  • Pretreatment: make auxiliary array containing cumulative number of zeros upto given position (with aux[0]=0)

      0 1 0 0 1 1 0  //string
    0 1 1 2 3 3 3 4  //aux array A[]
    

    For given L..R range scan for ones, for every k index of 1 get number of zeros in range - O(1) operation

    P[k] = (A[k] - A[L-1]) * (A[R] - A[k])
    S = Sum(P[k], k=L..R)
    

    So we have O(R-L) time per query and the worst case O(Q*N) for Q queries

    But look at formula thoroughly:

    P[k] = (A[k] - A[L-1]) * (A[R] - A[k]) = 
           A[k] * (A[R] + A[L-1]) - A[k]^2 - A[R] * A[L-1] = 
           A[k] * LRSum - A[k]^2 - LRProd
    S = Sum(A[k] for ones) * LRSum - Sum(A[k]^2) - LRProd * NumOfOnes 
    

    Note that LRSum and LRProd are constants for given query, and we have to calculate sums of A[k] for positions of ones and sum of squares for the same positions. Seems we can use the same idea of cumulative array and get result in O(1) per query.

    Quick check gives (3+3)*5 - (9+9) - 4*2 = 30-18-8 = 4 for your example.

    Using cumulative arrays:

      0 1 0 0 1  1  0  //string
    0 1 1 2 3 3  3  4  //aux array A[]
    0 0 1 1 1 4  7  7  //aux array B[]
    0 0 1 1 1 10 19 19  //aux array C[]
    
    Result = (B[R] - B[L-1]) * (A[R] + A[L-1]) - (C[R] - C[L-1]) - 
                     A[R] * A[L-1] * (R - L - 1 - (A[R] - A[L-1])) = 
             (7-1) * (4 + 1) - (19 - 1) - 4 * 1 * (7 - 2  - 4 + 1) = 4