python algorithm data-structures sliding-window

Edge case of binary subarray with sum

Problem
Given a binary array nums and an integer goal, return the number of non-empty subarrays with a sum goal.

Code

class Solution:
    def numSubarraysWithSum(self, nums, goal):
        count,i,j,sumSoFar=0,0,0,0
        N=len(nums)
        while j < N:
            if sumSoFar==goal:
                print('my solution is [' , i , j , ']')
                count+=1
                sumSoFar+=nums[j]
                j+=1
            elif sumSoFar > goal:
                sumSoFar-=nums[i]
                i+=1
            else:
                sumSoFar+=nums[j]
                j+=1
        while i < N:
            if sumSoFar == goal:
                count+=1
            sumSoFar-=nums[i]
            i+=1
        return count

solution = Solution()
array = [1,0,1,0,1]
goal = 2
a = solution.numSubarraysWithSum(array,goal) #expected 4
print(a) # Output: 4

array = [0,0,0,0,0]
goal = 0
a = solution.numSubarraysWithSum(array,goal) #expected 15
print(a) # Output: 10

Issue
when j == N it stops but it has other subarrays to consider like if you consider nums=[0,0,0,0] and goal=0 you see that it doesnt consider cases from index 2-3 or 1-3 inclusive. When i have a case of [0,0,0,0,0] and i figure out my sumSoFar == 0 at i=0 j=1 ([0,0] as subarray) i dont know how to move forward because if i move forward with i, its a possible subarray and if i move forward with j its a possible subarray. I have to pick one and thus i am missing out on other possible sub arrays and my count is lower than it should be.

Solution

You have correctly identified the problem: you miss out on possibilities by increasing one of the two indices and never going back with the same index.

Analysis

It may help to look at an example and then spot a useful logic. Let's say the input is [0,0,0,0,1,0,1,0,0,1,0], and the goal is 2. Then we will have sublists that either have the first two 1s in it, or the second two 1s. For each of those we can choose to include some zeroes at their left and/or right. We can picture that as follows:

[0,0,0,0,1,0,1,0,0,1,0]
 <<<<<<<<----->>>>
           <<------->>

The dashes in the middle line represent what will be part of all the lists that have the first two 1s, and the "arrows" indicate where we have flexibility to "grow" that list to the left or to the right (grabbing zeroes inside the list).

Then the question is: how many of those lists exist?

If we look at the first set of lists (that have the first two 1s), we have 5 possible starting points for such a list and 3 possible ending points. And so we can conclude there are 5*3=15 such lists. If we apply the same principle to lists that have the last two 1s, then that product is 2*2=4.

It will therefor be useful to get the distances or "gaps" around the 1s. For the example input, we can derive that there are 5 ways to start a list before the first 1, which is the number of zeroes there plus 1. Then we have 2 ways to start a list before the second 1, ...etc. This gives us a "gap" list of [5, 2, 3, 2] for this particular input. With this, we have all information about the input, and can only base the rest of the algorithm on that. Remember that we wanted to multiply 5*3 and 2*2: these entries are 2 slots apart in this "gap" list, in other words, they are goal slots apart!

With that information we can slide a window of width goal over this derived list and get all the products we need.

There is an edge case when goal is zero. In that case we don't need the product of two different "gaps", but calculate how many lists can be made out of a single stretch of zeroes. That really is a triangular number, and so we can use the formula 𝑛(𝑛−1)/2.

Implementation

With the above insights we can implement a solution.

I provide here a spoiler solution:

def numSubarraysWithSum(self, nums, goal): # Create a gap-list, i.e. a list of distances between two 1s gaps = [] j = 0 for i, value in enumerate(nums): if value: gaps.append(i - j + 1) j = i + 1 gaps.append(len(nums) - j + 1) if goal == 0: # Edge case: sum up triangular numbers return sum(gap * (gap - 1) // 2 for gap in gaps) else: return sum(before * after for before, after in zip(gaps, gaps[goal:]))