Tags: python, python-3.x, binary-image

Merge "True" chunks in binary array (Binary Closing)


I have a (big) boolean array and I'm looking for a way to fill in True wherever doing so merges two runs of True that each have at least some minimal length.
For example:

import numpy as np

a = np.array([True] * 3 + [False] + [True] * 4 + [False] * 2 + [True] * 2)
# a == array([ True,  True,  True, False,  True,  True,  True,  True, False, False,  True,  True])

closed_a = close(a, min_merge_size=2)
# closed_a == array([ True,  True,  True, True,  True,  True,  True,  True, False, False,  True,  True])

Here the False value at index [3] is converted to True because it has a sequence of at least 2 True elements on both sides. Conversely, elements [8] and [9] remain False because they don't have such a sequence on both sides.

I tried using scipy.ndimage.binary_closing with structure=[True, True, False, True, True] (and with False in the middle) but it doesn't give me what I need.
Any ideas?


Solution

  • This one was tough, but I came up with something using itertools and more_itertools. Similar to what you tried, the idea is to slide a window of length 2n + 1 over the array and check directly whether each window equals the indicator sequence n * True, False, n * True. If it does, the False in the middle of that window is set to True.

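    from itertools import chain
    from more_itertools import windowed
    
    
    def join_true_runs(seq, min_length_true=2):
        n = min_length_true
        # Pattern to look for: n * True, a single False, n * True.
        sentinel = tuple(chain([True] * n, [False], [True] * n))
    
        # i is the window start, so i + n is the index of the False in the middle.
        indices = [
            i + n for i, w in enumerate(windowed(seq, 2 * n + 1)) if w == sentinel
        ]
        seq = seq.copy()  # optional: avoids mutating the caller's array
        seq[indices] = True
        return seq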
    

    You should probably write some tests to check for corner cases, though it does seem to work on this test array:

    import numpy as np
    
    arr = np.array([True, True, True, False, True, True, True, False, True, False, True, True, False, False, True, True])
    
    # array is unchanged
    assert all(join_true_runs(arr, 4) == arr)
    
    # only position 3 is changed
    list(join_true_runs(arr, 3) == arr)
    
    # returns [True,
    # True,
    # True,
    # False,
    # True,
    # ...
    # ]
    

    Of course, if you want to mutate the original array instead of returning a copy, you can do that too: just drop the seq.copy() line.
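Since the question mentions a big array, the same window-matching idea can also be sketched with NumPy alone, which avoids the Python-level loop. This is a minimal sketch, not a drop-in replacement: the name join_true_runs_vectorized is hypothetical, and it assumes NumPy 1.20 or later for sliding_window_view. Like the version above, it only fills gaps that are exactly one False wide.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def join_true_runs_vectorized(seq, min_length_true=2):
    """Hypothetical NumPy-only variant of the window-matching idea."""
    seq = np.asarray(seq, dtype=bool)
    n = min_length_true
    # Pattern to look for: n * True, a single False, n * True.
    pattern = np.array([True] * n + [False] + [True] * n)

    out = seq.copy()
    if seq.size < pattern.size:
        # The array is too short to contain the pattern at all.
        return out

    # Each row is one window of length 2n + 1 (a view, not a copy).
    windows = sliding_window_view(seq, pattern.size)
    # Start indices of the windows that match the pattern exactly.
    hits = np.flatnonzero((windows == pattern).all(axis=1))
    # The False to fill sits n positions after each window start.
    out[hits + n] = True
    return out


a = np.array([True] * 3 + [False] + [True] * 4 + [False] * 2 + [True] * 2)
closed_a = join_true_runs_vectorized(a, min_length_true=2)
print(np.flatnonzero(closed_a != a))  # → [3]
```

Note that the comparison windows == pattern materialises a temporary boolean array with roughly len(seq) * (2n + 1) elements, so for very large arrays or large n it may be worth processing the input in chunks.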