Tags: python, numpy, performance, optimization, vectorization

Python Vectorized Mask Generation (Numpy)


I have an arbitrary matrix M of shape (N x A) and a column vector V of shape (N x 1). Each row of V holds the number of entries (at most A) that I want to keep from the corresponding row of M, counting from the leftmost entry.

As an example, say I have the following V (for an arbitrary 5xA Matrix):

[[1]
[0]
[4]
[2]
[3]]

That is, I want to keep the first element of row 1, no elements of row 2, four from row 3, two from row 4, and three from row 5. This should generate the following mask (shown here for A = 5):

[[1 0 0 0 0] 
[0 0 0 0 0]
[1 1 1 1 0] 
[1 1 0 0 0] 
[1 1 1 0 0]]

I then apply this mask to my matrix to get the result that I want. What is the fastest way to generate this mask?

Naive pythonic approach:

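import numpy as np

A = 20
n = 5
# integer counts in 0..A, one per row
V = np.floor(np.random.rand(n) * (A + 1)).astype(int)
# build each row as v ones followed by A - v zeros
x = [np.concatenate((np.repeat(1, v), np.repeat(0, A - v))) for v in V]
x = np.array(x)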

This code is my current working solution but it is way too slow for large n, so I need a vectorized solution.

Using numpy.fromfunction:

n = 5
A = 20
V = np.floor(np.random.rand(n, 1) * (A + 1)) 

mask = np.fromfunction(lambda i, j: V > j, (n, A), dtype=int)

This solution is considerably faster for large n, but I am not sure if I can do better than this.
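A note on why this version is fast: `np.fromfunction` does not call the lambda once per element. It builds the full index arrays and calls the function a single time, so `V > j` is already one broadcasted array comparison. The sketch below illustrates that equivalence; the seed and the use of `np.random.default_rng` are arbitrary choices for the illustration, not part of the original code.

```python
import numpy as np

n, A = 5, 20
rng = np.random.default_rng(0)            # arbitrary fixed seed for reproducibility
V = rng.integers(0, A + 1, size=(n, 1))   # counts in 0..A, shape (n, 1)

# fromfunction hands two (n, A) index arrays, i and j, to the lambda in one call.
mask_ff = np.fromfunction(lambda i, j: V > j, (n, A), dtype=int)

# The same computation written out by hand with np.indices.
i, j = np.indices((n, A))
mask_idx = V > j

print(np.array_equal(mask_ff, mask_idx))
```

Since only the column index `j` is used, building the full row-index array `i` is wasted work, which suggests there may be room to improve on this.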

Overall:

Any insights on this problem? I am not too familiar with the ins and outs of numpy and Python, so I thought I'd post here before pursuing any individual solution further. I am also willing to compile with Cython if that would help, though I know nothing about it at the moment. I am open to pretty much any solution.


Solution

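  • Just use broadcasting and numpy.arange. V has shape (N, 1) and the arange has shape (A,), so the comparison broadcasts to an (N, A) boolean array: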

    mask = V > np.arange(M.shape[1])
    

    Or from A:

    A = 5
    mask = V > np.arange(A)
    

    Output, with the example V from the question:

    array([[ True, False, False, False, False],
           [False, False, False, False, False],
           [ True,  True,  True,  True, False],
           [ True,  True, False, False, False],
           [ True,  True,  True, False, False]])
    

    And if you need integers:

    mask.astype(int)
    
    array([[1, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [1, 1, 1, 1, 0],
           [1, 1, 0, 0, 0],
           [1, 1, 1, 0, 0]])
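
For completeness, here is a minimal end-to-end sketch of the broadcasting approach, including the step of applying the mask. Only `V` and the expected mask come from the question; the matrix `M` and its contents are made up for illustration, and zeroing the masked-out entries with `np.where` is one assumed interpretation of "applying" the mask.

```python
import numpy as np

# Counts from the question: how many leading entries to keep in each row.
V = np.array([[1], [0], [4], [2], [3]])

# Hypothetical 5 x 5 data matrix, filled with 1..25 purely for illustration.
M = np.arange(1, 26).reshape(5, 5)

# Broadcasting (N, 1) against (A,) gives an (N, A) boolean mask:
# entry (i, j) is True exactly when j < V[i].
mask = V > np.arange(M.shape[1])

# Keep the masked entries of M and set everything else to zero.
result = np.where(mask, M, 0)
print(result)
```

Multiplying instead (`M * mask`) gives the same result for numeric data; `np.where` makes the fill value explicit if something other than zero is wanted.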