I have an arbitrary matrix M of shape (N x A). I also have a column vector V (N x 1) whose i-th row gives the number of entries (at most A) that I would like to keep from row i of M, counting from the leftmost column.
As an example, say I have the following V (for an arbitrary 5xA Matrix):
[[1]
[0]
[4]
[2]
[3]]
i.e. I want to keep the 1st element of the first row, no elements in row 2, 4 from row 3, 2 from row 4, etc. I want this to generate the following mask:
[[1 0 0 0 0]
[0 0 0 0 0]
[1 1 1 1 0]
[1 1 0 0 0]
[1 1 1 0 0]]
I then apply this mask to my matrix to get the result that I want. What is the fastest way to generate this mask?
Naive pythonic approach:
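import numpy as np
A = 20
n = 5
# np.repeat needs integer counts, so cast V to int
V = np.floor(np.random.rand(n) * (A + 1)).astype(int)
# np.concatenate takes a sequence of arrays; each row is 1-D, so no axis argument is needed
x = [np.concatenate((np.repeat(1, v), np.repeat(0, A - v))) for v in V]
x = np.array(x)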
This code is my current working solution but it is way too slow for large n, so I need a vectorized solution.
Using numpy.fromfunction:
n = 5
A = 20
V = np.floor(np.random.rand(n, 1) * (A + 1))
mask = np.fromfunction(lambda i, j: V > j, (n, A), dtype=int)
This solution is considerably faster for large n, but I am not sure if I can do better than this.
Overall:
Any insights on this problem? I'm not too familiar with the ins and outs of numpy and Python, so I thought I'd post this here before pursuing any individual solution further. I am also willing to compile with Cython if that would help; I know absolutely nothing about it at the moment, but I am willing to look into it. Open to pretty much any and all solutions.
Just use broadcasting and numpy.arange:
mask = V > np.arange(M.shape[1])
Or from A:
A = 5
mask = V > np.arange(A)
Output:
array([[ True, False, False, False, False],
       [False, False, False, False, False],
       [ True,  True,  True,  True, False],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False]])
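The comparison works because V has shape (N, 1) and np.arange(A) has shape (A,), so broadcasting expands both to (N, A) and compares each count against every column index. Here is a minimal sketch using the example V from the question. The helper name `prefix_mask` is hypothetical, and the reshape is an assumption that V might arrive as a 1-D array rather than a column:

```python
import numpy as np

def prefix_mask(counts, width):
    """Boolean mask of shape (len(counts), width); row i has its first counts[i] entries True."""
    counts = np.asarray(counts).reshape(-1, 1)  # force column shape (N, 1)
    return counts > np.arange(width)            # (N, 1) vs (width,) broadcasts to (N, width)

V = np.array([1, 0, 4, 2, 3])   # deliberately 1-D here
mask = prefix_mask(V, 5)
print(mask.shape)               # (5, 5)
print(mask.sum(axis=1))         # [1 0 4 2 3]
```

Without the reshape, a 1-D V of length N compared against np.arange(A) would either raise a broadcasting error (N != A) or silently compare element-wise (N == A), so keeping V as a column matters.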
And if you need integers:
mask.astype(int)
array([[1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 1, 1, 0, 0]])
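To finish the task from the question, the mask still has to be applied to M. One possible way is sketched below. The matrix M here is made-up random data, and filling the dropped entries with 0 is an assumption about what "keep" should mean:

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen only for reproducibility
N, A = 5, 5
M = rng.integers(1, 10, size=(N, A))     # hypothetical data, all entries non-zero
V = np.array([[1], [0], [4], [2], [3]])  # column vector of counts from the question

mask = V > np.arange(A)                  # (N, 1) vs (A,) broadcasts to (N, A)
kept = np.where(mask, M, 0)              # M where mask is True, 0 elsewhere
```

For a numeric M, `M * mask` gives the same values. `np.where` also lets you choose a fill value other than 0, such as `np.nan` for a float matrix.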