Tags: numpy, array-broadcasting

Fast orthogonal projection of vectors onto other vectors in numpy


I have two large equal size 2D numpy arrays of cartesian vectors:

A = [a1, a2, a3, ...] where a1 = [ax1, ay1, az1]
B = [b1, b2, b3, ...] where b1 = [bx1, by1, bz1]

I need to orthogonally project each vector in A onto its respective vector in B and write the results to an output array of vectors C. The vectors in B are unique. If I were to use a for-loop, I could compute each projection as:

c1 = (np.dot(a1, b1) / np.dot(b1, b1)) * b1

And store c1, c2, c3, etc. in a 2D output array C.
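
For concreteness, here is a minimal stand-in version of that loop (small made-up arrays in place of my real data):

import numpy as np

# Small stand-in arrays; the real A and B each have ~50 million rows
A = np.random.rand(1000, 3)
B = np.random.rand(1000, 3)

C = np.empty_like(A)
for i in range(len(A)):
    # orthogonal projection of A[i] onto B[i]
    C[i] = (np.dot(A[i], B[i]) / np.dot(B[i], B[i])) * B[i]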

However, in reality A and B each contain about 50 million vectors, so using Python for-loops is painfully slow.

I tried simply replacing the individual vectors in the equation above with the full 2D arrays, hoping that numpy would broadcast it automatically. Unfortunately it didn't. I also tried the modifier [:,None] on array B, hoping that would make numpy work along the vertical axis of the arrays and orthogonally project the individual vectors in A onto those in B, but no luck (a minimal reproduction of the failure is below). I've googled for an answer but so far without success. Does anyone have ideas on how to do this faster than a for-loop, preferably entirely in numpy?
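
Concretely, continuing from the stand-in arrays above, the direct substitution fails because np.dot treats the two (N, 3) arrays as matrices and attempts an (N, 3) @ (N, 3) product:

try:
    C = (np.dot(A, B) / np.dot(B, B)) * B
except ValueError as e:
    print(e)  # shapes (1000,3) and (1000,3) not aligned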


Solution

  • The pseudocode looks clear enough to me, so I'll go ahead and answer with an example in which we have 50 (instead of 50 million) pairs of vectors to project in a 3-dimensional space (corresponding to the x, y, and z in the pseudocode):

    import numpy as np
    rng = np.random.default_rng(842945689356)
    A, B = rng.random((2, 50, 3))
    
    # Method 1
    dot_ai_bi = (A * B).sum(axis=-1, keepdims=True)
    dot_bi_bi = (B * B).sum(axis=-1, keepdims=True)  # or square `norm`
    C = dot_ai_bi / dot_bi_bi * B
    
    # Method 2 (faster in my tests)
    # inspired by a non-accepted answer to
    # https://stackoverflow.com/questions/62500584/vector-dot-product-of-corresponding-rows-using-numpy
    # Forgive me for answering if this question is considered a duplicate,
    # but I think it's a slight extension.
    dot_ai_bi2 = np.einsum('ij,ij->i', A, B)
    dot_bi_bi2 = np.einsum('ij,ij->i', B, B)
    C2 = (dot_ai_bi2 / dot_bi_bi2)[:, np.newaxis] * B
    
    # Check results against looping over rows
    for ai, bi, ci, ci2 in zip(A, B, C, C2):
        ref = (ai @ bi) / (bi @ bi) * bi
        np.testing.assert_allclose(ci, ref)
        np.testing.assert_allclose(ci2, ref)
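    
    # A rough timing sketch to check the "faster in my tests" claim on your
    # own machine; numbers vary with array size, BLAS build, and hardware.
    # The 1-million-row size here is just an illustrative assumption.
    import timeit
    A_big, B_big = rng.random((2, 1_000_000, 3))
    t1 = timeit.timeit(
        lambda: (A_big * B_big).sum(axis=-1, keepdims=True)
                / (B_big * B_big).sum(axis=-1, keepdims=True) * B_big,
        number=10)
    t2 = timeit.timeit(
        lambda: (np.einsum('ij,ij->i', A_big, B_big)
                 / np.einsum('ij,ij->i', B_big, B_big))[:, np.newaxis] * B_big,
        number=10)
    print(f"Method 1: {t1:.2f} s, Method 2: {t2:.2f} s")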
    

    In short, dot and matmul want to do regular matrix multiplication, whereas you want a reducing operation like dot with an axis argument. You can get that behavior by implementing the inner product yourself (since sum has an axis argument), or you can use einsum. (Perhaps there is some clever way of doing what you want with dot or matmul by taking advantage of their N-D array rules, but I suspect either of the solutions above would be just as performant and more natural.)
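
    For completeness, continuing from the arrays above, here is a sketch of one such matmul-based variant: it leans on matmul's batched N-D rules by treating each row pair as a (1, 3) @ (3, 1) matrix product. I wouldn't expect it to beat einsum, but it shows the idea.

    # Row-wise dot products via matmul's batched N-D rules
    dot_ai_bi3 = (A[:, None, :] @ B[:, :, None])[:, 0, 0]
    dot_bi_bi3 = (B[:, None, :] @ B[:, :, None])[:, 0, 0]
    C3 = (dot_ai_bi3 / dot_bi_bi3)[:, np.newaxis] * B
    np.testing.assert_allclose(C3, C)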