python, arrays, numpy, vectorization

How to populate a 2-d numpy array with values from a third dimension?


New Post: Processing satellite conjunctions with numpy efficiently

Original Post: I have a numpy array of shape m x n x r, where the m axis indexes an object, the n axis indexes a timestep and the r axis holds a position vector in 3-d space. In other words, the array contains the three position values (x, y and z) of m objects at n points in time. This is the format my data is delivered in (from the python module sgp4, specifically a SatrecArray, if anyone's interested), so I can't change it further up the data processing chain.

I want to be able to represent this as an m x n array of position vectors - essentially "collapsing" the position axis, so that the array contains m x n elements, each of which is the position vector of one object at one point in time.

I'm struggling to find a way to do this efficiently - I can do it with bog standard python loops, but as I scale up the number of objects and timesteps this becomes massively inefficient.

I'm not very well versed in numpy, and none of the solutions I've found, nor my attempts at using methods such as hstack/vstack/dstack, have given me the right final shape. I also looked into np.vectorize, but as far as I can see that just implements a loop under the hood?

Example with random numbers and an input array of shape (3, 2, 3):

In[1]: import numpy as np
       m = 3
       n = 2
       r = 3

In[2]: a = np.random.random((m,n,r))

In[3]: a
Out[3]: 
array([[[0.8416, 0.3694, 0.5708],
        [0.3779, 0.579 , 0.207 ]],

       [[0.7871, 0.6547, 0.0047],
        [0.1115, 0.1445, 0.6147]],

       [[0.8538, 0.2821, 0.8094],
        [0.6214, 0.0147, 0.5852]]])

In[4]: a.shape
Out[4]: (3, 2, 3)
       
In[5]: new_a = np.empty(shape=(m,n), dtype=object)
       for i in range(m):
           for j in range(n):
               new_a[i,j] = a[i,j,:]

In[6]: new_a
Out[6]:
array([[array([0.8416, 0.3694, 0.5708]), array([0.3779, 0.579 , 0.207 ])],
       [array([0.7871, 0.6547, 0.0047]), array([0.1115, 0.1445, 0.6147])],
       [array([0.8538, 0.2821, 0.8094]), array([0.6214, 0.0147, 0.5852])]],
      dtype=object)

In[7]: new_a.shape
Out[7]: (3, 2)

Solution

  • "I'm struggling to find a way to do this efficiently - I can do it with bog standard python loops, but as I scale up the number of objects and timesteps this becomes massively inefficient."

    What you want to do makes no sense in numpy. Unless you use an object dtype (which gives up most of numpy's speed), an array's items can only be plain numbers, not other arrays. You should keep your nd-array.

    In fact, what you have (shape (m, n, r)) is already an (m, n) array of vectors of dimension r. Think of it that way, if you prefer, rather than as a 3-d block of numbers.
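
    To make that concrete (a minimal sketch using the question's example array a): indexing the first two axes already hands you the per-object, per-timestep position vector, with no object-dtype repacking needed.

    # a[i, j] is already the r-dimensional position vector of object i at
    # timestep j - a view into a, not a copy
    a[0, 1]          # array([0.3779, 0.579 , 0.207 ]), same values as new_a[0, 1]
    a[0, 1].shape    # (3,)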

    Numpy operations can operate very efficiently on a subset of the dimensions. For example:

    # add 10/100/1000 to the first dimension
    a + np.array([10, 100, 1000])[:, None, None]
    
    array([[[  10.8416,   10.3694,   10.5708],
            [  10.3779,   10.579 ,   10.207 ]],
    
           [[ 100.7871,  100.6547,  100.0047],
            [ 100.1115,  100.1445,  100.6147]],
    
           [[1000.8538, 1000.2821, 1000.8094],
            [1000.6214, 1000.0147, 1000.5852]]])

    # add 10/100 to the second dimension
    a + np.array([10, 100])[:, None]

    array([[[ 10.8416,  10.3694,  10.5708],
            [100.3779, 100.579 , 100.207 ]],
    
           [[ 10.7871,  10.6547,  10.0047],
            [100.1115, 100.1445, 100.6147]],
    
           [[ 10.8538,  10.2821,  10.8094],
            [100.6214, 100.0147, 100.5852]]])
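
    Reductions can likewise target just the last axis; for instance (a minimal sketch), a per-vector Euclidean norm collapses the (m, n, r) array to (m, n) in a single call:

    # Euclidean norm of each r-dimensional position vector -> shape (m, n)
    np.linalg.norm(a, axis=-1)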
    

    Your issue is most likely an XY problem. You should keep the (m, n, r) shape and try to solve your ultimate goal with the nd-array directly, as in the sketch below.
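
    For example, if the ultimate goal is spotting conjunctions (close approaches), pairwise distances between all objects at every timestep fall out of broadcasting on the (m, n, r) array. The following is only a hedged sketch, assuming the first axis indexes objects and the second timesteps; the function name and threshold are purely illustrative:

    import numpy as np

    def close_approaches(positions, threshold):
        # positions: shape (num_objects, num_times, 3)
        # separation vectors for every pair of objects at every timestep:
        # shape (num_objects, num_objects, num_times, 3)
        sep = positions[:, None, :, :] - positions[None, :, :, :]
        # pairwise distances at every timestep:
        # shape (num_objects, num_objects, num_times)
        dist = np.linalg.norm(sep, axis=-1)
        # indices (i, j, t) of pairs closer than the threshold,
        # keeping i < j to drop self-pairs and duplicates
        i, j, t = np.nonzero(dist < threshold)
        keep = i < j
        return list(zip(i[keep], j[keep], t[keep]))

    # usage with the question's toy array (threshold value chosen arbitrarily)
    close_approaches(a, threshold=0.5)

    For large numbers of objects the intermediate distance array grows quadratically, so you may want to chunk over timesteps, but the point stands: the computation stays on the original nd-array.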