
How to get field of nested numpy structured array (advanced indexing)


I have a complex nested structured array (often used as a recarray). It's simplified for this example, but in the real case there are multiple levels.

import numpy as np

c = [('x','f8'),('y','f8')]
A = [('data_string','|S20'),('data_val', c, 2)]
zeros = np.zeros(1, dtype=A)
print(zeros["data_val"]["x"])

I am trying to index the "x" field of the nested array's dtype without naming the preceding fields. I was hoping something like print(zeros[:,"x"]) would let me slice across all of the top-level data, but it doesn't work.

Are there ways to do fancy indexing on nested structured arrays without spelling out every parent field name?


Solution

  • I don't know if displaying the resulting array helps you visualize the nesting or not.

    In [279]: c = [('x','f8'),('y','f8')]
         ...: A = [('data_string','|S20'),('data_val', c, 2)]
         ...: arr = np.zeros(2, dtype=A)
    In [280]: arr
    Out[280]: 
    array([(b'', [(0., 0.), (0., 0.)]), (b'', [(0., 0.), (0., 0.)])],
          dtype=[('data_string', 'S20'), ('data_val', [('x', '<f8'), ('y', '<f8')], (2,))])
    

    Note how the nesting of () and [] reflects the nesting of the fields.
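    The same nesting is available programmatically through dtype.descr, which returns nested lists of (name, format[, shape]) tuples. A small sketch that rebuilds the arr from this answer:

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

# dtype.descr describes the layout as nested lists of tuples,
# mirroring the nesting of () and [] in the array display.
descr = arr.dtype.descr
print(descr)
```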

    arr.dtype.names lists only the top-level field names:

    In [281]: arr.dtype.names
    Out[281]: ('data_string', 'data_val')
    In [282]: arr['data_val']
    Out[282]: 
    array([[(0., 0.), (0., 0.)],
           [(0., 0.), (0., 0.)]], dtype=[('x', '<f8'), ('y', '<f8')])
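    Because 'x' is not a top-level name, looking it up on arr directly fails, and so does the arr[:, "x"] form from the question. A quick check, rebuilding the same arr; the exact exception type can vary between NumPy versions, so this sketch catches several (the raises helper is just for illustration):

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

# 'x' belongs to the nested dtype, not to arr.dtype.
print('x' in arr.dtype.names)


def raises(func):
    """Hypothetical helper: True if func() raises an indexing error."""
    try:
        func()
    except (IndexError, ValueError, KeyError):
        return True
    return False


print(raises(lambda: arr['x']))      # no top-level field named 'x'
print(raises(lambda: arr[:, 'x']))   # a field name is not a valid tuple index
```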
    

    But having accessed one field, we can then look at its fields:

    In [283]: arr['data_val'].dtype.names
    Out[283]: ('x', 'y')
    In [284]: arr['data_val']['x']
    Out[284]: 
    array([[0., 0.],
           [0., 0.]])
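    Walking the fields one level at a time can be automated if you don't want to type the parent names. This is a minimal sketch, not a NumPy API: find_field_paths and get_by_path are hypothetical helpers. It assumes subarray fields like ('data_val', c, 2) should be searched through their base dtype:

```python
from functools import reduce

import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)


def find_field_paths(dtype, target, prefix=()):
    """Return every tuple of field names leading to a field called target."""
    paths = []
    # For a subarray dtype, .base is the element dtype; otherwise it is dtype itself.
    dtype = dtype.base
    if dtype.names is None:
        return paths
    for name in dtype.names:
        path = prefix + (name,)
        if name == target:
            paths.append(path)
        # Keep descending in case the field is itself structured.
        paths.extend(find_field_paths(dtype[name], target, path))
    return paths


def get_by_path(a, path):
    """Apply the field names one at a time: a[path[0]][path[1]]..."""
    return reduce(lambda sub, name: sub[name], path, a)


paths = find_field_paths(arr.dtype, 'x')
print(paths)
print(get_by_path(arr, paths[0]).shape)
```

    Since every step is an ordinary field lookup, the result is the same view you get from arr['data_val']['x'].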
    

    Record-number indexing is separate from field indexing, and works like ordinary (possibly multidimensional) array indexing:

    In [285]: arr[1]['data_val']['x'] = [1,2]
    In [286]: arr[0]['data_val']['y'] = [3,4]
    In [287]: arr
    Out[287]: 
    array([(b'', [(0., 3.), (0., 4.)]), (b'', [(1., 0.), (2., 0.)])],
          dtype=[('data_string', 'S20'), ('data_val', [('x', '<f8'), ('y', '<f8')], (2,))])
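    Field indexing returns views, so you can also take the field first and index records afterwards; writes go back into arr. A short sketch with the same arr:

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

# Field-first indexing gives a view into arr, so this has the same
# effect as arr[1]['data_val']['x'] = [1, 2].
x = arr['data_val']['x']
x[1] = [1, 2]
print(np.shares_memory(x, arr))

# The record index and the field names can be applied in either order.
print(arr[1]['data_val']['x'])
print(arr['data_val']['x'][1])
```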
    

    Since the data_val field has shape (2,) and arr itself has shape (2,), arr['data_val']['x'] is a (2,2) array, so we can mix and match indices across both dimensions:

    In [289]: arr['data_val']['x']
    Out[289]: 
    array([[0., 0.],
           [1., 2.]])
    In [290]: arr['data_val']['x'][[0,1],[0,1]]
    Out[290]: array([0., 2.])
    In [291]: arr['data_val'][[0,1],[0,1]]
    Out[291]: array([(0., 3.), (2., 0.)], dtype=[('x', '<f8'), ('y', '<f8')])
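    Selecting the field and applying the advanced index commute, as In [290] and In [291] suggest. If you would rather fancy-index plain floats, numpy.lib.recfunctions.structured_to_unstructured (available since NumPy 1.16) can turn the nested records into an extra trailing axis. A sketch, assuming both sub-fields share the f8 type:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)
arr['data_val']['x'][1] = [1, 2]
arr['data_val']['y'][0] = [3, 4]

rows, cols = [0, 1], [0, 1]
# Field then advanced index, or advanced index then field: same values.
a = arr['data_val']['x'][rows, cols]
b = arr['data_val'][rows, cols]['x']
print(a, b)

# (2, 2) structured array of (x, y) -> plain (2, 2, 2) float array,
# where the last axis holds x at 0 and y at 1.
plain = rfn.structured_to_unstructured(arr['data_val'])
print(plain.shape)
print(plain[..., 0])
```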
    

    Field indexing works a lot like dict indexing. Note this display of the fields:

    In [294]: arr.dtype.fields
    Out[294]: 
    mappingproxy({'data_string': (dtype('S20'), 0),
                  'data_val': (dtype(([('x', '<f8'), ('y', '<f8')], (2,))), 20)})
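    Each value in dtype.fields is a (dtype, byte offset) tuple (with a third item if the field has a title), so the offsets can be read off directly, at the top level and inside the nested dtype. A sketch:

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

# Top-level fields: name -> byte offset within the record.
top = {name: info[1] for name, info in arr.dtype.fields.items()}
print(top)

# The nested names live on the base dtype of the 'data_val' subarray.
inner = arr.dtype['data_val'].base
offsets = {name: info[1] for name, info in inner.fields.items()}
print(offsets)
```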
    

    Each record is stored as a block of 52 bytes:

    In [299]: arr.itemsize
    Out[299]: 52
    In [300]: arr.dtype.str
    Out[300]: '|V52'
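    One way to see those 52-byte blocks is to reinterpret the buffer as raw bytes. This sketch assumes arr is contiguous, which it is straight from np.zeros, and decodes the x of data_val[0] from bytes 20 to 28 of record 1:

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)
arr['data_val']['x'][1] = [1, 2]

# One row of arr.itemsize bytes per record.
raw = arr.view(np.uint8).reshape(len(arr), arr.itemsize)
print(raw.shape)

# data_val starts at byte 20 and x is its first 8 bytes.
x_dtype = arr.dtype['data_val'].base['x']
x0 = np.frombuffer(raw[1, 20:28].tobytes(), dtype=x_dtype)
print(x0)
```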
    

    20 of those bytes are data_string, and 32 are the two c records in data_val (16 bytes each):

    In [303]: arr['data_val'].dtype.str
    Out[303]: '|V16'
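    The byte count can be checked from the component dtypes:

```python
import numpy as np

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

string_bytes = np.dtype('S20').itemsize  # 20
c_bytes = np.dtype(c).itemsize           # 16: two float64 values
total = string_bytes + 2 * c_bytes       # 20 + 2 * 16
print(total, arr.itemsize)
```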
    

    You can also index with a list of fields and get a special kind of view. Its dtype display is a little different:

    In [306]: arr[['data_val']]
    Out[306]: 
    array([([(0., 3.), (0., 4.)],), ([(1., 0.), (2., 0.)],)],
          dtype={'names': ['data_val'], 'formats': [([('x', '<f8'), ('y', '<f8')], (2,))], 'offsets': [20], 'itemsize': 52})
    
    In [311]: arr['data_val'][['y']]
    Out[311]: 
    array([[(3.,), (4.,)],
           [(0.,), (0.,)]],
          dtype={'names': ['y'], 'formats': ['<f8'], 'offsets': [8], 'itemsize': 16})
    

    Each 'data_val' starts 20 bytes into the 52-byte record, and each 'y' starts 8 bytes into its 16-byte record.
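    Because these multi-field views keep the original offsets and itemsize, they share memory with arr. If you need a compact copy without the gaps, numpy.lib.recfunctions.repack_fields (NumPy 1.16+) should produce one. A sketch:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

c = [('x', 'f8'), ('y', 'f8')]
A = [('data_string', '|S20'), ('data_val', c, 2)]
arr = np.zeros(2, dtype=A)

# Multi-field index: a view with 'y' still at offset 8 of a 16-byte item.
ys = arr['data_val'][['y']]
print(ys.dtype.itemsize, np.shares_memory(ys, arr))

# repack_fields returns a copy with the padding removed.
packed = rfn.repack_fields(ys)
print(packed.dtype.itemsize, packed.dtype.fields['y'][1])

# The same works for arr[['data_val']]: 52 bytes down to 32.
whole = rfn.repack_fields(arr[['data_val']])
print(whole.dtype.itemsize)
```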