
Finding matching subset of "row" in a numpy structured array


I have data stored in a NumPy structured array, where some of the fields identify the case that each row belongs to. I would like to find the row that matches a given case. For example, say I'm storing the name of a building, the room number, and the number of chairs and tables in the room as a (2,) array. This would look something like this:

import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                      dtype=my_dtype)

Now say that I want to find the row for building 'BLDG0', room 14. Based on the answer to "Finding a matching row in a numpy matrix", I tried

sub_fields = ['building', 'room']
matching_index, = np.where(room_info[sub_fields] == ('BLDG0', 14))

which would ideally give [2]. However, it produces the following warning:

FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future.

and returns an empty array. Is there a way to find the matching sub-row for a large set of data other than comparing each column separately and then finding the matching indices?

I am using NumPy version 1.18.5 through miniconda, and it doesn't look like I can safely update to a newer version within this environment. (Though I'm not sure whether newer versions support this type of comparison.)


Solution

    In [243]: my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
         ...: room_info = np.array([('BLDG0', 12, [24, 6]),
         ...:                       ('BLDG1', 34, [32, 10]),
         ...:                       ('BLDG0', 14, [10, 20])],
         ...:                       dtype=my_dtype)
    In [244]: room_info
    Out[244]: 
    array([('BLDG0', 12, [24,  6]), ('BLDG1', 34, [32, 10]),
           ('BLDG0', 14, [10, 20])],
          dtype=[('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))])
    
    In [246]: room_info['building']
    Out[246]: array(['BLDG0', 'BLDG1', 'BLDG0'], dtype='<U5')
    In [247]: room_info['building']=='BLDG0'
    Out[247]: array([ True, False,  True])
    
    In [248]: room_info['room']==14
    Out[248]: array([False, False,  True])
    

    Combine the two:

    In [249]: Out[247] & Out[248]
    Out[249]: array([False, False,  True])
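The same per-field idea extends to any number of fields. A minimal sketch, assuming the criteria are given as a dict mapping field name to required value; `match_mask` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)

def match_mask(arr, criteria):
    """Hypothetical helper: AND together one boolean test per field.

    criteria maps a field name to the value that field must equal.
    """
    tests = [arr[name] == value for name, value in criteria.items()]
    # Stack the (N,) boolean arrays and AND them along the first axis
    return np.logical_and.reduce(tests)

mask = match_mask(room_info, {'building': 'BLDG0', 'room': 14})
print(mask)  # [False False  True]
```

Each field is still compared separately, but adding another criterion is just another dict entry rather than another hand-written `&` term.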
    

    Use that as a boolean mask:

    In [250]: room_info[_]
    Out[250]: 
    array([('BLDG0', 14, [10, 20])],
          dtype=[('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))])
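Because single-field access such as `room_info['seating']` returns a view, the same mask can also be applied to one field to read or update only the matching rows. A small sketch; the replacement seating values are made up for illustration:

```python
import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)

mask = (room_info['building'] == 'BLDG0') & (room_info['room'] == 14)

# Read one field for the matching rows only
print(room_info['seating'][mask])  # [[10 20]]

# room_info['seating'] is a view, so boolean-mask assignment writes back
# into room_info itself (these values are purely illustrative)
room_info['seating'][mask] = [12, 18]
print(room_info[2])
```

The order matters: `room_info[mask]['seating'] = ...` would modify a copy, because boolean indexing of the whole array makes a copy before the field is selected.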
    

    And to get the index:

    In [251]: np.nonzero(Out[247]&Out[248])
    Out[251]: (array([2]),)
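For a 1-D mask, `np.flatnonzero` returns the index array directly, which avoids unpacking the tuple returned by `np.nonzero`. A short sketch; the single-match check is an assumption about how the result will be used:

```python
import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)

mask = (room_info['building'] == 'BLDG0') & (room_info['room'] == 14)

# For a 1-D mask this equals np.nonzero(mask)[0]
matching_index = np.flatnonzero(mask)
print(matching_index)  # [2]

# If exactly one row is expected, check that before using the index
if matching_index.size == 1:
    row = room_info[matching_index[0]]
    print(row['seating'])  # [10 20]
```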
    

    It looks like we can also test both fields at once by comparing against a structured array with a matching dtype:

    In [254]: test=np.array(('BLDG0',14),dtype=my_dtype[:2])
    In [255]: room_info[['building','room']]
    Out[255]: 
    array([('BLDG0', 12), ('BLDG1', 34), ('BLDG0', 14)],
          dtype={'names':['building','room'], 'formats':['<U5','<i8'], 'offsets':[0,20], 'itemsize':44})
    In [256]: room_info[['building','room']]==test
    Out[256]: array([False, False,  True])
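This also suggests why the comparison in the question fails: the bare tuple `('BLDG0', 14)` is most likely converted to an ordinary two-element string array rather than a structured record, so the rows cannot be compared field by field and NumPy 1.18 falls back to the FutureWarning and a scalar result. A sketch of the fix, building the target record from the multi-field view's own dtype so the field layout is guaranteed to match:

```python
import numpy as np

my_dtype = [('building', '<U5'), ('room', '<i8'), ('seating', '<i8', (2,))]
room_info = np.array([('BLDG0', 12, [24, 6]),
                      ('BLDG1', 34, [32, 10]),
                      ('BLDG0', 14, [10, 20])],
                     dtype=my_dtype)

sub_fields = ['building', 'room']

# A bare tuple is not a structured record: converting it gives a plain
# (2,) string array, which is not comparable with the structured rows.
print(np.asarray(('BLDG0', 14)))  # ['BLDG0' '14']

# Wrap the tuple in a 0-d structured array that uses the view's own dtype,
# so == compares the records field by field.
sub = room_info[sub_fields]
target = np.array(('BLDG0', 14), dtype=sub.dtype)
matching_index, = np.nonzero(sub == target)
print(matching_index)  # [2]
```

Using `sub.dtype` instead of `my_dtype[:2]` also works when the fields of interest are not the leading fields of the dtype.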