python numpy structured-array

How to do column-wise operations with NumPy structured arrays?


This shows the problem nicely:

import numpy as np

a_type = np.dtype([("x", int), ("y", float)])
a_list = []

for i in range(0, 8, 2):
    entry = np.zeros((1,), dtype=a_type)
    entry["x"][0] = i
    entry["y"][0] = i + 1.0
    a_list.append(entry)
a_array = np.array(a_list, dtype=a_type)
a_array_flat = a_array.reshape(-1)
print(a_array_flat["x"])
print(np.sum(a_array_flat["x"]))

This produces the following output and traceback:

[0 2 4 6]
Traceback (most recent call last):
  File "/home/andreas/src/masiri/booking_algorythm/demo_structured_aarray_flatten.py", line 14, in <module>
    print(np.sum(a_array_flat["x"]))
  File "<__array_function__ internals>", line 180, in sum
  File "/home/andreas/src/masiri/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2298, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/home/andreas/src/masiri/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype({'names': ['x'], 'formats': ['<i8'], 'offsets': [0], 'itemsize': 16}), dtype({'names': ['x'], 'formats': ['<i8'], 'offsets': [0], 'itemsize': 16})) -> None

I chose this data structure because I must do many column-wise operations fast, and I also have more esoteric types such as timedelta64 and datetime64. I am sure basic NumPy operations work and that I am overlooking something obvious. Please help me.


Solution

  • In an IPython session, your code runs fine:

    In [2]: a_type = np.dtype([("x", int), ("y", float)])
       ...: a_list = []
       ...: 
       ...: for i in range(0, 8, 2):
       ...:     entry = np.zeros((1,), dtype=a_type)
       ...:     entry["x"][0] = i
       ...:     entry["y"][0] = i + 1.0
       ...:     a_list.append(entry)
       ...: a_array = np.array(a_list, dtype=a_type)
       ...: a_array_flat = a_array.reshape(-1)
    
    In [3]: a_list
    Out[3]: 
    [array([(0, 1.)], dtype=[('x', '<i4'), ('y', '<f8')]),
     array([(2, 3.)], dtype=[('x', '<i4'), ('y', '<f8')]),
     array([(4, 5.)], dtype=[('x', '<i4'), ('y', '<f8')]),
     array([(6, 7.)], dtype=[('x', '<i4'), ('y', '<f8')])]
    
    In [4]: a_array
    Out[4]: 
    array([[(0, 1.)],
           [(2, 3.)],
           [(4, 5.)],
           [(6, 7.)]], dtype=[('x', '<i4'), ('y', '<f8')])
    
    In [5]: a_array_flat
    Out[5]: 
    array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
          dtype=[('x', '<i4'), ('y', '<f8')])
    
    In [6]: a_array_flat['x']
    Out[6]: array([0, 2, 4, 6])
    
    In [7]: np.sum(a_array_flat["x"])
    Out[7]: 12
    

    The error message looks almost as if you were indexing with a list of fields rather than a single field name:

    In [8]: np.sum(a_array_flat[["x"]])
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    Input In [8], in <cell line: 1>()
    ----> 1 np.sum(a_array_flat[["x"]])
    
    File <__array_function__ internals>:5, in sum(*args, **kwargs)
    
    ...
    TypeError: cannot perform reduce with flexible type
    
    In [9]: a_array_flat[["x"]]
    Out[9]: 
    array([(0,), (2,), (4,), (6,)],
          dtype={'names':['x'], 'formats':['<i4'], 'offsets':[0], 'itemsize':12})
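
The difference between the two indexing forms can be checked directly. The following is a minimal sketch, assuming a small array built with the question's dtype (the names `arr`, `single`, and `multi` are illustrative only): a string index returns a plain numeric column, while a list index returns an array that is still structured, which is why the reduction fails.

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])
arr = np.zeros(4, dtype=a_type)
arr["x"] = [0, 2, 4, 6]
arr["y"] = [1.0, 3.0, 5.0, 7.0]

single = arr["x"]    # field name as a string: plain integer ndarray
multi = arr[["x"]]   # list of field names: still a structured array

# dtype.names is None for ordinary numeric arrays and a tuple of field
# names for structured ones, so it works as a quick diagnostic.
print(single.dtype.names)   # None
print(multi.dtype.names)    # ('x',)

# Reductions work on the plain column.
total = int(np.sum(single))
print(total)                # 12

# The structured selection has no 'add' loop; the exact exception class
# depends on the NumPy version, but it is a TypeError subclass.
raised = False
try:
    np.sum(multi)
except TypeError as exc:
    raised = True
    print(type(exc).__name__)
```

Printing `dtype.names` on whatever is passed to `np.sum` is a quick way to tell which of the two cases applies.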
    

    Which NumPy version are you using? There was a period when NumPy releases flip-flopped on how multi-field indexing was handled (copy versus view of the array).

    The sum also works on the unflattened array:

    In [11]: a_array["x"]
    Out[11]: 
    array([[0],
           [2],
           [4],
           [6]])
    
    In [12]: a_array["x"].sum()
    Out[12]: 12
    

    Another way of constructing this array:

    In [15]: import numpy.lib.recfunctions as rf
    In [16]: arr = np.arange(8).reshape(4,2);arr
    Out[16]: 
    array([[0, 1],
           [2, 3],
           [4, 5],
           [6, 7]])
    
    In [17]: arr1 = rf.unstructured_to_structured(arr, dtype=a_type)    
    In [18]: arr1
    Out[18]: 
    array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
          dtype=[('x', '<i4'), ('y', '<f8')])
    
    In [19]: arr1['x']
    Out[19]: array([0, 2, 4, 6])
    

    or:

    In [20]: arr2 = np.zeros(4, a_type)
    In [21]: arr2['x']=arr[:,0]; arr2['y']=arr[:,1]
    In [22]: arr2
    Out[22]: 
    array([(0, 1.), (2, 3.), (4, 5.), (6, 7.)],
          dtype=[('x', '<i4'), ('y', '<f8')])
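
If the records are already available as Python values, the loop over one-element arrays may not be needed at all. As a hedged sketch (the names `records` and `arr3` are illustrative, not from the question), a list of tuples can be passed straight to `np.array` with the structured dtype, which yields a 1-D array directly. Each record has to be a tuple, not a list, for NumPy to map it onto the fields.

```python
import numpy as np

a_type = np.dtype([("x", int), ("y", float)])

# One tuple per record; NumPy maps each tuple onto the dtype's fields.
records = [(i, i + 1.0) for i in range(0, 8, 2)]
arr3 = np.array(records, dtype=a_type)

print(arr3.shape)         # (4,)
print(arr3["x"].sum())    # 12
print(arr3["y"].mean())   # 4.0
```

Because the result is already 1-D, no `reshape(-1)` step is required before the column-wise operations.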
    

    Edit

    I get your error message with Python's built-in sum (as opposed to np.sum, which I used above) applied to a multi-field index:

    In [26]: sum(a_array[['x']])
    ---------------------------------------------------------------------------
    UFuncTypeError                            Traceback (most recent call last)
    Input In [26], in <cell line: 1>()
    ----> 1 sum(a_array[['x']])
    
    UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int32'), dtype({'names':['x'], 'formats':['<i4'], 'offsets':[0], 'itemsize':12})) -> None
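
When a reduction over several fields at once is really wanted, one option is to convert the multi-field selection to an ordinary 2-D array first. The sketch below assumes the question's dtype and uses `rf.structured_to_unstructured` from `numpy.lib.recfunctions`; the names `arr`, `plain`, and `col_sums` are illustrative only.

```python
import numpy as np
import numpy.lib.recfunctions as rf

a_type = np.dtype([("x", int), ("y", float)])
arr = np.array([(0, 1.0), (2, 3.0), (4, 5.0), (6, 7.0)], dtype=a_type)

# A multi-field selection stays structured, so convert it to a plain
# 2-D array before reducing. Mixed int/float fields are promoted to float.
plain = rf.structured_to_unstructured(arr[["x", "y"]])
print(plain.shape)   # (4, 2)

# Column sums: one value per selected field.
col_sums = plain.sum(axis=0)
print(col_sums)
```

For a single column, indexing with the field name as a string (`arr["x"]`) remains the simpler route, since it already returns a plain array.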