Tags: python, list, numpy, field, structured-array

Given that you added a new field to a 1-d slice of a structured array, why can you not set the entry of the new field to a list?


The title may be a little confusing, so I hope an example makes it clearer. Imagine I have a small helper function that adds new fields to an existing structured array:

import numpy as np


def add_field(a, *descr):
    b = np.empty(a.shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

Given a structured array, I can simply use it to add new fields:

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)
print(a)
b = add_field(a, ('new', 'O'))
print(b)

I can then set an entry of the newly created field to an (empty) list without a problem:

b[0]['new'] = []

I can also create a new array which is only a slice of the original one and then add a new field to this new array:

c = a[0]
print(c)
d = add_field(c, ('newer', 'O'))
print(d)

BUT if I now try to set the new field to an (empty) list, it doesn't work:

d['newer'] = []

ValueError: assignment to 0-d array

Why is that? According to add_field, d is an entirely new array that happens to share the same fields and entries, just like b did. Interestingly, b[0] and d both have shape (), but type(b[0]) is np.void while type(d) is np.ndarray. Maybe that has something to do with it? Also interestingly, all of this works:

d['newer'] = 1.34
d['newer'] = False
d['newer'] = None
d['newer'] = add_field
d['newer'] = set()
d['newer'] = {}
d['newer'] = {'test': []}

However, accessing the values in the last dict using the key 'test' does not:

>>> d['newer'] = {'test': []}
>>> d['newer']
array({'test': []}, dtype=object)
>>> d['newer']['test']
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
>>> d['newer'][0]
IndexError: too many indices for array

This is very confusing.

EDIT

Okay, I just tried to modify the add_field function like this:

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

But this didn't help:

>>> d = add_field(a[0], ('newer', 'O'))
>>> d
array([(1, False, None)], dtype=[('id', '<i4'), ('used', '?'), ('newer', 'O')])
>>> d.shape
(1,)
>>> d['newer'] = []
ValueError: cannot copy sequence with size 0 to array axis with dimension 1

So that was not it, I guess. However, this now works:

>>> d['newer'][0] = []

But I don't like this workaround. I would expect it to work the same as for b[0].

EDIT 2

If I modify the add_field function a little bit further, I can force the wanted behaviour, although I don't 100% like it:

def add_field(a, *descr):
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b if len(a.shape) else b[0]

d = add_field(a[0], ('newer', 'O'))
d['newer'] = []

Solution

  • To summarize the comments:

    The issue in the original question appears to be the shape of the returned object - when you do e.g.

    c = a[0]
    

    with a having shape (n,), you are not taking a slice of the array but a single element, so c.shape is (). When you pass an object of shape () into add_field, the new array created by

    b = np.empty(a.shape, dtype=a.dtype.descr + [*descr])
    

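    will also have shape (). A 0-d structured array is legal in NumPy, but it cannot be indexed with an integer (d[0] raises an IndexError), so it does not behave like the shape (n,) array b (though this is not spelled out in the documentation).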
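The shape and type claims above can be checked directly. The following is a minimal sketch, not part of the original post; the name d0 is made up here for the 0-d result, and the construction simply repeats what the original add_field does. It also shows one way to get at the Python object stored in a field of a 0-d array, using .item() or an empty-tuple index.

```python
import numpy as np

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)

c = a[0]                      # integer index: one record, not a slice
print(type(c), c.shape)       # a np.void scalar whose shape is ()

# Same construction as the original add_field, applied to the single record.
# d0 is a hypothetical name used only in this sketch.
d0 = np.empty(c.shape, dtype=c.dtype.descr + [('newer', 'O')])
for name in c.dtype.names:
    d0[name] = c[name]
print(type(d0), d0.shape)     # a 0-d ndarray, shape ()

# A 0-d array has no axis, so integer indexing fails.
index_failed = False
try:
    d0[0]
except IndexError as exc:
    index_failed = True
    print('IndexError:', exc)

# Field access on a 0-d array gives another 0-d array wrapping the object;
# .item() or an empty-tuple index unwraps it.
d0['newer'] = {'test': [1, 2]}
inner = d0['newer'].item()
print(inner['test'])
print(d0['newer'][()] is inner)
```

This also accounts for the IndexError seen in the question when writing d['newer']['test']: the dict has to be unwrapped from the 0-d object array first.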

    As in the first edit to the question, the correct modification would be

    def add_field(a, *descr):
        shape = a.shape if len(a.shape) else (1,)
        b = np.empty(shape, dtype=a.dtype.descr + [*descr])
        b[list(a.dtype.names)] = a
        return b
    

    The returned object will then share the properties of a shape (n,) structured array in that:

    1. If you index the array at an integer position you get a structure (e.g. d[0])
    2. You can access and modify individual fields of a structured array by indexing with the field name (e.g. d['newer'])
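The two properties listed above can be sketched as follows. This is an illustrative check rather than the answer's exact code: it uses the per-field copy loop from the question's first edit, and the variable names record and field are introduced here only for the example.

```python
import numpy as np

def add_field(a, *descr):
    # Promote a 0-d input (for example a single record) to shape (1,).
    shape = a.shape if len(a.shape) else (1,)
    b = np.empty(shape, dtype=a.dtype.descr + [*descr])
    for name in a.dtype.names:
        b[name] = a[name]
    return b

a = np.array(
    [(1, False), (2, False), (3, False), (4, True)],
    dtype=[('id', 'i4'), ('used', '?')]
)

d = add_field(a[0], ('newer', 'O'))
print(d.shape)

record = d[0]          # property 1: integer index yields a single structure
field = d['newer']     # property 2: field name yields an array of that field
print(type(record))
print(field.shape, field.dtype)

# Assigning through a single record stores the list as one object.
d[0]['newer'] = []
print(d)
```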

    With the above modification the behavior of d in the question is the same as b e.g.

    d[0]['newer'] = []
    

    is valid, as is

    b[0]['new'] = []
    

    This brings us to the real crux of the question:


    Why can't we assign an empty list to each element of a field using the d['newer']=[] syntax?

    When you assign a sequence rather than a scalar with this syntax, NumPy attempts an element-wise assignment (or a broadcast, depending on the length of the sequence). A scalar, by contrast, is assigned to every element of that field. The documentation is not clear on this point, but we can get a much more helpful error by using

    b['new'] = np.array([])
    
    Traceback (most recent call last):
      File "structuredArray.py", line 20, in <module>
        b['new'] = np.array([])
    ValueError: could not broadcast input array from shape (0) into shape (4)
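The difference between scalar and sequence assignment is easier to see on a plain integer field. The sketch below is an illustration under the assumption that a small array built with np.zeros stands in for b; the helper names after_scalar, after_sequence and mismatch_failed exist only in this example.

```python
import numpy as np

# A stand-in for b with four records; built here so the block runs on its own.
b = np.zeros(4, dtype=[('id', 'i4'), ('new', 'O')])

# A scalar is written to every element of the field.
b['id'] = 7
after_scalar = b['id'].copy()
print(after_scalar)

# A sequence of matching length is assigned element by element.
b['id'] = [10, 20, 30, 40]
after_sequence = b['id'].copy()
print(after_sequence)

# A sequence of the wrong length can be neither matched nor broadcast.
mismatch_failed = False
try:
    b['id'] = [1, 2]
except ValueError as exc:
    mismatch_failed = True
    print('ValueError:', exc)

# None is not a sequence, so it is treated as a scalar for the object field.
b['new'] = None
print(b['new'])
```

An empty list falls into the third case: it is a sequence of length 0, which matches neither 1 nor the length of the field.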
    

    So the issue here isn't how the field is being added, but how you are attempting to assign an empty list to each element of that field. The correct way to do this would be something like

    b['new'] = [[]*b.shape[0]]
    

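    which runs for structured arrays of both (1,) and (4,) shape, as the full script below shows. Be aware that []*b.shape[0] evaluates to [], so the right-hand side is just [[]]: the single inner list is spread across the field, and every element ends up referencing the same list object.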

    import numpy as np
    
    def add_field(a, *descr):
        shape = a.shape if len(a.shape) else (1,)
        b = np.empty(shape, dtype=a.dtype.descr + [*descr])
        for name in a.dtype.names:
            b[name] = a[name]
        return b
    
    a = np.array(
        [(1, False), (2, False), (3, False), (4, True)],
        dtype=[('id', 'i4'), ('used', '?')]
    )
    
    b = add_field(a, ('new', 'O'))
    b['new'] = [[]*b.shape[0]]
    print(b)
    
    c = a[0]
    d = add_field(c, ('newer', 'O'))
    d['newer'] = [[]*d.shape[0]]
    print(d)
    
    [(1, False, list([])) (2, False, list([])) (3, False, list([])) (4,  True, list([]))]
    [(1, False, list([]))]
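If each record should own a separate list, so that appending to one does not affect the others, one option is to assign element by element, which is the same pattern as the d['newer'][0] = [] workaround from the question. A minimal sketch, with b rebuilt inline so the block is self-contained:

```python
import numpy as np

# Rebuilt here for a self-contained example; 'new' is an object field.
b = np.empty(4, dtype=[('id', 'i4'), ('new', 'O')])
b['id'] = [1, 2, 3, 4]

# Assign one fresh list per record. b['new'] is a view, so this writes into b.
for i in range(b.shape[0]):
    b['new'][i] = []

# Each record now holds its own list: mutating one leaves the others alone.
b['new'][0].append('x')
print(b)
```

The explicit loop is slower than a single vectorised assignment, but object fields hold ordinary Python references anyway, so little is lost for arrays of moderate size.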