
Selecting fields from a numpy structured array gives the same itemsize as the entire array


Issue

I have a numpy structured array and I want to take two small fields from it. When I do, the resulting array has an itemsize as large as the original's.

Example

>>> import numpy as np
>>> A = np.zeros(100, dtype=[('f',float),('x',float,2),('large',float,500000)])
>>> A.itemsize
4000024
>>> A['f'].itemsize
8
>>> A['x'].itemsize
8
>>> A[['x','f']].itemsize
4000024
>>> A[['x']].itemsize
4000024

Question

Why does selecting a subset of fields from a numpy structured array produce an array whose itemsize is as large as the original's? (I'm using Python 3.8 and NumPy 1.18.3.)
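A quick way to see where the size comes from is to inspect the dtype of the multi-field result and check whether it shares memory with A. The sketch below assumes NumPy 1.16 or later and uses a smaller 'large' field (1000 floats instead of 500000) purely so it runs cheaply; that size is an illustrative assumption. It shows the selected fields sitting at their original byte offsets, the full itemsize being kept, and writes through the result landing in A.

```python
import numpy as np

# Smaller 'large' field than in the question (assumption), so this is cheap to run.
A = np.zeros(100, dtype=[('f', float), ('x', float, 2), ('large', float, 1000)])

sub = A[['x', 'f']]

# The dtype keeps each selected field at its original byte offset and keeps
# the full itemsize, so the bytes of the unselected 'large' field act as padding.
print(sub.dtype)
print(sub.dtype.fields['x'][1], sub.dtype.fields['f'][1])  # 8 0
print(sub.itemsize == A.itemsize)  # True

# The result is a view into A's buffer, not a new allocation.
print(np.shares_memory(sub, A))  # True

# Writing through the view modifies A itself.
sub['f'] = 1.0
print(A['f'][:3])  # [1. 1. 1.]
```

The exact text printed for `sub.dtype` differs between NumPy versions, but it lists the names, formats, offsets and itemsize of the view.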


Solution

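  • The function needed is numpy.lib.recfunctions.repack_fields. Since NumPy 1.16, indexing a structured array with a list of field names returns a view of the original array rather than a copy. The view's dtype keeps each selected field at its original byte offset and keeps the original itemsize, so the unselected fields remain as padding bytes. That is why itemsize is still 4000024, even though no new memory has been allocated. repack_fields returns a copy whose dtype contains only the selected fields, packed without padding. The example then becomes: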

    >>> import numpy as np
    >>> from numpy.lib.recfunctions import repack_fields
    >>> A = np.zeros(100, dtype=[('f',float),('x',float,2),('large',float,500000)])
    >>> A[['x']].itemsize
    4000024
    >>> repack_fields(A[['x']]).itemsize
    16
    

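    Note that repacking makes a copy, so it necessarily uses additional memory (here 100 × 16 = 1600 bytes), whereas A[['x']] on its own is only a view into A's buffer. The copy may still be what you want, for example when using mpi4py to communicate A[['x']] between ranks when all of A is too large to communicate.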
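The mpi4py use case can be sketched without MPI by serializing the selected fields to raw bytes and rebuilding them on the "receiving" side. In the sketch below, `tobytes` and `np.frombuffer` stand in for an actual send and receive between ranks (an assumption made to keep it self-contained and runnable), and the 'large' field is again shrunk to 1000 floats. The repacked copy is a contiguous buffer holding only the 'x' data, 16 bytes per element.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

# Smaller 'large' field than in the question (assumption), for a cheap example.
A = np.zeros(100, dtype=[('f', float), ('x', float, 2), ('large', float, 1000)])
A['x'] = np.arange(200, dtype=float).reshape(100, 2)

# Packed copy holding only 'x': 16 bytes per element, C-contiguous.
packed = repack_fields(A[['x']])
print(packed.itemsize, packed.nbytes)  # 16 1600
print(packed.flags['C_CONTIGUOUS'])    # True

# Stand-in for sending the buffer to another rank: just the raw bytes.
payload = packed.tobytes()
print(len(payload))  # 1600

# Stand-in for the receiving rank: rebuild the structured array from the bytes,
# assuming the receiver knows the packed dtype.
received = np.frombuffer(payload, dtype=packed.dtype)
print(np.array_equal(received['x'], A['x']))  # True
```

Sending the view A[['x']] instead would mean shipping the full 8024-byte items, padding included, which is exactly what repacking avoids.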