python numpy structured-array

Column stacking a nested numpy structured array: help getting dims right


I'm trying to create a nested record array, but I'm having trouble with the dimensions. I tried following the example in "how to set dtype for nested numpy ndarray?", but I am misunderstanding something. Below is an MRE. The arrays are generated in a script, not imported from CSV.

import numpy as np

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

for i in arrs:
    print(i.shape, i)

For which the print statement returns:

(4,) [4 5 4 5]
(4,) [ 0  0 -1 -1]
(4,) [0.51 0.89 0.59 0.94]
(4, 3) [[0.52 0.41 0.68]
 [0.8  0.71 1.12]
 [0.62 0.46 0.78]
 [1.1  0.77 1.19]]
(4, 3) [[0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]
 [0.6 0.2 0.2]]

However, the ans line throws an error:

dtypes = [
        ("state", "f8"),
        ("variability", "f8"),
        ("target", "f8"),
        ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
        ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
ans = np.column_stack(arrs).view(dtype=dtypes)

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

Problem 1: How do I get the desired array output? print(np.column_stack(arrs)) returns

[[ 4.    0.    0.51  0.52  0.41  0.68  0.6   0.2   0.2 ]
 [ 5.    0.    0.89  0.8   0.71  1.12  0.6   0.2   0.2 ]
 [ 4.   -1.    0.59  0.62  0.46  0.78  0.6   0.2   0.2 ]
 [ 5.   -1.    0.94  1.1   0.77  1.19  0.6   0.2   0.2 ]]

But the desired output looks like this:

[[4 0 0.51 (0.52, 0.41, 0.68) (0.6, 0.2, 0.2)]
 [5 0 0.89 (0.8, 0.71, 1.12) (0.6, 0.2, 0.2)]
 [4 -1 0.59 (0.62, 0.46, 0.78) (0.6, 0.2, 0.2)]
 [5 -1 0.94 (1.1, 0.77, 1.19) (0.6, 0.2, 0.2)]]

Problem 2: How do I include the dtype.names?

print(rec_array.dtype.names) should return: ('state', 'variability', 'target', 'measured', 'var')

and print(rec_array['measured'].dtype.names) should return: ('mean', 'low', 'hi')

and similarly for the names of the other nested array.


Solution

  • With your dtype:

    In [2]: dtypes = [
       ...:         ("state", "f8"),
       ...:         ("variability", "f8"),
       ...:         ("target", "f8"),
       ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
       ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
       ...: ]
    

    A 2-element zeros array looks like:

    In [3]: arr = np.zeros(2,dtypes)    
    In [4]: arr
    Out[4]: 
    array([(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)]),
           (0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)])],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
    

    Using recfunctions, I can map that to an unstructured array:

    In [5]: import numpy.lib.recfunctions as rf    
    In [6]: uarr = rf.structured_to_unstructured(arr)    
    In [7]: uarr
    Out[7]: 
    array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
            0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
            0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])    
    In [8]: uarr.shape
    Out[8]: (2, 27)
    

    That says that your dtypes describes 27 values per record (3 scalars plus two fields of 4 triples each), not the 9 that you seem to expect (from your column stack).
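The count can also be read off the dtype itself, since every leaf here is an 8-byte float. A minimal sketch, where the names `nested` and `flat` are only illustrative and `flat` is the same dtype with the (4,) shape dropped:

```python
import numpy as np

inner_measured = [("mean", "f8"), ("low", "f8"), ("hi", "f8")]
inner_var = [("mid", "f8"), ("low", "f8"), ("hi", "f8")]

# dtype from the question: each nested field is a length-4 array of triples
nested = np.dtype([
    ("state", "f8"), ("variability", "f8"), ("target", "f8"),
    ("measured", inner_measured, (4,)),
    ("var", inner_var, (4,)),
])

# same fields, but one triple per record (no subarray shape)
flat = np.dtype([
    ("state", "f8"), ("variability", "f8"), ("target", "f8"),
    ("measured", inner_measured),
    ("var", inner_var),
])

# every leaf is float64 (8 bytes), so itemsize // 8 counts values per record
print(nested.itemsize // 8)  # 3 + 4*3 + 4*3 = 27
print(flat.itemsize // 8)    # 3 + 3 + 3 = 9
```

Only the second count matches the 9 columns that column_stack produces.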

    Making a new (2,27) array, I can create a structured array:

    In [9]: uarr = np.arange(2*27).reshape(2,27)
    In [18]: rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
    Out[18]: 
    array([( 0.,  1.,  2., [( 3.,  4.,  5.), ( 6.,  7.,  8.), ( 9., 10., 11.), (12., 13., 14.)], [(15., 16., 17.), (18., 19., 20.), (21., 22., 23.), (24., 25., 26.)]),
           (27., 28., 29., [(30., 31., 32.), (33., 34., 35.), (36., 37., 38.), (39., 40., 41.)], [(42., 43., 44.), (45., 46., 47.), (48., 49., 50.), (51., 52., 53.)])],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
    

    view still fails here. In some simple cases view does work, though it can require adjusting dimensions; I have not explored its limitations:

    In [19]: uarr.view(np.dtype(dtypes))
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Input In [19], in <cell line: 1>()
    ----> 1 uarr.view(np.dtype(dtypes))
    
    ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
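The error message is about byte counts along the last axis. As a rough sketch using the column-stacked data from the question (assuming the stacked array is float64, as its printout suggests), each row holds 9 × 8 = 72 bytes, while one record of the dtype with (4,) needs 216 bytes, so a whole record cannot fit into a row:

```python
import numpy as np

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
stacked = np.column_stack((arr1, arr2, arr3, arr4, arr5))

# dtype from the question, including the (4,) subarray shape
dt = np.dtype([
    ("state", "f8"), ("variability", "f8"), ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
])

# bytes available in one row versus bytes needed for one record
row_bytes = stacked.shape[-1] * stacked.itemsize
print(row_bytes, dt.itemsize)

view_failed = False
try:
    stacked.view(dt)
except ValueError as err:
    view_failed = True
    print("view failed:", err)
```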
    

    Edit

    Removing the (4,) from dtypes, so that each nested field holds a single triple per record:

    In [35]: dtypes = [
        ...:         ("state", "f8"),
        ...:         ("variability", "f8"),
        ...:         ("target", "f8"),
        ...:         ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
        ...:         ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
        ...: ]
    
    In [36]: arr = np.zeros(2,dtypes)
    
    In [37]: arr
    Out[37]: 
    array([(0., 0., 0., (0., 0., 0.), (0., 0., 0.)),
           (0., 0., 0., (0., 0., 0.), (0., 0., 0.))],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
    
    In [38]: uarr = np.arange(18).reshape(2,9)
    
    In [39]: arr1 = rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
    
    In [40]: arr1
    Out[40]: 
    array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
           (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
    
    In [43]: arr1['measured']
    Out[43]: 
    array([( 3.,  4.,  5.), (12., 13., 14.)],
          dtype=[('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')])
    
    In [44]: arr1['measured']['mean']
    Out[44]: array([ 3., 12.])
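Because indexing by field name returns a view, the same nested indexing can also be used for assignment. This is an alternative to unstructured_to_structured, not what the answer itself uses: a small sketch with made-up values that fills a record array field by field.

```python
import numpy as np

dtypes = [
    ("state", "f8"), ("variability", "f8"), ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
]

out = np.zeros(2, dtype=dtypes)

# top-level field: one value per record
out["state"] = [4, 5]

# out["measured"] is a view into out, so this writes through to out
out["measured"]["mean"] = [0.52, 0.80]

# a scalar is broadcast to every record
out["var"]["mid"] = 0.6
out["var"]["low"] = 0.2
out["var"]["hi"] = 0.2

print(out)
```

This can be convenient when the source arrays are already separate, since it avoids building the intermediate 2-D array.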
    

    The same result via a text file and genfromtxt:

    In [45]: np.savetxt('foo', uarr)
    
    In [46]: more foo
    0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00
    9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01
    
    In [47]: data = np.genfromtxt('foo', dtype=dtypes)
    
    In [48]: data
    Out[48]: 
    array([(0.,  1.,  2., ( 3.,  4.,  5.), ( 6.,  7.,  8.)),
           (9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
    

    view still does not work.

    With your data:

    In [50]: arr1 = np.array([4, 5, 4, 5])
        ...: arr2 = np.array([0, 0, -1, -1])
        ...: arr3 = np.array([0.51, 0.89, 0.59, 0.94])
        ...: arr4 = np.array(
        ...:     [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
        ...: ).T
        ...: arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
        ...: arrs = (arr1, arr2, arr3, arr4, arr5)
    
    In [51]: ans = np.column_stack(arrs)
    
    In [52]: ans
    Out[52]: 
    array([[ 4.  ,  0.  ,  0.51,  0.52,  0.41,  0.68,  0.6 ,  0.2 ,  0.2 ],
           [ 5.  ,  0.  ,  0.89,  0.8 ,  0.71,  1.12,  0.6 ,  0.2 ,  0.2 ],
           [ 4.  , -1.  ,  0.59,  0.62,  0.46,  0.78,  0.6 ,  0.2 ,  0.2 ],
           [ 5.  , -1.  ,  0.94,  1.1 ,  0.77,  1.19,  0.6 ,  0.2 ,  0.2 ]])
    
    In [53]: arr2 = rf.unstructured_to_structured(ans, dtype=np.dtype(dtypes))
    
    In [54]: arr2
    Out[54]: 
    array([(4.,  0., 0.51, (0.52, 0.41, 0.68), (0.6, 0.2, 0.2)),
           (5.,  0., 0.89, (0.8 , 0.71, 1.12), (0.6, 0.2, 0.2)),
           (4., -1., 0.59, (0.62, 0.46, 0.78), (0.6, 0.2, 0.2)),
           (5., -1., 0.94, (1.1 , 0.77, 1.19), (0.6, 0.2, 0.2))],
          dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
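For Problem 2, the field names come along with the dtype. Putting the pieces above into one script (the name `rec_array` is taken from the question; the dtype is the version without (4,)), a sketch of the whole thing might look like:

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

# nested dtype with one triple per record: 9 values, matching the 9 columns
dtypes = np.dtype([
    ("state", "f8"), ("variability", "f8"), ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# (4, 9) float array -> (4,) structured array
rec_array = rf.unstructured_to_structured(np.column_stack(arrs), dtype=dtypes)

print(rec_array.dtype.names)
# ('state', 'variability', 'target', 'measured', 'var')
print(rec_array["measured"].dtype.names)
# ('mean', 'low', 'hi')
print(rec_array["var"].dtype.names)
# ('mid', 'low', 'hi')
```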