I'm trying to create a nested record array, but I'm having trouble with the dimensions. I tried following the example in "how to set dtype for nested numpy ndarray?", but I'm misunderstanding something. Below is an MRE. The arrays are generated in a script, not imported from a CSV file.
import numpy as np
arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
[[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)
for i in arrs:
    print(i.shape, i)
For which the print statement returns:
(4,) [4 5 4 5]
(4,) [ 0 0 -1 -1]
(4,) [0.51 0.89 0.59 0.94]
(4, 3) [[0.52 0.41 0.68]
[0.8 0.71 1.12]
[0.62 0.46 0.78]
[1.1 0.77 1.19]]
(4, 3) [[0.6 0.2 0.2]
[0.6 0.2 0.2]
[0.6 0.2 0.2]
[0.6 0.2 0.2]]
However, the ans line throws an error:
dtypes = [
("state", "f8"),
("variability", "f8"),
("target", "f8"),
("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
]
ans = np.column_stack(arrs).view(dtype=dtypes)
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
Problem 1: How do I get the desired array output?
print(np.column_stack(arrs))
returns
[[ 4. 0. 0.51 0.52 0.41 0.68 0.6 0.2 0.2 ]
[ 5. 0. 0.89 0.8 0.71 1.12 0.6 0.2 0.2 ]
[ 4. -1. 0.59 0.62 0.46 0.78 0.6 0.2 0.2 ]
[ 5. -1. 0.94 1.1 0.77 1.19 0.6 0.2 0.2 ]]
But the desired output looks like this:
[[4 0 0.51 (0.52, 0.41, 0.68) (0.6, 0.2, 0.2)]
[5 0 0.89 (0.8, 0.71, 1.12) (0.6, 0.2, 0.2)]
[4 -1 0.59 (0.62, 0.46, 0.78) (0.6, 0.2, 0.2)]
[5 -1 0.94 (1.1, 0.77, 1.19) (0.6, 0.2, 0.2)]]
Problem 2: How do I include the dtype.names?
print(rec_array.dtype.names)
should return:
('state', 'variability', 'target', 'measured', 'var')
and print(rec_array['measured'].dtype.names)
should return:
('mean', 'low', 'hi')
and similarly for the names of the other nested array.
With your dtype:
In [2]: dtypes = [
...: ("state", "f8"),
...: ("variability", "f8"),
...: ("target", "f8"),
...: ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
...: ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
...: ]
A 2 element zeros array looks like:
In [3]: arr = np.zeros(2,dtypes)
In [4]: arr
Out[4]:
array([(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)]),
(0., 0., 0., [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)], [(0., 0., 0.), (0., 0., 0.), (0., 0., 0.), (0., 0., 0.)])],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
Using recfunctions, I can map that to an unstructured array:
In [5]: import numpy.lib.recfunctions as rf
In [6]: uarr = rf.structured_to_unstructured(arr)
In [7]: uarr
Out[7]:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
In [8]: uarr.shape
Out[8]: (2, 27)
That shows that your dtypes describes 27 values per record (3 scalar fields plus two nested fields of 4 x 3 values each), not the 9 that you seem to expect (from your column stack).
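As a quick check of that count, here is a minimal sketch that compares the size of one record of the original dtype with the size of one row of the column-stacked array. The variable names (nested_with_shape, values_per_record, bytes_per_row) are mine, and the 9 float64 columns are taken from the column_stack output in the question.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# The dtype from the question, including the (4,) subarray shape.
nested_with_shape = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")], (4,)),
])

# Number of float values stored in one record: 3 + 4*3 + 4*3.
one_record = np.zeros(1, dtype=nested_with_shape)
values_per_record = rf.structured_to_unstructured(one_record).shape[1]
print(values_per_record)           # 27

# Size of one record in bytes: 27 values * 8 bytes.
print(nested_with_shape.itemsize)  # 216

# One row of the (4, 9) float64 array from column_stack is much smaller.
bytes_per_row = 9 * np.dtype("f8").itemsize
print(bytes_per_row)               # 72
```

Since 216 does not divide 72, this is consistent with the ValueError that view raised in the question.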
Making a new (2,27) array, I can create a structured array:
In [9]: uarr = np.arange(2*27).reshape(2,27)
In [18]: rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
Out[18]:
array([( 0., 1., 2., [( 3., 4., 5.), ( 6., 7., 8.), ( 9., 10., 11.), (12., 13., 14.)], [(15., 16., 17.), (18., 19., 20.), (21., 22., 23.), (24., 25., 26.)]),
(27., 28., 29., [(30., 31., 32.), (33., 34., 35.), (36., 37., 38.), (39., 40., 41.)], [(42., 43., 44.), (45., 46., 47.), (48., 49., 50.), (51., 52., 53.)])],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,)), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')], (4,))])
view still has problems with this. In some simple cases view does work, though it can require some adjustment of the dimensions, but I have not explored its limitations:
In [19]: uarr.view(np.dtype(dtypes))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 uarr.view(np.dtype(dtypes))
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
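For comparison, here is a sketch of one simple case where view does work. It assumes a flat (non-nested) dtype whose record size equals the byte size of one row, and an array that is already float64 and C-contiguous; the names flat, simple, viewed and rec are only for illustration.

```python
import numpy as np

# A (2, 3) float64 array: each row is 3 * 8 = 24 bytes.
flat = np.arange(6, dtype="f8").reshape(2, 3)

# A flat dtype with three f8 fields is also 24 bytes per record.
simple = np.dtype([("a", "f8"), ("b", "f8"), ("c", "f8")])

# The view succeeds, but keeps a trailing axis of length 1.
viewed = flat.view(simple)
print(viewed.shape)   # (2, 1)

# This is the "dimension adjustment": drop the extra axis.
rec = viewed[:, 0]
print(rec.shape)      # (2,)
print(rec["b"])       # [1. 4.]
```

Note that view only reinterprets the existing bytes, so the element type of the source array has to match the field types. An integer array, such as the default result of np.arange, would not be converted to float.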
Removing the (4,) from dtypes:
In [35]: dtypes = [
...: ("state", "f8"),
...: ("variability", "f8"),
...: ("target", "f8"),
...: ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
...: ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
...: ]
In [36]: arr = np.zeros(2,dtypes)
In [37]: arr
Out[37]:
array([(0., 0., 0., (0., 0., 0.), (0., 0., 0.)),
(0., 0., 0., (0., 0., 0.), (0., 0., 0.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
In [38]: uarr = np.arange(18).reshape(2,9)
In [39]: arr1 = rf.unstructured_to_structured(uarr, dtype=np.dtype(dtypes))
In [40]: arr1
Out[40]:
array([(0., 1., 2., ( 3., 4., 5.), ( 6., 7., 8.)),
(9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
In [43]: arr1['measured']
Out[43]:
array([( 3., 4., 5.), (12., 13., 14.)],
dtype=[('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')])
In [44]: arr1['measured']['mean']
Out[44]: array([ 3., 12.])
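Because nested fields can be indexed by name like this, another option is to allocate the structured array first and fill it field by field. The following is only a sketch of that alternative with two made-up records; measured_cols and out are illustrative names, not part of the original code.

```python
import numpy as np

dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# Two example records; one column per nested field of "measured".
measured_cols = np.array([[0.52, 0.41, 0.68],
                          [0.80, 0.71, 1.12]])

out = np.zeros(2, dtype=dt)
out["state"] = [4, 5]

# out["measured"] is a view, so assigning to its fields writes into out.
for j, name in enumerate(out["measured"].dtype.names):
    out["measured"][name] = measured_cols[:, j]

print(out["measured"]["low"])   # [0.41 0.71]
```

This is more verbose than converting a stacked array in one call, but it avoids any assumption about the column order of an intermediate 2-D array.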
And via a csv file and genfromtxt:
In [45]: np.savetxt('foo', uarr)
In [46]: more foo
0.000000000000000000e+00 1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00 4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00 7.000000000000000000e+00 8.000000000000000000e+00
9.000000000000000000e+00 1.000000000000000000e+01 1.100000000000000000e+01 1.200000000000000000e+01 1.300000000000000000e+01 1.400000000000000000e+01 1.500000000000000000e+01 1.600000000000000000e+01 1.700000000000000000e+01
In [47]: data = np.genfromtxt('foo', dtype=dtypes)
In [48]: data
Out[48]:
array([(0., 1., 2., ( 3., 4., 5.), ( 6., 7., 8.)),
(9., 10., 11., (12., 13., 14.), (15., 16., 17.))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
view still does not work, but unstructured_to_structured handles your arrays:
In [50]: arr1 = np.array([4, 5, 4, 5])
...: arr2 = np.array([0, 0, -1, -1])
...: arr3 = np.array([0.51, 0.89, 0.59, 0.94])
...: arr4 = np.array(
...: [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
...: ).T
...: arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
...: arrs = (arr1, arr2, arr3, arr4, arr5)
In [51]: ans = np.column_stack(arrs)
In [52]: ans
Out[52]:
array([[ 4. , 0. , 0.51, 0.52, 0.41, 0.68, 0.6 , 0.2 , 0.2 ],
[ 5. , 0. , 0.89, 0.8 , 0.71, 1.12, 0.6 , 0.2 , 0.2 ],
[ 4. , -1. , 0.59, 0.62, 0.46, 0.78, 0.6 , 0.2 , 0.2 ],
[ 5. , -1. , 0.94, 1.1 , 0.77, 1.19, 0.6 , 0.2 , 0.2 ]])
In [53]: arr2 = rf.unstructured_to_structured(ans, dtype=np.dtype(dtypes))
In [54]: arr2
Out[54]:
array([(4., 0., 0.51, (0.52, 0.41, 0.68), (0.6, 0.2, 0.2)),
(5., 0., 0.89, (0.8 , 0.71, 1.12), (0.6, 0.2, 0.2)),
(4., -1., 0.59, (0.62, 0.46, 0.78), (0.6, 0.2, 0.2)),
(5., -1., 0.94, (1.1 , 0.77, 1.19), (0.6, 0.2, 0.2))],
dtype=[('state', '<f8'), ('variability', '<f8'), ('target', '<f8'), ('measured', [('mean', '<f8'), ('low', '<f8'), ('hi', '<f8')]), ('var', [('mid', '<f8'), ('low', '<f8'), ('hi', '<f8')])])
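Putting this together as a standalone script, here is a sketch that also checks the field names asked about in Problem 2. It uses the dtype without the (4,) shape, and rec_array is the name used in the question.

```python
import numpy as np
import numpy.lib.recfunctions as rf

arr1 = np.array([4, 5, 4, 5])
arr2 = np.array([0, 0, -1, -1])
arr3 = np.array([0.51, 0.89, 0.59, 0.94])
arr4 = np.array(
    [[0.52, 0.80, 0.62, 1.1], [0.41, 0.71, 0.46, 0.77], [0.68, 1.12, 0.78, 1.19]]
).T
arr5 = np.repeat(np.array([0.6, 0.2, 0.2]), 4).reshape(3, 4).T
arrs = (arr1, arr2, arr3, arr4, arr5)

# Nested dtype with 9 float values per record (no subarray shape).
dt = np.dtype([
    ("state", "f8"),
    ("variability", "f8"),
    ("target", "f8"),
    ("measured", [("mean", "f8"), ("low", "f8"), ("hi", "f8")]),
    ("var", [("mid", "f8"), ("low", "f8"), ("hi", "f8")]),
])

# column_stack gives a (4, 9) float array; convert its rows to records.
rec_array = rf.unstructured_to_structured(np.column_stack(arrs), dtype=dt)

print(rec_array.shape)                    # (4,)
print(rec_array.dtype.names)              # ('state', 'variability', 'target', 'measured', 'var')
print(rec_array["measured"].dtype.names)  # ('mean', 'low', 'hi')
print(rec_array["var"].dtype.names)       # ('mid', 'low', 'hi')
print(rec_array["measured"]["hi"])        # [0.68 1.12 0.78 1.19]
```

The result is a 1-D array of 4 records rather than a 2-D array, and the integer inputs end up as floats because column_stack promotes everything to a common float type before the conversion.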