I imported a CSV file into a NumPy array, which I need to convert to a structured array with only the first column as a string dtype and the other 47 columns as float. How do I define the data type for the other 47 columns in a single operation? Do I have to specify the dtype column by column?
Thanks in advance
You can read the source file directly into a structured array.
Assume that your input file (Input.csv) contains:
ABCD,160.72,180.21,260.13,451.48
EFGH,252.42,132.21,150.11,612.56
IJKL,541.77,455.21,268.76,543.81
To read such a file you can use the np.loadtxt function, passing dtype as a structured type (a list of field definitions), which can be generated e.g. with a list comprehension:
import numpy as np

nFloats = 4  # number of float columns following the Id column
a = np.loadtxt('Input.csv', delimiter=',',
    dtype=[('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(nFloats)])
Note that I passed U10 as the type of the Id column (a Unicode string of 10 characters). If needed, set a different size for this field.
The result is:
array([('ABCD', 160.72, 180.21, 260.13, 451.48),
('EFGH', 252.42, 132.21, 150.11, 612.56),
('IJKL', 541.77, 455.21, 268.76, 543.81)],
dtype=[('Id', '<U10'), ('F1', '<f4'), ('F2', '<f4'), ('F3', '<f4'), ('F4', '<f4')])
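Once the array is structured, each column can be addressed by its field name. The following is only a sketch: it reads the sample rows from an in-memory string (io.StringIO) instead of Input.csv so that it runs on its own, and the names text, dt, ids and values are illustrative, not part of the original code.

```python
import io
import numpy as np

# Sample rows from above, held in memory instead of Input.csv (assumption for a self-contained run)
text = """ABCD,160.72,180.21,260.13,451.48
EFGH,252.42,132.21,150.11,612.56
IJKL,541.77,455.21,268.76,543.81
"""

nFloats = 4
dt = [('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(nFloats)]
a = np.loadtxt(io.StringIO(text), delimiter=',', dtype=dt)

# A single field is an ordinary 1-D array
ids = a['Id']

# Stack the float fields back into a plain 2-D float array when needed
values = np.column_stack([a[f'F{i+1}'] for i in range(nFloats)])
```

Here ids holds the three identifiers and values has one row per record and one column per float field.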
Of course, in your actual code set nFloats to the real number of float columns (47 in your case).
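If the data is already in memory as a 2-D array and you would rather not reread the file, a dtype built the same way can be filled field by field. This is a minimal sketch, assuming the existing array holds strings; raw, dt and out are hypothetical names, and the sample values are the ones from the file above.

```python
import numpy as np

# Hypothetical stand-in for an array that was already loaded as strings
raw = np.array([
    ['ABCD', '160.72', '180.21', '260.13', '451.48'],
    ['EFGH', '252.42', '132.21', '150.11', '612.56'],
    ['IJKL', '541.77', '455.21', '268.76', '543.81'],
])

# Every column except the first one becomes a float field
n_floats = raw.shape[1] - 1
dt = np.dtype([('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(n_floats)])

# Allocate the structured array, then copy the columns in one by one
out = np.empty(raw.shape[0], dtype=dt)
out['Id'] = raw[:, 0]
for i in range(n_floats):
    # astype converts the numeric strings to float32
    out[f'F{i+1}'] = raw[:, i + 1].astype('<f4')
```

The loop runs over the columns, but the dtype itself is still defined in a single expression, so nothing has to be written out column by column.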