[SOLVED] Structured numpy array with 2 different data types

I imported a CSV file into a NumPy array, which I need to convert to a structured array with only the first column as a string dtype and the other 47 columns as float. How do I define the data type for the other 47 columns in a single operation? Do I have to specify the dtype column by column?

Thanks in advance

Solution

You can read the source file directly as a structured array.

Assume that your input file contains:

one string field, to be named Id,
just four float fields, to be named F1, F2 and so on.

So its content is:

ABCD,160.72,180.21,260.13,451.48
EFGH,252.42,132.21,150.11,612.56
IJKL,541.77,455.21,268.76,543.81

To read such a file you can use the np.loadtxt function, passing dtype as a structured type (a list of (name, type) definitions), which can be generated e.g. with a list comprehension:

import numpy as np

nFloats = 4  # number of float columns that follow the Id column
a = np.loadtxt('Input.csv', delimiter=',',
    dtype=[('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(nFloats)])

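Note that I passed U10 as the type of the Id column (a Unicode string of up to 10 characters). If needed, set a different size for this field. The float fields use '<f4' (32-bit floats); use '<f8' instead if you need double precision.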
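To see why the field width matters, here is a small sketch of what a too-narrow string type does. It uses plain array construction rather than np.loadtxt, and the sample identifiers are made up for illustration. I would expect np.loadtxt to cut long values the same way, but check that on your own data.

```python
import numpy as np

# Hypothetical identifiers; the second one is longer than 10 characters.
ids = ['ABCD', 'A_MUCH_LONGER_IDENTIFIER']

# A fixed-width 'U10' type silently keeps only the first 10 characters.
truncated = np.array(ids, dtype='U10')

# One way to choose the width: use the longest value in the data.
width = max(len(s) for s in ids)
fitted = np.array(ids, dtype=f'U{width}')

print(truncated[1])  # cut to 10 characters
print(fitted[1])     # kept whole
```

If you do not know the longest Id in advance, picking a generous width (e.g. U50) costs only some memory.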

The result is:

array([('ABCD', 160.72, 180.21, 260.13, 451.48),
       ('EFGH', 252.42, 132.21, 150.11, 612.56),
       ('IJKL', 541.77, 455.21, 268.76, 543.81)],
      dtype=[('Id', '<U10'), ('F1', '<f4'), ('F2', '<f4'), ('F3', '<f4'), ('F4', '<f4')])
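Once the array is structured, each column can be addressed by its field name. The sketch below repeats the read on an in-memory copy of the three sample rows (io.StringIO stands in for Input.csv, so it runs without any file) and then picks out single fields. The variable names are only for illustration.

```python
import io
import numpy as np

# Same three rows as the sample file, held in memory instead of Input.csv.
sample = io.StringIO(
    "ABCD,160.72,180.21,260.13,451.48\n"
    "EFGH,252.42,132.21,150.11,612.56\n"
    "IJKL,541.77,455.21,268.76,543.81\n"
)

n_floats = 4
dt = [('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(n_floats)]
a = np.loadtxt(sample, delimiter=',', dtype=dt)

ids = a['Id']             # 1-D array holding the string column
f1_mean = a['F1'].mean()  # statistics on a single float column
first_row = a[0]          # one record; first_row['F2'] is a single value

print(a.dtype.names)
print(ids, f1_mean, first_row['F2'])
```

Note that the array is one-dimensional (one record per row), so a.shape is (3,) here, not (3, 5).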

Of course, in your actual code set nFloats to the real number of float columns (47 in your case).
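If the data is already in memory as a plain 2-D array of strings (which is what the question describes) and re-reading the file is not an option, one possible way is to allocate an empty structured array and fill it field by field. This is only a sketch: the name raw and the small sample are assumptions, and I used '<f8' (64-bit floats) here, which you can swap for '<f4'.

```python
import numpy as np

# Assumed starting point: every cell was imported as a string.
raw = np.array([
    ['ABCD', '160.72', '180.21', '260.13', '451.48'],
    ['EFGH', '252.42', '132.21', '150.11', '612.56'],
    ['IJKL', '541.77', '455.21', '268.76', '543.81'],
])

n_floats = raw.shape[1] - 1  # every column except the first one
dt = np.dtype([('Id', 'U10')] + [(f'F{i+1}', '<f8') for i in range(n_floats)])

# Allocate one record per row, then copy each column into its field.
out = np.empty(raw.shape[0], dtype=dt)
out['Id'] = raw[:, 0]
for i in range(n_floats):
    out[f'F{i+1}'] = raw[:, i + 1].astype(float)

print(out.dtype.names)
print(out[0])
```

The dtype is still built in a single expression, so with 47 float columns nothing has to be written out column by column.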