python numpy structured-array

Structured numpy array with 2 different data types


I imported a CSV file into a NumPy array, which I need to convert to a structured array with only the first column as a string dtype and the other 47 columns as float. How do I define the data type for the other 47 columns in a single operation? Do I have to specify the dtype column by column?

Thanks in advance


Solution

  • You can read the source file directly as a structured array.

    Assume that your input file (Input.csv) contains:

    ABCD,160.72,180.21,260.13,451.48
    EFGH,252.42,132.21,150.11,612.56
    IJKL,541.77,455.21,268.76,543.81
    

    To read such a file, you can use the np.loadtxt function, passing dtype as a structured type (a list of (name, type) definitions), which can be generated, for example, with a list comprehension:

    import numpy as np

    nFloats = 4  # number of float columns after the Id column
    a = np.loadtxt('Input.csv', delimiter=',',
        dtype=[('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(nFloats)])
    

    Note that I passed U10 as the type of the Id column (a Unicode string of up to 10 characters). Set a different size for this field if your values need it.

    The result is:

    array([('ABCD', 160.72, 180.21, 260.13, 451.48),
           ('EFGH', 252.42, 132.21, 150.11, 612.56),
           ('IJKL', 541.77, 455.21, 268.76, 543.81)],
          dtype=[('Id', '<U10'), ('F1', '<f4'), ('F2', '<f4'), ('F3', '<f4'), ('F4', '<f4')])
    

    In your actual code, set nFloats to the number of float columns (47, according to your description).
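
To try the approach without a file on disk, here is a minimal self-contained sketch. It assumes the same three sample rows as above, fed through io.StringIO instead of Input.csv. The names csv_text, n_floats, ids, f1 and floats are only illustrative, and the final column_stack step is an optional extra for getting the float columns back as an ordinary 2-D array.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for Input.csv (same rows as the sample above).
csv_text = """ABCD,160.72,180.21,260.13,451.48
EFGH,252.42,132.21,150.11,612.56
IJKL,541.77,455.21,268.76,543.81
"""

n_floats = 4  # would be 47 for the file described in the question

# One string field followed by n_floats float fields, built in a single expression.
dtype = [('Id', 'U10')] + [(f'F{i+1}', '<f4') for i in range(n_floats)]

a = np.loadtxt(io.StringIO(csv_text), delimiter=',', dtype=dtype)

# Fields are accessed by name; each one is a plain 1-D array.
ids = a['Id']
f1 = a['F1']

# Optional: stack all float fields into a regular 2-D float array.
floats = np.column_stack([a[name] for name in a.dtype.names[1:]])
```

Here '<f4' means 32-bit floats. If you need double precision, '<f8' can be used in the same list comprehension.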