python numpy csv genfromtxt

Use np.genfromtxt to read data of different dtypes in a CSV file


I am trying to read a CSV file that looks like:

label,value
first,1.234e-01
second,5.678e-02
three,9.876e-03
...

and so on, where the first column contains strings and the second column contains floats.

From the online documentation of np.genfromtxt, I thought that the line

file_data = np.genfromtxt(filepath, dtype=[('label','<U'),('value','<f4')], delimiter=',', skip_header=1)

would specify the dtype of each column, so that each would be read appropriately. But when I print file_data, I get something that looks like

[('', 1.234e-01) ('', 5.678e-02) ('', 9.876e-03) ...]

when I was expecting

[('first', 1.234e-01) ('second', 5.678e-02) ('three', 9.876e-03) ...]

Solution

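  • A bare <U is a zero-length string type, so every label is truncated to an empty string. Give the unicode field a maximum length in the dtype that is at least as long as your longest label (for example <U15); anything longer is truncated to that length: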

    from io import StringIO
    
    import numpy as np
    
    data = '''label,value
    first,1.234e-01
    second,5.678e-02
    three,9.876e-03'''
    
    # '<U15' reserves up to 15 unicode characters for each label
    file_data = np.genfromtxt(StringIO(data), dtype=[('label','<U15'),('value','<f4')], delimiter=',', skip_header=1)
    print(file_data)
    
    

    [('first', 0.1234  ) ('second', 0.05678 ) ('three', 0.009876)]
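
If you would rather not guess a maximum string length, one alternative is to let genfromtxt infer the column types with dtype=None. The following is a minimal sketch using the same sample data as above. It assumes you are happy to take the field names from the header row (names=True) and to get float64 rather than float32 for the value column; passing encoding='utf-8' explicitly asks for str labels rather than bytes.

```python
from io import StringIO

import numpy as np

data = '''label,value
first,1.234e-01
second,5.678e-02
three,9.876e-03'''

# dtype=None: infer each column's type from its contents.
# names=True: use the header row ("label,value") as the field names.
# encoding='utf-8': decode text columns to str instead of bytes.
file_data = np.genfromtxt(
    StringIO(data),
    dtype=None,
    delimiter=',',
    names=True,
    encoding='utf-8',
)

print(file_data.dtype)
print(file_data['label'])
print(file_data['value'])
```

The string field is sized to fit the longest label found in the file, so nothing gets truncated. The trade-off is that the resulting dtype depends on the data, which may matter if you need the same layout across several files.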