>>> from io import StringIO
>>> import numpy as np
>>> s = StringIO("1,1.3,abcde")
>>> data = np.genfromtxt(
... s,
... dtype=[('myint','i8'), ('myfloat','f8'), ('mystring','S5')],
... delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])
I am unable to understand what `dtype="i8,f8,|S5"` stands for.
I can make out that `i` is an integer, `f` is a float and `S` is a string, but what is the 8 in `i8`? I first took it to mean bytes, but then how is `S5` possible?
I understand that `dtype` specifies the data type so that we can read from a CSV file, but can someone give some insight into data types?
The 8 in `i8` or `f8` is the number of bytes. There are several different ways to express the same data type in NumPy. The strings you see from `np.genfromtxt` are in the compact format. The `<` or `>` sign in front means little-endian or big-endian (see the documentation), followed by `i` for integer or `f` for float/double, and then the number of bytes. A `|` in that position, as in `|S5`, means byte order does not apply, which is the case for single-byte types and byte strings.
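To see the pieces of a compact dtype string, you can build the comma-separated form from the question and ask NumPy for each field's kind and size. This is a sketch assuming a recent NumPy; the field names `f0`, `f1`, `f2` are simply what NumPy assigns when no names are given, and `S5` is written without the `|` because NumPy treats the two spellings as the same dtype:

```python
import numpy as np

# The compact string from the question, without field names:
# each comma-separated item is [byte order][kind][size].
dt = np.dtype("i8,f8,S5")

print(dt.names)     # ('f0', 'f1', 'f2')
print(dt.itemsize)  # 21, i.e. 8 + 8 + 5 bytes, packed with no padding

for name in dt.names:
    sub = dt.fields[name][0]  # fields maps name -> (dtype, byte offset)
    # .str spells out the byte order explicitly; on a little-endian
    # machine the three fields give '<i8', '<f8' and '|S5'.
    print(name, sub.str, sub.kind, sub.itemsize)
```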
The longer data type names give the size in bits instead of bytes, meaning that `i8` is `int64`, `f4` is `float32`, and so on. E.g.:
>>> np.dtype('i8')
dtype('int64')
>>> np.dtype('f4')
dtype('float32')
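The byte/bit correspondence can be checked directly: both spellings produce the same dtype, and the number in the long name is just `itemsize * 8`. A small sketch, assuming a recent NumPy:

```python
import numpy as np

# Compact code (size in bytes) paired with the long name (size in bits).
pairs = [("i1", "int8"), ("i2", "int16"), ("i4", "int32"),
         ("i8", "int64"), ("f4", "float32"), ("f8", "float64")]

for code, name in pairs:
    dt = np.dtype(code)
    # Both spellings describe the same dtype...
    assert dt == np.dtype(name)
    # ...and the long name's bit count is the byte size times 8.
    print(f"{code} -> {dt.name}: {dt.itemsize} bytes = {dt.itemsize * 8} bits")
```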
By default these use the machine's native byte order, which is little-endian on most common hardware. If you want big-endian, as far as I know, `np.dtype` does not return the long form:
>>> np.dtype('>c16')
dtype('>c16')
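To make the byte-order prefix concrete, the sketch below stores the same value with `<i4` and `>i4` and looks at the raw bytes; `sys.byteorder` reports what the current machine uses natively. It assumes a NumPy version where `dtype.newbyteorder()` is available, as it is in current releases:

```python
import sys
import numpy as np

print(sys.byteorder)  # 'little' on x86 and most ARM systems

little = np.dtype("<i4")
big = np.dtype(">i4")

# Same kind and size, different byte order, so the dtypes are not equal.
print(little.str, big.str, little == big)  # <i4 >i4 False

# The value 1 in each byte order: the bytes in memory are reversed.
print(np.array([1], dtype=little).tobytes())  # b'\x01\x00\x00\x00'
print(np.array([1], dtype=big).tobytes())     # b'\x00\x00\x00\x01'

# newbyteorder() with no argument swaps the byte order of a dtype.
print(little.newbyteorder().str)  # >i4

# The long name is still available via .name, even though the repr
# of a big-endian dtype keeps the compact form.
print(np.dtype(">c16").name)  # complex128
```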
Strings are a special data type, and the number is the maximum number of characters in the string. For `S` (byte strings) each character takes one byte, so `S5` is 5 bytes wide. See this question for more details.
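A short sketch of the fixed-width string behaviour, assuming a recent NumPy under Python 3 (where `S` values come back as `bytes`). `U5`, NumPy's Unicode string type, is included only for comparison; it stores each character in 4 bytes:

```python
import numpy as np

# 'S5': fixed-width byte string, up to 5 characters at 1 byte each.
s5 = np.dtype("S5")
# 'U5': fixed-width Unicode string, up to 5 characters at 4 bytes each.
u5 = np.dtype("U5")
print(s5.itemsize, u5.itemsize)  # 5 20

# Values longer than the declared width are silently truncated.
a = np.array(["abcdefg"], dtype="S5")
print(a[0])  # b'abcde'

# Shorter values are padded with null bytes in memory; the padding
# is stripped again when an element is read back.
b = np.array(["ab"], dtype="S5")
print(b.tobytes(), b[0])  # b'ab\x00\x00\x00' b'ab'
```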