The data I'm working with can be found at this gist, and looks like:
07-11-2018 18:34:35 -2.001 5571.036 -1.987
07-11-2018 18:34:50 -1.999 5570.916 -1.988
When calling
TB_CAL_array = np.genfromtxt('calbath_data/TB118192.TXT',
skip_header = 10,
dtype = ([("date", "<U10"), ("time","<U8"), ("bathtemp", "<f8"),
("SBEfreq", "<f8"), ("SBEtemp", "<f8")])
)
Output of array is:
array([('07-11-2018', '18:34:35', -2.001e+00, 5571.036, -1.987),
('07-11-2018', '18:34:50', -1.999e+00, 5570.916, -1.988),
The data is returned as a structured ndarray whose elements look like tuples, and it is non-homogeneous because it contains both strings and floats. Why does numpy.genfromtxt produce an array of what look like tuples instead of a 2D array?
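To make the structured-array behaviour concrete, here is a minimal sketch using only the two sample rows above (the real file with its 10-line header is not reproduced). With mixed string and float fields, genfromtxt returns a 1-D array of records, and whole columns are selected by field name. Restricting usecols to the numeric columns, which assumes you only need the numbers as a matrix, gives an ordinary 2-D float array.

```python
import io

import numpy as np

# The two sample rows from the question (the real file has a 10-line header).
sample = """07-11-2018 18:34:35 -2.001 5571.036 -1.987
07-11-2018 18:34:50 -1.999 5570.916 -1.988"""

dt = [("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
      ("SBEfreq", "<f8"), ("SBEtemp", "<f8")]

# Mixed string/float fields: one record per row, so the array is 1-D.
rec = np.genfromtxt(io.StringIO(sample), dtype=dt, encoding=None)
print(rec.shape)            # (2,)
print(rec.dtype.names)      # field names, in column order
print(rec["bathtemp"])      # a whole column, selected by field name
print(rec[0]["date"])       # a single field of the first record

# Only the numeric columns: a homogeneous float array, hence 2-D.
num = np.genfromtxt(io.StringIO(sample), usecols=(2, 3, 4), encoding=None)
print(num.shape)            # (2, 3)
```

The structured form keeps each column's own dtype; the 2-D form is only possible once every selected column shares a single dtype.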
NOTE: The third column of the output seems to have been treated as something other than the dtype specified. The output should be -2.001, but instead it is -2.001e+00.
NOTE: The fifth column has the same input format and dtype, yet no such transformation occurred there during the genfromtxt call.
The only difference I can find between "bathtemp" and "SBEtemp" is that there are two extra blank spaces after the "bathtemp" column...
However, according to the numpy.genfromtxt IO documentation, this shouldn't matter, because consecutive whitespace is treated as a delimiter by default:
delimiter : str, int, or sequence, optional The string used to separate values. By default, any consecutive whitespaces act as delimiter. An integer or sequence of integers can also be provided as width(s) of each field.
Is the extra whitespace after the "bathtemp" column causing the error? If so how do I work around it?
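One quick way to test the extra-whitespace hypothesis directly is to parse the same rows with and without padding after the bathtemp value. This is a minimal sketch with made-up padding (the exact spacing in the real file is an assumption); with the default delimiter=None, genfromtxt splits on any run of whitespace, so both parses should agree.

```python
import io

import numpy as np

dt = [("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
      ("SBEfreq", "<f8"), ("SBEtemp", "<f8")]

tight = ("07-11-2018 18:34:35 -2.001 5571.036 -1.987\n"
         "07-11-2018 18:34:50 -1.999 5570.916 -1.988\n")
# Same rows with extra spaces (and a tab) after bathtemp; hypothetical layout.
padded = ("07-11-2018 18:34:35 -2.001   5571.036 -1.987\n"
          "07-11-2018 18:34:50 -1.999  \t5570.916 -1.988\n")

a = np.genfromtxt(io.StringIO(tight), dtype=dt, encoding=None)
b = np.genfromtxt(io.StringIO(padded), dtype=dt, encoding=None)

# Compare field by field; the default delimiter=None treats any
# whitespace run as a single separator, so every field should match.
same = all(np.array_equal(a[name], b[name]) for name in a.dtype.names)
print(same)  # True
```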
With your sample:
In [136]: txt="""07-11-2018 18:34:35 -2.001 5571.036 -1.987
...: 07-11-2018 18:34:50 -1.999 5570.916 -1.988"""
In [137]: np.genfromtxt(txt.splitlines(), dtype=None, encoding=None)
Out[137]:
array([('07-11-2018', '18:34:35', -2.001, 5571.036, -1.987),
('07-11-2018', '18:34:50', -1.999, 5570.916, -1.988)],
dtype=[('f0', '<U10'), ('f1', '<U8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
and with your dtype:
In [139]: np.genfromtxt(txt.splitlines(), dtype=[("date", "<U10"), ("time", "<U8"), ("bathtemp", "<f8"),
...:                                        ("SBEfreq", "<f8"), ("SBEtemp", "<f8")],
...:               encoding=None)
Out[139]:
array([('07-11-2018', '18:34:35', -2.001, 5571.036, -1.987),
('07-11-2018', '18:34:50', -1.999, 5570.916, -1.988)],
dtype=[('date', '<U10'), ('time', '<U8'), ('bathtemp', '<f8'), ('SBEfreq', '<f8'), ('SBEtemp', '<f8')])
Values like -2.001e+00 are the same as -2.001; only the display differs. numpy chooses to use scientific notation when the range of values is wide enough, or when some values are too small to show well otherwise.
For example, if I change one of the values to something much smaller:
In [140]: data = _
In [141]: data['bathtemp']
Out[141]: array([-2.001, -1.999])
In [142]: data['bathtemp'][1] *= 0.001
In [143]: data['bathtemp']
Out[143]: array([-2.001e+00, -1.999e-03])
The -2.001 is unchanged (except for its display style).
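If the scientific notation is only a nuisance when inspecting the array, the display can be changed without touching the data. A minimal sketch, using a made-up two-value column like the one in the example above: np.printoptions(suppress=True), a context manager available in NumPy 1.15 and later, turns off scientific notation for small values, and np.array2string with a formatter prints a fixed number of decimals.

```python
import numpy as np

# Hypothetical column: one value roughly 1000x smaller than the other.
bathtemp = np.array([-2.001, -1.999e-03])

default_txt = repr(bathtemp)  # default display: scientific notation
print(default_txt)
print(bathtemp[0] == -2.001)  # True: the stored value itself is unchanged

# Temporarily suppress scientific notation for small numbers.
with np.printoptions(suppress=True):
    suppressed_txt = repr(bathtemp)
print(suppressed_txt)

# Or format explicitly with a fixed number of decimals.
fixed_txt = np.array2string(bathtemp, formatter={"float_kind": "{:.6f}".format})
print(fixed_txt)
```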
My guess is that some of the bathtemp values (that you don't show) are much closer to zero.
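To check that guess on the full file, one can look for the values that push the column into scientific notation. The sketch below is an approximate re-statement of the rule in NumPy's default float formatter: scientific notation when the largest nonzero magnitude is at least 1e8, or, unless suppress=True, when the smallest is below 1e-4 or the largest is more than 1000 times the smallest. The helper names and the sample column are hypothetical, and the exact rule may differ between NumPy versions.

```python
import numpy as np


def uses_scientific(values):
    """Approximate the default (suppress=False) check NumPy's float formatter makes.

    Hypothetical helper; not part of NumPy's API.
    """
    mag = np.abs(np.asarray(values, dtype=float))
    mag = mag[np.isfinite(mag) & (mag != 0)]  # zeros, nan and inf are ignored
    if mag.size == 0:
        return False
    lo, hi = mag.min(), mag.max()
    return bool(hi >= 1e8 or lo < 1e-4 or hi / lo > 1e3)


def suspect_indices(values, ratio=1e3):
    """Indices of nonzero values small enough to trigger scientific notation.

    Hypothetical helper; assumes the column's largest magnitude is finite.
    """
    mag = np.abs(np.asarray(values, dtype=float))
    small = (mag != 0) & ((mag < 1e-4) | (mag < mag.max() / ratio))
    return np.flatnonzero(small)


# Made-up column: the last reading is close to zero.
bathtemp = np.array([-2.001, -1.999, -0.0015])
print(uses_scientific(bathtemp))   # True
print(suspect_indices(bathtemp))   # [2]
```

On the real data, running suspect_indices(TB_CAL_array['bathtemp']) would list the rows worth inspecting.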