I have a text file with many thousands of rows which look like this
20120601 000000603,1.234610,1.234780,0
where the first two whitespace separated columns are a date and time representation and the following three comma separated columns are data. I want to read the text file into an Octave matrix such that the columns of the matrix are separated thus.
2012 06 01 00 00 00 603 1.234610 1.234780 0
I'm sure the textscan function is what I'll need to use, but I don't know the format string to separate things as I want.
You can use the function fscanf (see "Formatted Input" in the docs, as well as the following pages in the same chapter, "C-Style I/O Functions", for explanations of the format) to read the data into a matrix. Because a matrix holds a single numeric type, the integer values are converted to floating point values.
The line format suggested by your example is probably the following (minor variations in interpretation are possible, for instance whether the last entry is always one digit long or can be any decimal integer):
"%4d%2d%2d %2d%2d%2d%3d,%f,%f,%d"
that reads:
%4d - a four-character decimal integer, followed by
%2d - a two-character decimal integer,
%2d - a two-character decimal integer,
a space,
%2d - a two-character decimal integer,
%2d - a two-character decimal integer,
%2d - a two-character decimal integer,
%3d - a three-character decimal integer,
, - a comma,
%f - a floating point number (any number of characters),
, - a comma,
%f - a floating point number (any number of characters),
, - a comma,
%d - a decimal integer (any number of characters).
Note that, unlike the commas (and most other characters), spaces between fields are ignored in both the format and the matched input text, so one could have used the format
"%4d%2d%2d%2d%2d%2d%3d,%f,%f,%d"
(without the space), or
"%4d %2d %2d %2d %2d %2d %3d,%f,%f,%d"
for better readability.
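To check how the format splits a single line, here is a minimal sketch that parses the sample row from the question. It uses sscanf, the string counterpart of fscanf, only so that no file is needed; the variable names sample, fmt and v are chosen for illustration.

```octave
% Sample line from the question, parsed with the format discussed above.
sample = "20120601 000000603,1.234610,1.234780,0";
fmt    = "%4d%2d%2d %2d%2d%2d%3d,%f,%f,%d";

% sscanf returns a column vector with one element per conversion.
v = sscanf(sample, fmt);

printf("%d fields read\n", numel(v));
printf("year=%d month=%d day=%d ms=%d first value=%f\n", ...
       v(1), v(2), v(3), v(7), v(8));
```

If the format matches the line as intended, v should hold ten numbers, with the date and time pieces in the first seven positions.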
If you have a test file input.txt with the following content:
20120601 000000603,1.234610,1.234780,0
20120602 010203604,2.234610,2.234780,11
20120603 000000605,3.234610,3.234780,22
using
fileID = fopen('input.txt','r');   % open the file for reading
sizeM = [10, Inf];                 % 10 values per input line, any number of lines
M = fscanf(fileID, "%4d%2d%2d %2d%2d%2d%3d,%f,%f,%d", sizeM);
fclose(fileID);
([10, Inf] means that the resulting matrix will have 10 rows and an unlimited number of columns) will produce the matrix M:
2.0120e+03 2.0120e+03 2.0120e+03
6.0000e+00 6.0000e+00 6.0000e+00
1.0000e+00 2.0000e+00 3.0000e+00
0 1.0000e+00 0
0 2.0000e+00 0
0 3.0000e+00 0
6.0300e+02 6.0400e+02 6.0500e+02
1.2346e+00 2.2346e+00 3.2346e+00
1.2348e+00 2.2348e+00 3.2348e+00
0 1.1000e+01 2.2000e+01
where each column contains the 10 values of the corresponding line in the input text, converted to floating point numbers and displayed in scientific notation (2.0120e+03 for 2012.0). Of course, the matrix can be transposed if one wants the values of each input line in a row.
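As a sketch of the transposition step, the three sample lines can be parsed from a string (again with sscanf, so no file is needed) and turned into one row per input line. The names txt, fmt and rows are illustrative, and the printf format at the end is just one possible way to print the values in the layout the question asks for, instead of scientific notation.

```octave
% The three sample lines, joined into one string ("\n" is a newline in
% Octave double-quoted strings).
txt = ["20120601 000000603,1.234610,1.234780,0\n", ...
       "20120602 010203604,2.234610,2.234780,11\n", ...
       "20120603 000000605,3.234610,3.234780,22\n"];
fmt = "%4d%2d%2d %2d%2d%2d%3d,%f,%f,%d";

M    = sscanf(txt, fmt, [10, Inf]);  % 10 rows, one column per input line
rows = M.';                          % one row per input line

% printf walks through M column by column and reuses the template,
% so each input line is printed on its own output line.
printf("%d %02d %02d %02d %02d %02d %03d %f %f %d\n", M);
```

Transposing once after reading keeps the fscanf-style call unchanged and gives the row-per-record layout most later processing expects.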
The function textscan produces a cell array, a heterogeneous structure honouring the different types of the input: in this case integers and floating point numbers, but there can also be strings, for instance. The format should be the same, so one just has to replace the fscanf line above with
M = textscan(fileID, "%4d%2d%2d %2d%2d%2d%3d,%f,%f,%d");
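Since the question asks for a matrix, the cell array would still need to be converted. The following is only a sketch: C is a hand-written stand-in for the kind of cell array textscan returns, assuming that %d fields come back as int32 columns and %f fields as double columns, and it is shortened to three columns (year, milliseconds, first value) for brevity.

```octave
% Hypothetical stand-in for a textscan result: one cell per field,
% each cell holding a column with one entry per input line.
C = {int32([2012; 2012; 2012]), ...
     int32([603; 604; 605]), ...
     [1.23461; 2.23461; 3.23461]};

% Convert every column to double, then join the columns side by side.
M = cell2mat(cellfun(@double, C, "UniformOutput", false));

disp(size(M));   % rows = input lines, columns = fields
disp(class(M));
```

Converting to double first avoids mixing integer and floating point classes when the columns are concatenated.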