I am trying to read a CSV file in Octave with textscan, but the CSV file isn't always correctly formatted. The following MCVE should illustrate the issue.
Let's say the file is as follows:
12/01/2020,12,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0
11/01/2020,11,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0
10/01/2020,10,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0
09/01/2020,9,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0
08/01/2020,8,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0
07/01/2020,7,1,2020,0,0,Russia,RU,RUS,145872260,Europe,
06/01/2020,6,1,2020,0,0,Russia,RU,RUS,145872260,Europe,
05/01/2020,5,1,2020,0,0,Russia,RU,RUS,145872260,Europe,
You will notice that the final 0 is missing in the last 3 lines. Obviously, I can go in and manually edit the CSV files in Notepad++ or similar, but we're talking several tens of thousands of lines to go through and I just feel there must be a better solution.
My code would be something like this (note that I have tried using %*f for the last element to tell Octave to skip it, but that doesn't seem to work):
fname = 'mcve.csv'; % the above file
fid = fopen(fname);
csv_data = textscan(fid,'%s %d %d %d %d %d %s %s %s %d %s %*f','Delimiter',',');
fclose(fid);
If you then look at csv_data, you will see that the dates are not correct: one date is missing and the last two are truncated (the rest of the data looks OK):
>> csv_data{1}
ans =
{
[1,1] = 12/01/2020
[2,1] = 11/01/2020
[3,1] = 09/01/2020
[4,1] = 08/01/2020
[5,1] = 07/01/2020
[6,1] = /01/2020
[7,1] = /01/2020
}
Any idea how to solve this, or what else to try other than the %*f I already tried?
Use csv2cell from the io package.
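A minimal sketch of what that could look like, assuming the io package is already installed (for example via pkg install -forge io). The temporary file and the variable names c, dates and population are only for illustration; the two sample lines are taken from the question. How the empty trailing field is represented may depend on the io version, so it is worth inspecting c(:, 12) before relying on it.

```octave
pkg load io   % assumption: the io package is installed

% Write a two-line sample so the sketch is self-contained; the second
% line is missing its final field, as in the question.
fname = [tempname() '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '08/01/2020,8,1,2020,0,0,Russia,RU,RUS,145872260,Europe,0\n');
fprintf(fid, '07/01/2020,7,1,2020,0,0,Russia,RU,RUS,145872260,Europe,\n');
fclose(fid);

c = csv2cell(fname);               % one cell per field, one row per line
dates = c(:, 1);                   % date strings are kept as text
population = cell2mat(c(:, 10));   % numeric fields come back as doubles

delete(fname);                     % clean up the temporary sample file
```

For the real data, replace fname with 'mcve.csv' and skip the part that writes the sample file.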