matlabcsvoctavetextscan

MATLAB / Octave - how to parse CSV file with numbers and strings that contain commas


I have a CSV file that has 20 columns. Some of the columns have number values, others have text values, and the text ones may or may not contain commas.

CSV content example:

column1, column2, column3, column4
"text value 1", 123, "text, with a comma", 25
"another, comma", 456, "other text", 78

I'm using textscan function, but I'm getting the most buggy and weird behavior. With some arguments, it reads all the values in only one column, sometimgs it repeats columns, and most of the things I've tried lead to the commas being incorrectly interpreted as column separators (despite text being enclosed in double quotes). That is, I've tried specifying 'delimiter' argument, and also including literals in the format specification, to no avail.

What's the correct way of invoking textscan to deal with a CSV file as the example above? I'm looking for a solution that runs both on MATLAB and on Octave (or, if that's not possible, the equivalent solution in each one).


Solution

  • For GNU Octave, using io package

    pkg load io
    c = csv2cell ("jota.csv")
    

    gives

    c = 
    {
      [1,1] = column1
      [2,1] = text value 1
      [3,1] = another, comma
      [1,2] =  column2
      [2,2] =  123
      [3,2] =  456
      [1,3] =  column3
      [2,3] =  text, with a comma
      [3,3] =  other text
      [1,4] =  column4
      [2,4] =  25
      [3,4] =  78
    }
    

    btw, you should explicitly mention if the solution should run on GNU Octave, Matlab or both