
Have Matlab's "readtable" recognize column headings?


I have the following CSV file with column headings on line 1:

Test.csv
--------
Prj  , Cap
A    ,  1
A    ,  2
H    ,  4
H    ,  5

I tried to read this into a table, but I'm having trouble making readtable recognize the column headings on line 1:

readtable( 'Test.csv' , ...
           delimitedTextImportOptions( 'VariableNamesLine' , 1 ) )

Var1     ExtraVar1
_____    _________
'Prj'     ' Cap'  
'A'       '  1'   
'A'       '  2'   
'H'       '  4'   
'H'       '  5'   

What am I misunderstanding about the VariableNamesLine parameter?

I am using Matlab 2019a. According to doc delimitedTextImportOptions, the function was introduced in Matlab 2016b, so it should be available in my release.

Troubleshooting steps

Here is the delimitedTextImportOptions object:

dtio = delimitedTextImportOptions( 'VariableNamesLine' , 1)

     DelimitedTextImportOptions with properties:
      Format Properties:
                       Delimiter: {','}
                      Whitespace: '\b\t '
                      LineEnding: {'\n'  '\r'  '\r\n'}
                    CommentStyle: {}
       ConsecutiveDelimitersRule: 'split'
           LeadingDelimitersRule: 'keep'
                   EmptyLineRule: 'skip'
                        Encoding: 'system'
      Replacement Properties:
                     MissingRule: 'fill'
                 ImportErrorRule: 'fill'
                ExtraColumnsRule: 'addvars'
      Variable Import Properties: Set types by name using setvartype
                   VariableNames: {'Var1'}
                   VariableTypes: {'char'}
           SelectedVariableNames: {'Var1'}
                 VariableOptions: Show all 1 VariableOptions
      Access VariableOptions sub-properties using setvaropts/getvaropts
      Location Properties:
                       DataLines: [1 Inf]
               VariableNamesLine: 1
                  RowNamesColumn: 0
               VariableUnitsLine: 0
        VariableDescriptionsLine: 0

If I specify ReadVariableNames as true, only the first column heading is recognized, and the heading line still appears as the first row of data.

readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )

     Prj     ExtraVar1
    _____    _________
    'Prj'     ' Cap'
    'A'       '  1'
    'A'       '  2'
    'H'       '  4'
    'H'       '  5'

I can keep the headings from being read as data by explicitly specifying DataLines, but the second column heading is still not read.

dtio = delimitedTextImportOptions( ...
  'VariableNamesLine' , 1 , ...
  'DataLines' , [2 Inf] );
readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )

Prj    ExtraVar1
___    _________
'A'      '  1'
'A'      '  2'
'H'      '  4'
'H'      '  5'
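A possible explanation, judging from the dtio display above, is that an options object created this way defines only one variable (VariableNames: {'Var1'}), so the second column is treated as an extra column and named by ExtraColumnsRule 'addvars'. A minimal sketch, not verified on every release, declares both variables up front with the NumVariables argument. The column types chosen here (text, then double) are assumptions based on the sample file.

```matlab
% Recreate the sample file from the question so the sketch is self-contained.
fid = fopen('Test.csv', 'w');
fprintf(fid, 'Prj  , Cap\nA    ,  1\nA    ,  2\nH    ,  4\nH    ,  5\n');
fclose(fid);

% Declare two variables so the second column is not handled as an extra column.
opts = delimitedTextImportOptions('NumVariables', 2, ...
    'Delimiter', ',', ...
    'VariableNamesLine', 1, ...
    'DataLines', [2 Inf]);

% Assumption: column 1 holds text and column 2 holds numbers.
opts = setvartype(opts, 1, 'char');
opts = setvartype(opts, 2, 'double');

% ReadVariableNames true makes readtable take the names from VariableNamesLine.
T = readtable('Test.csv', opts, 'ReadVariableNames', true);
disp(T)
```

Because the options object now describes both columns, the heading on line 1 should supply both names, and DataLines keeps the heading out of the data.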

Oddly, the DataLines setting is ignored, and neither column heading is recognized, if I also clear the default VariableNames:

dtio = delimitedTextImportOptions( ...
  'VariableNamesLine' , 1 , ...
  'DataLines' , [2 Inf] , ...
   'VariableNames' , {} );
readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )

    ExtraVar1    ExtraVar2
    _________    _________
     'Prj  '      ' Cap'
     'A    '      '  1'
     'A    '      '  2'
     'H    '      '  4'
     'H    '      '  5'

Following suggestions in the responses, I tried the default readtable options. Unfortunately, this did not recognize the comma as a delimiter:

readtable('Test.csv')

Warning: Table variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property. 

    Prj    x_     Cap
    ___    ___    ___
    'A'    ','     1 
    'A'    ','     2 
    'H'    ','     4 
    'H'    ','     5 

Specifying a format string lets readtable recognize the column heading line, but the whitespace around the delimiters is kept in the text column:

readtable('Test.csv', 'Format', '%s%u')

      Prj      Cap
    _______    ___
    'A    '     1 
    'A    '     2 
    'H    '     4 
    'H    '     5 
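If the 'Format' approach is otherwise acceptable, one simple workaround is to trim the text column after reading. A minimal sketch, assuming the variable names Prj and Cap shown in the output above; strtrim removes leading and trailing whitespace from each character vector in the cell array.

```matlab
% Recreate the sample file from the question so the sketch is self-contained.
fid = fopen('Test.csv', 'w');
fprintf(fid, 'Prj  , Cap\nA    ,  1\nA    ,  2\nH    ,  4\nH    ,  5\n');
fclose(fid);

% Read with an explicit format, as in the question (%s = text, %u = uint32).
T = readtable('Test.csv', 'Format', '%s%u');

% %s keeps the padding before each comma, so trim the text column afterwards.
T.Prj = strtrim(T.Prj);
disp(T)
```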

I get the same results regardless of whether Test.csv has Unix or DOS line endings.

I will continue to investigate, read, and experiment.

P.S. Very odd, but the Matlab Answers forum at Matlab Central won't let me post this question (prior to coming here). I can enter text for the subject heading, but no insertion point appears in the message body no matter how much I click. It happens using both Firefox and Edge.


Solution

  • Starting in R2020a, you can simply use

    readtable('Test.csv')
    

    readtable automatically detects that line 1 holds the column headings, uses them as variable names, and starts reading data on the next line. The data type of each column is inferred from the data itself.

    Alternatively, you can specify the data type of each column with the 'Format' name-value argument:

    readtable('Test.csv', 'Format', '%s%u')
    

    This reads the first column as text (a cell array of character vectors) and the second as unsigned 32-bit integers (for signed integers, use %d).
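For Matlab 2019a, where readtable does not yet behave as though it calls detectImportOptions automatically, a similar result may be obtainable by calling detectImportOptions explicitly. This is a sketch under the assumption that detection of the heading line and column types works once the delimiter is stated explicitly (automatic delimiter detection is what failed in the question); inspect the detected options before relying on them.

```matlab
% Recreate the sample file from the question so the sketch is self-contained.
fid = fopen('Test.csv', 'w');
fprintf(fid, 'Prj  , Cap\nA    ,  1\nA    ,  2\nH    ,  4\nH    ,  5\n');
fclose(fid);

% Let detectImportOptions work out the heading line, data lines and types,
% but state the delimiter explicitly instead of relying on detection.
opts = detectImportOptions('Test.csv', 'Delimiter', ',');

% Inspect what was detected before reading.
disp(opts.VariableNames)
disp(opts.DataLines)

T = readtable('Test.csv', opts);
disp(T)
```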