I have the following CSV file with column headings on line 1:
Test.csv
--------
Prj , Cap
A , 1
A , 2
H , 4
H , 5
I tried to read this into a table, but I'm having trouble making readtable recognize the column headings on line 1:
readtable( 'Test.csv' , ...
delimitedTextImportOptions( 'VariableNamesLine' , 1 ) )
Var1 ExtraVar1
_____ _________
'Prj' ' Cap'
'A' ' 1'
'A' ' 2'
'H' ' 4'
'H' ' 5'
What am I misunderstanding about the VariableNamesLine parameter?
doc delimitedTextImportOptions shows it as being introduced in Matlab 2016b, and I am running Matlab 2019a.
Troubleshooting steps
Here is the delimitedTextImportOptions object:
dtio = delimitedTextImportOptions( 'VariableNamesLine' , 1)
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'system'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1'}
VariableTypes: {'char'}
SelectedVariableNames: {'Var1'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Location Properties:
DataLines: [1 Inf]
VariableNamesLine: 1
RowNamesColumn: 0
VariableUnitsLine: 0
VariableDescriptionsLine: 0
If I specify ReadVariableNames as true, only the first column heading is recognized, and it is still repeated in the data.
readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )
Prj ExtraVar1
_____ _________
'Prj' ' Cap'
'A' ' 1'
'A' ' 2'
'H' ' 4'
'H' ' 5'
I can avoid having the headings read as data by explicitly specifying DataLines, but the second column heading is still not read.
dtio = delimitedTextImportOptions( ...
'VariableNamesLine' , 1 , ...
'DataLines' , [2 Inf] );
readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )
Prj ExtraVar1
___ _________
'A' ' 1'
'A' ' 2'
'H' ' 4'
'H' ' 5'
Oddly, the DataLines specification is ignored if I additionally clear the default VariableNames:
dtio = delimitedTextImportOptions( ...
'VariableNamesLine' , 1 , ...
'DataLines' , [2 Inf] , ...
'VariableNames' , {} );
readtable( 'Test.csv' , dtio , 'ReadVariableNames',true )
ExtraVar1 ExtraVar2
_________ _________
'Prj ' ' Cap'
'A ' ' 1'
'A ' ' 2'
'H ' ' 4'
'H ' ' 5'
Following suggestions in the responses, I tried the default readtable options. Unfortunately, this did not recognize , as a delimiter:
readtable('Test.csv')
Warning: Table variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property.
Prj x_ Cap
___ ___ ___
'A' ',' 1
'A' ',' 2
'H' ',' 4
'H' ',' 5
Using a format string helps recognition of the column heading line, but white space around the delimiters is kept for the string columns:
readtable('Test.csv', 'Format', '%s%u')
Prj Cap
_______ ___
'A ' 1
'A ' 2
'H ' 4
'H ' 5
I get the same results regardless of whether Test.csv has Unix or DOS line endings.
I will continue to investigate, read, and experiment.
P.S. Very odd, but the Matlab Answers forum at Matlab Central won't let me post this question (prior to coming here). I can enter text for the subject heading, but no insertion point appears in the message body no matter how much I click. It happens using both Firefox and Edge.
Starting in R2020a, you can simply use
readtable('Test.csv')
In that release, readtable automatically detects the header line and skips it when reading the data. The data type of each column is inferred from the data itself.
Alternatively, you could specify the data type of each column with the 'Format' option:
readtable('Test.csv', 'Format', '%s%u')
This will read your first column as text and the second as an unsigned integer (for a signed integer, use %d).
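For releases before R2020a, such as the R2019a mentioned in the question, one possible approach is to let detectImportOptions build the import options from the file instead of constructing delimitedTextImportOptions by hand. The following is only a minimal sketch that I have not verified on R2019a. It recreates Test.csv in the current folder so that it is self-contained, and forcing the first column to char with trimmed whitespace is an illustrative assumption rather than something the file requires.

```matlab
% Recreate the file from the question so the sketch is self-contained.
fid = fopen('Test.csv', 'w');
fprintf(fid, 'Prj , Cap\nA , 1\nA , 2\nH , 4\nH , 5\n');
fclose(fid);

% Let MATLAB inspect the file; the delimiter is stated explicitly as a precaution.
opts = detectImportOptions('Test.csv', 'Delimiter', ',');

% Assumption: treat column 1 as text and trim the spaces around each value.
opts = setvartype(opts, 1, 'char');
opts = setvaropts(opts, 1, 'WhitespaceRule', 'trim');

T = readtable('Test.csv', opts);

disp(opts.VariableNames)   % inspect which headings were detected
disp(T)
```

If the headings are not picked up as expected, opts.VariableNames can be assigned directly before calling readtable.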