How can I import a table in MATLAB with footer rows?


I have a text file with header lines above the table. Below the table there is a blank line, followed by a second table of summary statistics. Handling the header lines is easy, since most of the standard import functions (e.g. readtable) have an option for that. The length of the file is not always the same. The problem with readtable is that the footer table has fewer columns than the main table, so the function cannot parse those lines and returns an error.

This is the error that I get with readtable:

Error using readtable (line 216)
Reading failed at line 2285. All lines of a text file must have the same number of delimiters. Line 2285 has 0 delimiters, while preceding lines have 24.

Note: readtable detected the following parameters:
'Delimiter', '\t', 'HeaderLines', 21, 'ReadVariableNames', true, 'Format', '%T%f%f%f%q%f%f%f%f%f%f%q%f%f%f%f%f%f%f%f%f%f%f%f%f'

Here's what I've come up with as an alternative solution:

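dataStartRow = 23;                                              % first data row, after the header lines and variable names
numRows = size(readmatrix(filePath, 'NumHeaderLines', 0), 1);   % row count as reported by readmatrix
dataEndRow = numRows - 8;                                       % hard-coded footer length for this particular file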

opts = detectImportOptions(filePath);
opts.DataLines = [dataStartRow, dataEndRow];
dataTable = readtable(filePath, opts);

This works, but I have another file with a different number of footer rows, and I don't know how to handle that without hard-coding the number of footer lines.

I've considered using fgetl to read the lines one at a time and decide when to stop adding to the table, but that seems very inefficient. How can I import this table when both the number of table rows and the number of footer lines are unknown?
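For reference, the fgetl idea can be sketched as a small helper that scans for the blank line separating the main table from the footer, so the footer length never has to be hard-coded. This is a hypothetical function (findDataEndRow is not part of MATLAB), and it assumes, as described above, that a single blank line follows the last data row.

```matlab
function dataEndRow = findDataEndRow(filePath, dataStartRow)
% FINDDATAENDROW  Hypothetical helper: return the last data row before the
% first blank (or whitespace-only) line at or after dataStartRow.
% Returns Inf if no blank line is found, meaning "read to the end of file".
    fid = fopen(filePath, 'r');
    if fid == -1
        error('Could not open file: %s', filePath);
    end
    cleaner = onCleanup(@() fclose(fid));  % close the file even if an error occurs

    lineNum = 0;
    dataEndRow = Inf;
    while true
        tline = fgetl(fid);
        if ~ischar(tline)          % fgetl returns -1 at end of file
            break
        end
        lineNum = lineNum + 1;
        if lineNum >= dataStartRow && isempty(strtrim(tline))
            dataEndRow = lineNum - 1;  % the data ends just before the blank line
            break
        end
    end
end
```

It would then be used in place of the hard-coded `numRows - 8`:

    opts = detectImportOptions(filePath);
    opts.DataLines = [dataStartRow, findDataEndRow(filePath, dataStartRow)];
    dataTable = readtable(filePath, opts);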


Solution

  • First of all, don't conclude that something 'seems very inefficient' unless you've profiled or timed it and found that it's actually too slow for your requirements.

    In this case, though, you can change the action MATLAB takes when an import error occurs by setting the ImportErrorRule property of your DelimitedTextImportOptions object:

    opts = detectImportOptions(filePath);
    opts.ImportErrorRule = 'omitrow'; % ignore any lines that don't match the detected pattern
    dataTable = readtable(filePath, opts);
    

    If you use this code regularly to read in new data, I'd validate dataTable to make sure it is consistent with what you expect, in case a new file causes detectImportOptions to give a different result for some reason. For example, if you know that the number of columns and their types should always be the same, you could specify

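    % Import options objects take VariableTypes rather than a Format string: %T -> 'duration', %f -> 'double', %q -> 'char'
    opts.VariableTypes = [{'duration'}, repmat({'double'}, 1, 3), {'char'}, repmat({'double'}, 1, 6), {'char'}, repmat({'double'}, 1, 13)];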
    

    and then check that the resulting table is not empty.
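The import-plus-validation step above could be wrapped up roughly as follows. This is a sketch, not a definitive implementation: readMainTable is a hypothetical name, expectedNumVars is an assumed parameter (25 for the file in the question), and the commented-out MissingRule line is an optional extra that would also drop genuine data rows containing empty fields.

```matlab
function dataTable = readMainTable(filePath, expectedNumVars)
% READMAINTABLE  Hypothetical helper: import the main table from a text file
% that has header and footer rows, dropping rows that fail to convert,
% then check that the result has the expected shape.
    opts = detectImportOptions(filePath);
    opts.ImportErrorRule = 'omitrow';   % drop rows (e.g. footer lines) that fail to convert
    % opts.MissingRule = 'omitrow';     % optional: also drop rows with missing fields,
    %                                   % at the cost of dropping real rows with gaps
    dataTable = readtable(filePath, opts);

    % Fail loudly if the file did not match expectations
    assert(width(dataTable) == expectedNumVars, ...
        'Expected %d columns but got %d.', expectedNumVars, width(dataTable));
    assert(height(dataTable) > 0, 'No data rows were imported.');
end
```

Raising an error here, rather than returning a partially imported table, makes a change in file layout show up immediately instead of surfacing later as odd results.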