Tags: matlab, file-io, formatted-input

How to read files with possible headers in MATLAB?


Originally, my files looked like this:

1.4 2.0
4.2 2.1
5.1 1.2

The number of columns is fixed, while the number of rows varies from file to file. The following code reads those files:

fid = fopen("my_file.txt","r");
M = fscanf(fid,"%f",[2,inf]);
fclose(fid);

Here M is the transpose of the data file.

Now I have several new files that may contain a one-line header starting with #:

# file description
1.0 2.0
1.5 2.2

It is guaranteed that the header occupies at most one line and always starts with #.

I know I could read the files line by line to handle the headers, but I wonder whether there is a way to change my original code as little as possible so that the new code can read files in both formats.

The textscan function seems able to handle headers, but the value of its HeaderLines parameter is a fixed number.
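To illustrate that limitation, here is a minimal sketch that writes the two sample layouts above to files (the names no_header.txt and with_header.txt are hypothetical) and reads both with a fixed HeaderLines value of 1. The header is skipped in the first file, but the header-less file also loses its first data row.

```matlab
% For illustration only: write the two sample layouts to hypothetical files
fID = fopen("no_header.txt", "w");
fprintf(fID, "1.4 2.0\n4.2 2.1\n5.1 1.2\n");
fclose(fID);
fID = fopen("with_header.txt", "w");
fprintf(fID, "# file description\n1.0 2.0\n1.5 2.2\n");
fclose(fID);

% HeaderLines is a fixed count: exactly one line is skipped in every file
files = ["with_header.txt", "no_header.txt"];
nrows = zeros(1, numel(files));
for k = 1:numel(files)
    fID = fopen(files(k), "r");
    C = textscan(fID, "%f %f", "HeaderLines", 1);
    fclose(fID);
    nrows(k) = size([C{:}], 1);   % number of rows actually read as data
end
disp(nrows)   % both files yield 2 rows, so no_header.txt lost one
```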


Solution

  • If your headers are known to start with a specific character, you can use textscan's 'CommentStyle' name-value pair to ignore them:

    With the following test.txt:

    # A header line
    1 2
    3 4
    5 6
    

    We can use:

    fID = fopen("test.txt", "r");
    M = textscan(fID, "%f", "CommentStyle", "#");
    M = reshape(M{:}, 2, []).';
    fclose(fID);
    

    Which gives us:

    >> M
    
    M =
    
         1     2
         3     4
         5     6
    

    Alternatively, if you want to stick with fscanf, you can read the first line with fgetl and check whether it is a header. Because fgetl moves the file pointer to the next line, use frewind to go back to the beginning of the file when there is no header.

    For example:

    fID = fopen("test.txt", "r");
    
    % Test for header
    tline = fgetl(fID);  % Moves file pointer to next line (returns -1 at end of file)
    commentchar = "#";
    % ischar guards against an empty file; startsWith is safe on an empty line
    if ischar(tline) && startsWith(tline, commentchar)
        % Header present, read from line 2
        M = fscanf(fID, "%f", [2, inf]).';
    else
        % No header, rewind to beginning of file & read as before
        frewind(fID);
        M = fscanf(fID, "%f", [2, inf]).';
    end
    fclose(fID);
    

    Which gives the same result as above. If the number of header lines isn't constant, you could use ftell and fseek in a while loop to skip past the headers, but at that point you are probably making things more complicated than this application needs.
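    For completeness, here is a rough sketch of the ftell/fseek loop for a variable number of header lines. It assumes every header line starts with "#" and writes a hypothetical sample file, test_multi.txt, with two header lines so that it runs on its own. ftell records where each line begins, and fseek returns to the start of the first non-header line before reading with fscanf as before.

    ```matlab
    % For illustration only: a hypothetical file with two header lines
    fID = fopen("test_multi.txt", "w");
    fprintf(fID, "# first header line\n# second header line\n1 2\n3 4\n5 6\n");
    fclose(fID);

    fID = fopen("test_multi.txt", "r");
    commentchar = "#";

    % Skip any number of leading lines that start with commentchar
    pos = ftell(fID);           % byte offset of the line about to be read
    tline = fgetl(fID);
    while ischar(tline) && startsWith(tline, commentchar)
        pos = ftell(fID);       % offset of the line after this header
        tline = fgetl(fID);
    end

    % Go back to the start of the first non-header line and read as before
    fseek(fID, pos, 'bof');
    M = fscanf(fID, "%f", [2, inf]).';
    fclose(fID);
    ```

    Seeking back to the saved offset, rather than parsing the already-read line, keeps the final fscanf call identical to the original code.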