matlabmatlab-load

Why does writing to an unrelated file cause the load function to be so slow?


I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could cast any light on to what is going on here.

Here is some code that will cause the problem:

function profiler_test

  %%% Create 20 files with random data

  count = 20;

  for i = 1 : count
    x = rand(3);
    save(sprintf('temp_file_%06d', i), 'x'); 
  end

  %%% Load them in a for loop

  xs = cell(1, count);

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;
  end
  toc

  %%% Load them in a for loop, but writing a small log file on the way

  tic;
  for i = 1 : count
    x = load(sprintf('temp_file_%06d', i), 'x');
    xs{i} = x.x;

    file = fopen(sprintf('temp_logfile_%d', i), 'w');
    fprintf(file, 'Success\n');
    fclose(file);
  end
  toc


end

The first for loop takes 0.239739 seconds, the second takes 4.411179.

Now, I should make it clear that I am aware of the sloppy idea I had, shown in the second for loop example, of creating a log file for each result - it was because I was running on a cluster where I couldn't see the output, I wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.

My problem however is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:

         1   24   tic; 
         1   25   for i = 1 : count 
4.41    20   26     x = load(sprintf('temp_file_%06d', i), 'x'); 
        20   27     xs{i} = x.x; 
             28     
        20   29     file = fopen(sprintf('temp_logfile_%d', i), 'w'); 
        20   30     fprintf(file, 'Success\n'); 
        20   31     fclose(file); 
        20   32   end 
         1   33   toc

It's placed the entire time taken to execute the final three lines on the line for load. In my actual program, the load was not so close to the other bit so it didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened and should I be watching out for any more bizarre behaviour like this?

I'm using MATLAB 2011a. Many thanks.

EDIT: I seem to be causing some confusion, apologies. Here is the situation:

Either way it seems to me that something very strange is happening.

EDIT: Seem to have answered it myself, see below. Changed the title as it was misleading


Solution

  • Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading - the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load to slow down. I had a thought, which was to try this:

    file = fopen(sprintf('../temp_logfile_%d', i), 'w');
    

    That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and was very fast. The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks though the whole search path, as load does, it checks to see if any directories have been modified, and if so re-parses the whole lot to see what files are available. Writing a new file to the working directory certainly would have caused this. This may have been worse in my case since I also have a whole tree of subdirectories in the working directory which are part of the search path.

    Anyway, thanks to those who had a look and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!