I've just spent a while debugging some particularly slow code and have been completely thrown off by the MATLAB profiler. This looks to me like a massive bug, so I was wondering if anyone could cast any light on to what is going on here.
Here is some code that will cause the problem:
function profiler_test
%%% Create 20 files with random data
count = 20;
for i = 1 : count
x = rand(3);
save(sprintf('temp_file_%06d', i), 'x');
end
%%% Load them in a for loop
xs = cell(1, count);
tic;
for i = 1 : count
x = load(sprintf('temp_file_%06d', i), 'x');
xs{i} = x.x;
end
toc
%%% Load them in a for loop, but writing a small log file on the way
tic;
for i = 1 : count
x = load(sprintf('temp_file_%06d', i), 'x');
xs{i} = x.x;
file = fopen(sprintf('temp_logfile_%d', i), 'w');
fprintf(file, 'Success\n');
fclose(file);
end
toc
end
The first for
loop takes 0.239739 seconds, the second takes 4.411179.
Now, I should make it clear that I am aware of the sloppy idea I had, shown in the second for
loop example, of creating a log file for each result - it was because I was running on a cluster where I couldn't see the output, I wanted a cheap indication of the function's progress, and this turned out to be the bottleneck. I'm fine with that.
My problem however is that I've spent a day trying to optimise the wrong line, because the MATLAB profiler says this:
1 24 tic;
1 25 for i = 1 : count
4.41 20 26 x = load(sprintf('temp_file_%06d', i), 'x');
20 27 xs{i} = x.x;
28
20 29 file = fopen(sprintf('temp_logfile_%d', i), 'w');
20 30 fprintf(file, 'Success\n');
20 31 fclose(file);
20 32 end
1 33 toc
It's placed the entire time taken to execute the final three lines on the line for load
. In my actual program, the load
was not so close to the other bit so it didn't occur to me until I decided to distrust the profiler. My question is: what is going on here? Why has this happened and should I be watching out for any more bizarre behaviour like this?
I'm using MATLAB 2011a. Many thanks.
EDIT: I seem to be causing some confusion, apologies. Here is the situation:
for
loops shown above are identical, except that the second one has three lines at the bottom which write to a temporary file each iteration.load
function call - exactly the same call as the first loop, which was faster - is now taking 4 seconds instead of 0.2. So either the presence of the last three lines causes the load
to be slow (I had disregarded this; is that even a possibility?), OR the MATLAB profiler is incorrectly reporting that load
is taking 4 seconds when it is clearly not.Either way it seems to me that something very strange is happening.
EDIT: Seem to have answered it myself, see below. Changed the title as it was misleading
Actually, I think I've solved it. I was wrong to jump to the conclusion that the additional processing time was occurring on the new lines, so my question is now a little misleading - the profiler is correct. However, I still didn't understand why writing to a temporary file would cause load
to slow down. I had a thought, which was to try this:
file = fopen(sprintf('../temp_logfile_%d', i), 'w');
That is, write to a file in the parent directory instead of the current working directory. This removed the problem, and was very fast. The reason, I am guessing, is that the current directory is in my MATLAB search path, as are a bunch of other directories. I presume that every time MATLAB uses a function which looks though the whole search path, as load
does, it checks to see if any directories have been modified, and if so re-parses the whole lot to see what files are available. Writing a new file to the working directory certainly would have caused this. This may have been worse in my case since I also have a whole tree of subdirectories in the working directory which are part of the search path.
Anyway, thanks to those who had a look and sorry that the answer turned out to be something quite different from the question. Be aware when using functions which rely on the entire search path!