I am trying to calculate an IRR over several dimensions in Matlab 2019a. My formula works in theory (ignoring the "multiple rates of return" warning for now), but for bigger matrices, i.e. noScenarios > 5 or so, the code gets very slow. What are the programming alternatives? I also tried fsolve, but I don't think it is any faster.
Please note that as I'm no math crack, a simple key word like "Brent's method" does not suffice for me (e.g. as in What is the Most Efficient Way to Calculate the Internal Rate of Return IRR?). I would have to know a) how to implement it in Matlab, and b) if it is pretty idiot proof so that nothing can go wrong? Thank you!
clc
clear
close all
noScenarios = 50;
CF = ones(300,noScenarios,noScenarios,noScenarios);
CF = [repmat(-300, 1,noScenarios,noScenarios,noScenarios); CF];
for scenarios1 = 1:noScenarios
    for scenarios2 = 1:noScenarios
        for scenarios3 = 1:noScenarios
            IRR3dimensional(scenarios1,scenarios2,scenarios3) = irr(CF(:,scenarios1,scenarios2,scenarios3));
        end
    end
end
To calculate IRR, you need to solve a polynomial equation. This has to be done for each cash flow vector separately. Hence, applying irr to a multidimensional matrix does not improve execution time. I suspect that Matlab still uses a loop internally.
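To make the polynomial connection concrete: with x = 1/(1+r), the net present value is a polynomial in x, so candidate IRRs can be read off its real, positive roots. The following is a minimal sketch under that assumption. The name irrViaRoots and the 1e-9 tolerance are hypothetical choices for illustration, and this is not claimed to be how the built-in irr is implemented.

```matlab
function r = irrViaRoots(cf)
% Hypothetical helper: candidate IRRs of one cash flow vector.
% cf(1) is the cash flow at time 0, cf(2) at time 1, and so on.
    cf = cf(:);
    % NPV(r) = sum_k cf(k+1) * x^k with x = 1/(1+r).
    % roots expects coefficients ordered from highest to lowest power.
    x = roots(flipud(cf));
    % Keep only real, positive discount factors (these give r > -1).
    % The tolerance is an assumption and may need tuning for long vectors.
    keep = abs(imag(x)) < 1e-9 & real(x) > 0;
    r = 1 ./ real(x(keep)) - 1;
end
```

For example, irrViaRoots([-100; 110]) should return approximately 0.10. More than one value may come back when the cash flows change sign several times, which is the situation behind the "multiple rates of return" warning.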
You might be able to gain some speed by playing with the optimization options of fsolve, but a large improvement is very unlikely. Presumably, Matlab developers already chose a sufficiently good approach.
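Since the question asks how a bracketing root finder would look in Matlab: fzero is the built-in scalar solver of that kind (it combines bisection, secant, and inverse quadratic interpolation). Below is a minimal sketch, assuming a single cash flow vector and a bracket [0, 1] on which the net present value changes sign. The cash flow values and the bracket are made up for illustration, and no speed advantage over irr is claimed.

```matlab
% Illustrative cash flows (assumed values): one outflow, then inflows.
cf = [-100; 30; 40; 50; 20];
t  = (0:numel(cf)-1)';               % time index of each cash flow

% Net present value as a function of the rate r.
npv = @(r) sum(cf ./ (1 + r).^t);

% fzero needs a bracket whose endpoints give opposite signs:
% npv(0) = 40 > 0 and npv(1) = -67.5 < 0 for these cash flows.
r = fzero(npv, [0, 1]);
```

With a valid bracket, fzero returns one root inside it. It does not tell you whether other roots exist outside the bracket, so it is only as "idiot proof" as the bracket you give it.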
Thus, your only other alternative is parallelization. If you have access to a server or your laptop/desktop has multiple CPU cores, you can reduce your run time by running the irr calls in parallel. (This requires the Parallel Computing Toolbox.)
I modified your example slightly to use random cash flow values to make it easier to check. However, I reduced the number of scenarios and time points, so that the timeit function could run multiple simulations in a reasonable time. (Also, please keep in mind that the execution time seems to be exponential in the number of time points.)
t = 150;
noScenarios = 10;
noThreads = 4;
CF = rand(t,noScenarios,noScenarios,noScenarios);
CF = [-rand(1,noScenarios,noScenarios,noScenarios); CF];
h1 = @() f1(CF, noScenarios);
fprintf("%0.4f : single thread, loop\n", timeit(h1))
h2 = @() f2(CF, noScenarios);
fprintf("%0.4f : single thread, vectorized\n", timeit(h2))
poolObj = parpool('local', noThreads);
h3 = @() f3(CF, noScenarios);
fprintf("%0.4f : parallelized outer loop\n", timeit(h3))
delete(poolObj);
poolObj = parpool('local', noThreads);
h4 = @() f4(CF, noScenarios);
fprintf("%0.4f : parallelized inner loop\n", timeit(h4))
delete(poolObj);
function res = f1(CF, noScenarios)
    % Single thread, explicit triple loop.
    res = zeros(noScenarios, noScenarios, noScenarios);
    for scenarios1 = 1:noScenarios
        for scenarios2 = 1:noScenarios
            for scenarios3 = 1:noScenarios
                res(scenarios1,scenarios2,scenarios3) = irr(CF(:,scenarios1,scenarios2,scenarios3));
            end
        end
    end
end
function res = f2(CF, noScenarios)
res = reshape(irr(CF), noScenarios, noScenarios, noScenarios);
end
function res = f3(CF, noScenarios)
    % Parallelized outer loop: few large batches.
    res = zeros(noScenarios, noScenarios, noScenarios);
    parfor scenarios1 = 1:noScenarios
        for scenarios2 = 1:noScenarios
            for scenarios3 = 1:noScenarios
                res(scenarios1,scenarios2,scenarios3) = irr(CF(:,scenarios1,scenarios2,scenarios3));
            end
        end
    end
end
function res = f4(CF, noScenarios)
    % Parallelized innermost loop: many small batches.
    res = zeros(noScenarios, noScenarios, noScenarios);
    for scenarios1 = 1:noScenarios
        for scenarios2 = 1:noScenarios
            parfor scenarios3 = 1:noScenarios
                res(scenarios1,scenarios2,scenarios3) = irr(CF(:,scenarios1,scenarios2,scenarios3));
            end
        end
    end
end
When I ran this code on a server with 4 CPUs and 16 GB of memory, I got the following results.
19.9357 : single thread, loop
20.4318 : single thread, vectorized
...
5.6346 : parallelized outer loop
...
12.4640 : parallelized inner loop
As you can see, the vectorized version of irr provides no benefits over the loop. In this case, it is slightly slower. In my other tests, it was occasionally a bit faster.
However, you can significantly reduce your run time by parallelizing your outer loop with parfor. This works better than parallelizing the innermost loop because each batch carries a certain execution overhead, so a small number of larger batches has lower total overhead than a large number of smaller batches.
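A related variant, which I did not benchmark, is to flatten the three scenario dimensions into one and run a single parfor over all cash flow columns. This is only a sketch: the name f5 is hypothetical and simply follows the naming of the functions above, and whether it beats the outer-loop version will depend on your machine.

```matlab
function res = f5(CF, noScenarios)
% Hypothetical variant: one parfor over all scenario combinations.
    nCols = noScenarios^3;
    % Collapse the three scenario dimensions into a single column index.
    CF2 = reshape(CF, size(CF, 1), nCols);
    resFlat = zeros(nCols, 1);
    parfor k = 1:nCols
        resFlat(k) = irr(CF2(:, k));
    end
    % reshape is column-major, so res(i,j,k) matches CF(:,i,j,k).
    res = reshape(resFlat, noScenarios, noScenarios, noScenarios);
end
```

The design idea is to give parfor as many independent iterations as possible, so the workers stay evenly loaded even when noScenarios is not a multiple of the worker count.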
Here is how the parallelization works. First, you create a pool of local workers with the command below. Make sure you do not request more workers than you have CPU cores. parpool can wait indefinitely until all local workers are created, and it can only create a local worker if a CPU is available.
poolObj = parpool('local', noThreads);
The pool creation may take a few seconds. That is why I moved it outside of the function I timed. For larger jobs, pool creation time is insignificant compared to the total execution time.
Here, I save the pool object in a variable and delete it afterwards. However, it is optional. The pool is destroyed by default after 30 minutes of inactivity or when Matlab terminates.
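If you would rather reuse a pool that is already running than create and delete one each time, a minimal sketch looks like this. It assumes the same 'local' profile and the noThreads value from the example above.

```matlab
noThreads = 4;                 % same worker count as in the example above

% gcp('nocreate') returns the current pool, or empty if none is running.
poolObj = gcp('nocreate');
if isempty(poolObj)
    poolObj = parpool('local', noThreads);
end

% ... run the parfor code here ...

% Optional: shut the pool down explicitly when you are done.
% delete(poolObj);
```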
After that, you replace the for loop you want to parallelize with a parfor loop, i.e. for scenarios1 = 1:noScenarios becomes parfor scenarios1 = 1:noScenarios. By default, parfor will use all available workers, but you can also specify the maximum number of workers it is allowed to use with parfor (scenarios1 = 1:noScenarios, maxWorkers). Note, however, that the execution order is not guaranteed, i.e. the fifth iteration may be executed before the third iteration.
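As a small illustration of the worker limit and the ordering caveat, here is a toy loop (the values are made up and have nothing to do with IRR). Each iteration writes only to its own slot of the output, so the result does not depend on the order in which iterations run.

```matlab
maxWorkers = 2;                 % upper bound on workers for this loop
n = 8;
res = zeros(1, n);

parfor (k = 1:n, maxWorkers)
    % Iterations may run in any order; res(k) still ends up in slot k.
    res(k) = k^2;
end
```

Anything that relies on a previous iteration, for example res(k) = res(k-1) + 1, is not allowed inside parfor.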