I'm writing a simulation in MATLAB that uses CUDA acceleration. Suppose we have vectors `x` and `y`, a matrix `A`, and scalar variables `dt`, `dx`, `a`, `b`, `c`.
What I found was that by putting `x`, `y`, and `A` into `gpuArray()` before running the iteration and the built-in functions, the iteration was accelerated significantly. However, when I also put the scalars `dt`, `dx`, `a`, `b`, `c` into `gpuArray()`, the program slowed down by over 30% (run time increased from 7 s to 11 s). Why is it not a good idea to put all the variables into `gpuArray()`?
(A short comment: those scalars were only ever multiplied with `x`, `y`, and `A`; they were never used on their own during the iteration.)
GPU hardware is optimised for working on relatively large amounts of data. You only really see the benefit of GPU computing when you can feed the many processing cores lots of data to keep them busy. Typically this means you need operations working on thousands or millions of elements.
The overheads of launching operations on the GPU dwarf the computation time when you're dealing with scalar quantities, so it is no surprise that they are slower than on the CPU. (This is not peculiar to MATLAB and `gpuArray`.)
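A minimal sketch of the fast arrangement, assuming a hypothetical update rule of the form `x = x + dt*(a*(A*x) + b*y)` (the question's actual iteration is not shown): move only the large arrays to the GPU and leave the scalars as ordinary CPU doubles — MATLAB combines host scalars with `gpuArray` operands automatically, so no extra kernel launches or host-device transfers are needed for them.

```matlab
% Hypothetical sizes and update rule -- the question's actual
% iteration is not shown, so this is illustrative only.
n  = 4096;
A  = rand(n,    'gpuArray');   % large matrix: worth moving to the GPU
x  = rand(n, 1, 'gpuArray');   % vectors: likewise
y  = rand(n, 1, 'gpuArray');

dt = 1e-3;  a = 0.5;  b = 0.25;   % scalars: leave them on the CPU;
dx = 1e-2;  c = 2.0;              % wrapping these in gpuArray adds overhead

for k = 1:1000
    % A, x, y live on the GPU, so these built-ins run as GPU kernels;
    % the host scalars dt, a, b, c, dx are folded into the arithmetic
    % without any separate GPU operations.
    x = x + dt*(a*(A*x) + b*y) + c*dx;
end

x = gather(x);   % copy the result back to host memory once, at the end
```

To compare the two arrangements fairly, `gputimeit` is preferable to `tic`/`toc`, since it synchronises the GPU so that asynchronous kernel launches are fully counted.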