Consider this line of code:
gpuArray(-1)^0.5;
Which results in:
ans = 0.0000 + 1.0000i
Now consider the following line of code:
gpuArray(-1).^0.5;
Which results in:
Error using .^ POWER: needs to return a complex result, but this is not supported for real input X and Y on the GPU. Use POWER(COMPLEX(X), COMPLEX(Y,0)) instead.
The problem clearly has something to do with a double -> complex double conversion on the GPU, which is not allowed. Indeed, when I apply the workaround (which is also mentioned in the docs) it solves the problem - but I don't understand why.
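For concreteness, this is the form of the workaround I mean. It is only a minimal sketch: the variable names are mine, and it assumes a supported GPU and the Parallel Computing Toolbox.

```matlab
% Workaround suggested by the error message: make the input complex
% before calling power, so the result is complex from the start.
x = gpuArray(-1);             % real input on the GPU
y = power(complex(x), 0.5);   % same as complex(x).^0.5; no error this time
disp(gather(y))               % bring the result back to the host to display it
```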
Could anybody shed some light on this? Is it a limitation of VRAM? Of the specific card I'm using (a GTX 660, with compute capability 3.0)? Of the MATLAB implementation (I'm using R2018b)? Of the OS?
There are a few methods of gpuArray that behave this way, and the reason is simple: performance.
It is perfectly possible to write an implementation of e.g. sqrt that behaves on the GPU the same way as MATLAB's CPU implementation (i.e. compute a real result unless a complex result is required, in which case return a complex result). Part of that work is already done: otherwise the gpuArray method wouldn't know when to throw an error. The expensive part, however, is re-allocating the output as complex and then performing the operation again.
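If you do want the CPU-like behaviour, you can pay for the extra check yourself. The following is only a rough sketch of the idea, not how gpuArray is implemented internally. The variable names are illustrative, and the explicit gather is my assumption about the simplest way to get a host-side logical for the branch.

```matlab
% Emulate "real unless complex is required" for sqrt on the GPU.
x = gpuArray([-4, 9]);

% Extra pass over the data: is there any negative element?
needsComplex = isreal(x) && gather(any(x(:) < 0));

if needsComplex
    y = sqrt(complex(x));   % output is allocated as complex from the start
else
    y = sqrt(x);            % stays real, no complex allocation needed
end
```

The scan for negative values plus the host round-trip is the kind of overhead the built-in method avoids by throwing an error instead.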
There are other minor but noticeable quirks relating to gpuArray and complex numbers: on the GPU, all-zero imaginary parts are not removed in cases where the MATLAB CPU implementation would remove them. For example:
>> a = [1i, 2]; gA = gpuArray(a);
>> [isreal(a(2)), isreal(gA(2))]
ans =
1×2 logical array
1 0
(Remembering of course that MATLAB's isreal function tells you about storage, not values).
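If real storage matters to you, one option is to drop the zero imaginary part explicitly. This is a hedged sketch with illustrative variable names, not something the toolbox does for you:

```matlab
gA = gpuArray([1i, 2]);
b = gA(2);                                   % value 2, but stored as complex on the GPU

% Only strip the imaginary part when every imaginary component is exactly zero.
if ~isreal(b) && gather(all(imag(b(:)) == 0))
    b = real(b);                             % real() returns real storage
end

tf = isreal(b);                              % reports storage, which is now real
```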
EDIT: Just realised that there's a specific doc reference for the functions of gpuArray that behave this way.