I'm trying to do matrix addition using Alea CuBlas axpy, but it seems to add only the top row:
let matrixAddition (a: float[,]) (b: float[,]) =
    use mA = gpu.AllocateDevice(a)
    use mB = gpu.AllocateDevice(b)
    blas.Axpy(a.Length, 1., mA.Ptr, 1, mB.Ptr, 1)
    Gpu.Copy2DToHost(mB)
There is one important difference between JokingBear's code and redb's code.
In this line of the problematic code:
blas.Axpy(a.Length,1.,mA.Ptr,1,mB.Ptr,1)
a has type float[,], so a.Length is the number of elements in the host matrix (9 for a 3x3 matrix).
The corrected code, however, uses this:
blas.Axpy(deviceA.Length, 1f, deviceA.Ptr, 1, deviceB.Ptr, 1);
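Here deviceA is no longer a float[,] but a DeviceMemory2D object.
DeviceMemory2D.Length is much larger than (float[,]).Length: 384 for a 3x3 matrix on my hardware. The GPU allocation evidently occupies more space than the host array. I have not confirmed why, but 384 = 3 x 128 would be consistent with a pitched allocation in which each row is padded to 128 elements for alignment.
JokingBear's code sums only the top row because (float[,]).Length is too short for the layout in GPU memory: Axpy walks a.Length consecutive elements starting from the pointer, and those elements cover only the first row and part of the space that follows it, so the remaining rows are never reached. It has nothing to do with the version of Alea.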
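To make the effect concrete, here is a minimal CPU-only sketch in plain F# that does not call Alea at all. It assumes the device buffer is laid out with each row padded to a fixed pitch; the pitch of 128 elements is an assumption chosen so that 3 rows give the 384 reported above, and the names toPitched, fromPitched, axpy and run are hypothetical helpers, not Alea or cuBLAS functions.

```fsharp
// Assumed layout: 3x3 matrix, each row padded to `pitch` elements.
// pitch = 128 is a guess that reproduces Length = 384 = 3 * 128.
let rows, cols, pitch = 3, 3, 128

// Copy a host matrix into a padded, row-major flat buffer.
let toPitched (m: float32[,]) =
    let buf = Array.zeroCreate<float32> (rows * pitch)
    for r in 0 .. rows - 1 do
        for c in 0 .. cols - 1 do
            buf.[r * pitch + c] <- m.[r, c]
    buf

// Read the matrix back out of the padded buffer.
let fromPitched (buf: float32[]) =
    Array2D.init rows cols (fun r c -> buf.[r * pitch + c])

// y <- alpha * x + y over the first n elements with stride 1,
// which is what an axpy call with increments of 1 computes.
let axpy n (alpha: float32) (x: float32[]) (y: float32[]) =
    for i in 0 .. n - 1 do
        y.[i] <- alpha * x.[i] + y.[i]

let a = Array2D.init rows cols (fun r c -> float32 (r * cols + c + 1))
let b = Array2D.create rows cols 10.0f

// Add a to b, processing only the first n elements of the padded buffers.
let run n =
    let x, y = toPitched a, toPitched b
    axpy n 1.0f x y
    fromPitched y

printfn "%A" (run (rows * cols))   // n = 9: only row 0 of the result changes
printfn "%A" (run (rows * pitch))  // n = 384: every row is summed
```

With n = 9 the loop never leaves the first padded row, which matches the symptom in the question; with n equal to the full padded length, all three rows are added.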