Matrix multiplication Y = A * B can be implemented with mul!(Y, A, B) to avoid memory allocations. But mul! can't be used when Y and A are the same matrix, because the output must not alias either input. Is there a similarly efficient way to calculate Y *= B? If not, what is the most efficient way to do the matrix multiplication Y *= B?
Small working example:
using LinearAlgebra  # provides mul!
n = 10
A = rand(n, n)
B = rand(n, n)
Y = zeros(n, n)
# mul! removes allocations
@allocated Y = A * B       # 896
@allocated mul!(Y, A, B)   # 0
# mul! can't be applied in this case
@allocated Y *= B          # 896
# desired function performance (mul_2! is hypothetical, it does not exist yet)
# @allocated mul_2!(Y, B)  # 0
Thanks in advance for your help!
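I don't think you can implement this efficiently in place (due to how blocking in matrix multiplication works: the entries of Y are read many times while the product is being computed, so they can't be overwritten as you go). You're better off just keeping around another matrix of the appropriate size to use as a buffer.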
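The buffer approach could look something like the sketch below. Note that mul_2! here is just the name from the question, not a function in LinearAlgebra, and the three-argument signature with an explicit buf is my own assumption about how you'd want to pass the scratch matrix around. It also assumes B is square, so that Y * B has the same size as Y.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): overwrites Y with Y * B.
# `buf` is a caller-supplied scratch matrix with the same size as Y.
function mul_2!(Y::AbstractMatrix, B::AbstractMatrix, buf::AbstractMatrix)
    mul!(buf, Y, B)   # buf = Y * B; buf must not alias Y or B
    copyto!(Y, buf)   # copy the result back into Y
    return Y
end

n = 10
B = rand(n, n)
Y = rand(n, n)
buf = similar(Y)      # allocate the buffer once, reuse it on every call

expected = Y * B      # allocating reference result, for comparison only
mul_2!(Y, B, buf)
@assert Y ≈ expected
```

The extra cost is one copy of the matrix per call, which is O(n^2) against the O(n^3) multiplication. If a full second matrix is too much memory, each row of Y * B depends only on the same row of Y, so in principle a buffer of one row would do, though that replaces the single matrix-matrix product with many vector-matrix products.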