I want to write a function which takes a matrix as an input. This is a frequent low-level call in a complicated project, so making this function as fast as possible has potentially serious performance implications. Because speed is so important to me, I'm using the types in FixedSizeArrays
as I know that this will save on memory usage. But I often know certain properties of the input matrix, and I'm not certain that I'm making optimal use of that.
Here's a simple example. Imagine that the function I want to make the following as fast as possible:
using FixedSizeArrays
function foo( input::Mat )
# NB: Mat is the FixedSizeArrays matrix type
return 2 * input
end
Obviously this is a trivial example, but that's not the point. The point is that I know something about the dimensions of the matrix input
: it always has only two columns and I can always specify the number of rows at run-time. This seems like information that could be passed to the compiler to make my code faster. Could I pass it as an argument that defines the size of input
somehow? Here's an example that doesn't work, but should give you some idea of what I'm trying to do.
function bar( int::N, thismat::Mat{N,2,Float64} )
return 2 * thismat
end
Is there something like this that I can do? Would this even work if I could? Maybe FixedSizeArrays already does everything that can be done. Thanks for you thoughts!
Fixed-size arrays are already specialized on size. These arrays are not appropriate for when the number of rows, N
in your case, can vary. Any performance problems you notice are probably because of overspecialization.
Let me be a little more specific.
The Julia compiler is able to achieve zero-cost abstraction through aggressive specialization on the types of arguments. So in general (that is, in all cases except for a few where specialization would be too expensive, or is explicitly disabled), if a function is called with two different type signatures, two versions of this function will be compiled.
Since the size of a Mat
is part of its type, that means a version will be compiled for each possible size of the Mat
. So the specialization you seek is done already.
Specialization, however, is not free. There are two costs associated with it:
Thus if your matrices are of the size (2, N)
, where N
varies and is not known at compile time, the performance cost of dynamic dispatch will be incurred. This performance cost can be restricted by using the function barrier technique: we only incur that cost once for each type-unstable call, so limiting the number of such calls improves performance.
But what would increase performance even more would be to avoid this dynamic dispatch entirely. It is possible to construct an array type that only encodes the number of columns in the type, and has the number of rows as a field at runtime. That is, it's possible your performance problem is due to overspecialization, and you need to create your types to reduce the amount of specialization.
Finding the right balance is central to squeezing as much performance as possible out of an application. Specializing on the size of an array is in fact useful quite rarely—even C and C++ code, for instance, tends to pass array sizes as runtime parameters, instead of specializing on a particular array size. This is not that expensive. In more cases that not, FixedSizeArrays.jl
will not improve performance, but rather hurt it. There are certainly situations where it will help—but yours may not be one of them.
In your case, for maximal performance, I suspect that a type like this would be fastest:
immutable TwoColumnMatrix{T, BaseType} <: AbstractArray{T, 2}
height::Int
base::BaseType
end
function TwoColumnMatrix(A::Matrix)
size(A, 2) == 2 || throw(ArgumentError("must be two columns"))
TwoColumnMatrix{eltype(A), typeof(A)}(size(A, 1), A)
end
Base.@propagate_inbounds function getindex(M::TwoColumnMatrix, n::Int)
M.base[n]
end
size(M::TwoColumnMatrix) = (M.height, 2)
You might need to define additional methods for maximum performance, and as always, benchmark. It's possible the overhead of the wrapper is not worth the compiler knowing about the dimensions.