Tags: arrays, types, julia, multiple-dispatch

How to write "good" Julia code when dealing with multiple types and arrays (multiple dispatch)


OP UPDATE: Note that in the latest version of Julia (v0.5), the idiomatic approach to answering this question is to just define mysquare(x::Number) = x^2. The vectorised case is covered using automatic broadcasting, i.e. x = randn(5) ; mysquare.(x). See also the new answer explaining dot syntax in more detail.

I am new to Julia, and given my Matlab origins, I am having some difficulty determining how to write "good" Julia code that takes advantage of multiple dispatch and Julia's type system.

Consider the case where I have a function that provides the square of a Float64. I might write this as:

function mysquare(x::Float64)
    return(x^2);
end

Sometimes I want to square all the Float64s in a one-dimensional array, but don't want to write out a loop over mysquare every time, so I use multiple dispatch and add the following:

function mysquare(x::Array{Float64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

But now I am sometimes working with Int64, so I write out two more functions that take advantage of multiple dispatch:

function mysquare(x::Int64)
    return(x^2);
end
function mysquare(x::Array{Int64, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

Is this right? Or is there a more idiomatic way to deal with this situation? Should I use type parameters like this?

function mysquare{T<:Number}(x::T)
    return(x^2);
end
function mysquare{T<:Number}(x::Array{T, 1})
    y = Array(Float64, length(x));
    for k = 1:length(x)
        y[k] = x[k]^2;
    end
    return(y);
end

This feels sensible, but will my code run as quickly as the case where I avoid parametric types?

In summary, there are two parts to my question:

  1. If fast code is important to me, should I use parametric types as described above, or should I write out multiple versions for different concrete types? Or should I do something else entirely?

  2. When I want a function that operates on arrays as well as scalars, is it good practice to write two versions of the function, one for the scalar, and one for the array? Or should I be doing something else entirely?

Finally, please point out any other issues you can think of in the code above as my ultimate goal here is to write good Julia code.


Solution

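  • Julia compiles a specialized version of your function for each combination of concrete argument types it is called with, as required. Thus, to answer part 1, there is no performance difference between the parametric version and separate versions written out for each concrete type. The parametric way is the way to go.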
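
    A minimal sketch of this point, written for Julia 1.x (an assumption; the code elsewhere in this thread uses older, pre-1.0 syntax). A single method declared on Number is specialized separately for each concrete argument type it is called with, so the results are concretely typed without any per-type definitions:

    ```julia
    # One generic method; Julia compiles a specialization per concrete argument type.
    mysquare(x::Number) = x^2

    a = mysquare(3)          # specialized for Int
    b = mysquare(1.5)        # specialized for Float64
    c = mysquare(big"1.5")   # specialized for BigFloat

    println(typeof(a), " ", typeof(b), " ", typeof(c))
    ```

    To inspect the code generated for a particular argument type, try running @code_llvm mysquare(1.5) at the REPL.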

    As for part 2, it might be a good idea in some cases to write a separate array version (sometimes for performance reasons, e.g., to avoid a copy). In your case, however, you can use the built-in macro @vectorize_1arg to automatically generate the array version, e.g.:

    function mysquare{T<:Number}(x::T)
        return(x^2)
    end
    @vectorize_1arg Number mysquare
    println(mysquare([1,2,3]))
    

    As for general style, don't end statements with semicolons (Julia doesn't need them), and the one-line form mysquare(x::Number) = x^2 is a lot shorter.
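
    From Julia 0.5 onward, that one-line definition is all that is needed, since dot syntax broadcasts any scalar function over an array without a macro. A short sketch, assuming Julia 1.x:

    ```julia
    mysquare(x::Number) = x^2

    xs = [1, 2, 3]
    ys = mysquare.(xs)            # element type follows the input: Vector{Int}
    zs = mysquare.([1.5, 2.0])    # Vector{Float64}
    m  = mysquare.([1 2; 3 4])    # works for matrices too

    println(ys, " ", zs, " ", m)
    ```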

    As for your vectorized mysquare, consider the case where T is a BigFloat. Your output array is always Float64, so each result would be converted to Float64 on assignment and lose precision. One way to handle this would be to change it to

    function mysquare{T<:Number}(x::Array{T,1})
        n = length(x)
        y = Array(T, n)
        for k = 1:n
            @inbounds y[k] = x[k]^2
        end
        return y
    end
    

    where I've added the @inbounds macro to boost speed: it skips the bounds check on every array access, which is safe here because y is allocated with the same length as x. This function could still have issues in the event that the type of x[k]^2 isn't T. An even more defensive version would perhaps be

    function mysquare{T<:Number}(x::Array{T,1})
        n = length(x)
        y = Array(typeof(one(T)^2), n)
        for k = 1:n
            @inbounds y[k] = x[k]^2
        end
        return y
    end
    

    where one(T) would give 1 if T is an Int, and 1.0 if T is a Float64, and so on. These considerations only matter if you want to write hyper-robust library code. If you will only ever be dealing with Float64s or things that can be promoted to Float64s, then it isn't an issue. It seems like hard work, but the power is amazing. You can always just settle for Python-like performance and disregard all type information.
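
    For reference, a sketch of the same defensive idea in Julia 1.x syntax (an assumption; the listings above target an older release). The name mysquare_vec is hypothetical, chosen only to keep it apart from the scalar method:

    ```julia
    # Output element type is derived from what squaring a T actually produces.
    function mysquare_vec(x::Vector{T}) where {T<:Number}
        S = typeof(one(T)^2)
        y = Vector{S}(undef, length(x))   # uninitialized storage, filled below
        for k in eachindex(x)
            @inbounds y[k] = x[k]^2       # y has the same length as x
        end
        return y
    end

    println(mysquare_vec([1, 2, 3]))
    println(eltype(mysquare_vec(BigFloat[1.5, 2.5])))
    ```

    With this version, an integer vector yields an integer vector and a BigFloat vector keeps its precision, rather than everything being forced into Float64.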