python, julia, game-theory

Problem with memory allocation in Julia code


I used a function in Python/NumPy to solve a problem in combinatorial game theory.

import numpy as np
from time import time

def problem(c):
    start = time()
    N = np.array([0, 0])
    U = np.arange(c)

    for _ in U:
        # XOR each element of N (except the last) with its mirror image
        bits = np.bitwise_xor(N[:-1], N[-2::-1])
        # append the smallest value of U that does not appear among the XORs
        N = np.append(N, np.setdiff1d(U, bits).min())

    return len(*np.where(N==0)), time()-start

problem(10000)

Then I wrote it in Julia because I thought it'd be faster due to Julia using just-in-time compilation.

function problem(c)
    N = [0]
    U = Vector(0:c)

    for _ in U
        elems = N[1:length(N)-1]
        bits = elems .⊻ reverse(elems)       # XOR each element with its mirror image
        push!(N, minimum(setdiff(U, bits)))  # smallest value of U not among the XORs
    end

    return sum(N .== 0)
end

@time problem(10000)

But the second version was much slower. For c = 10000, the Python version takes 2.5 seconds on a Core i5 processor, while the Julia version takes 4.5 seconds. Since the NumPy operations are implemented in C, I'm wondering whether Python really is faster here or whether my implementation is needlessly inefficient.

The Julia implementation allocates a lot of memory. How can I reduce the number of allocations to improve its performance?


Solution

  • The original code can be rewritten in the following way:

    function problem2(c)
        N = zeros(Int, c+2)       # result values, preallocated once
        notseen = falses(c+1)     # reusable mask over the candidates 0:c

        for lN in 1:c+1
            notseen .= true       # reset the mask in place (no new allocation)
            @inbounds for i in 1:lN-1
                b = N[i] ⊻ N[lN-i]                # XOR of a mirrored pair, as in the original code
                b <= c && (notseen[b+1] = false)  # mark the value b as seen
            end
            idx = findfirst(notseen)              # first unseen value, i.e. the mex
            isnothing(idx) || (N[lN+1] = idx-1)
        end
        return count(==(0), N)
    end
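
    The trick in the inner loop is to replace setdiff(U, bits) with a reusable boolean mask: every XOR value that occurs is marked as seen, and the first unmarked index gives the minimum excluded value (the mex). As a standalone illustration of just that pattern (the mex helper and the cap argument are my own names, not part of the original answer):

    function mex(values, cap)
        notseen = trues(cap + 1)                 # flags for the candidates 0:cap
        for v in values
            v <= cap && (notseen[v+1] = false)   # mark every occurring value as seen
        end
        idx = findfirst(notseen)
        return isnothing(idx) ? cap + 1 : idx - 1
    end

    mex([0, 1, 3], 10)   # returns 2, the smallest value in 0:10 missing from the collection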
    

    First, check that both functions produce the same result:

    julia> problem(10000), problem2(10000)
    (1475, 1475)
    

    (I have also checked that the generated N vectors are identical.)

    Now let us benchmark both functions:

    julia> using BenchmarkTools
    
    julia> @btime problem(10000)
      4.938 s (163884 allocations: 3.25 GiB)
    1475
    
    julia> @btime problem2(10000)
      76.275 ms (4 allocations: 79.59 KiB)
    1475
    

    So it turns out to be over 60x faster.

    The key to the speedup is avoiding allocations, which is easy and efficient to do in Julia. If any part of the code is not clear, please comment. Note that I concentrated on showing how to improve the performance of the Julia code rather than on replicating the Python code exactly, since, as was noted in the comments under the original post, cross-language performance comparisons are very tricky. I think it is better to focus this discussion on how to make the Julia code fast.
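
    To see in isolation what the allocation pattern costs, here is a small sketch (my own illustration, not part of the original answer) contrasting a loop that creates a fresh array on every pass with one that reuses a single preallocated buffer, which is exactly the idea behind N and notseen above:

    function flags_fresh(n, k)
        hits = 0
        for _ in 1:k
            notseen = trues(n)      # a new array is allocated on every pass
            notseen[1] = false
            hits += count(notseen)
        end
        return hits
    end

    function flags_reused(n, k)
        hits = 0
        notseen = trues(n)          # the buffer is allocated once ...
        for _ in 1:k
            notseen .= true         # ... and only reset in place afterwards
            notseen[1] = false
            hits += count(notseen)
        end
        return hits
    end

    Running both through BenchmarkTools.@btime should show the first version allocating once per pass and the second one allocating only the single buffer, while both return the same result; problem2 applies the same idea to N and notseen.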


    EDIT

    Indeed, switching notseen to a Vector{Bool} and dropping the b <= c check (which holds mathematically for these values of c) gives even better speed:

    julia> function problem3(c)
               N = zeros(Int, c+2)
               notseen = Vector{Bool}(undef, c+1)
    
               for lN in 1:c+1
                   notseen .= true
                   @inbounds for i in 1:lN-1
                       b = N[i] ⊻ N[lN-i]
                       notseen[b+1] = false
                   end
                   idx = findfirst(notseen)
                   isnothing(idx) || (N[lN+1] = idx-1)
               end
               return count(==(0), N)
           end
    problem3 (generic function with 1 method)
    
    julia> @btime problem3(10000)
      20.714 ms (3 allocations: 88.17 KiB)
    1475
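
    The Vector{Bool} version is faster mainly because each flag occupies its own byte, so notseen[b+1] = false is a plain byte store, whereas a BitVector (what falses returns) packs 64 flags per machine word, so every write becomes a read-modify-write on a shared word. If you want to compare the two buffer types yourself, a small sketch along these lines should do (the mark! helper is my own, not from the answer):

    using BenchmarkTools

    function mark!(buf, idxs)
        fill!(buf, true)
        @inbounds for i in idxs
            buf[i] = false
        end
        return count(buf)
    end

    idxs = rand(1:10_000, 10_000)
    @btime mark!(buf, $idxs) setup=(buf = falses(10_000))               # BitVector: bit-packed writes
    @btime mark!(buf, $idxs) setup=(buf = Vector{Bool}(undef, 10_000))  # Vector{Bool}: one byte per flag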