loopsjuliamembership

Repeated membership tests in a loop


Consider this function

using Distributions
using StatsBase
test_vector = sample([1,2,3], 100000000)
function test_1( test_vector)
    rand_vector = randn(3333)
    
    sum = 0.0
    for t in 1:1000
        if t in test_vector
            sum = sum +rand_vector[t]
        else 
            sum = sum - rand_vector[t]
        end
    end
end

I applied @profview to understand performance and it turns out most of the time is spent on if t in test_vector. Is there a way to speed up this part of the program? I thought about excluding test_vector from 1:1000 and run two loops, but this creates memory allocation. Can I get a hint?

P.S. I intend to let the user pass in any test_vector. I'm using sample to create a test_vector just for illustration.


Solution

  • If the vector is large, changing the check for element membership will be faster if you create a Set for the check:

    using Distributions
    using StatsBase
    using BenchmarkTools
    
    test_vector = sample([1,2,3], 1000000)
    
    function test_1(test_vector)
        rand_vector = randn(3333)
        
        sum1 = 0.0
        for t in 1:1000
            if t in test_vector
                sum1 = sum1 + rand_vector[t]
            else 
                sum1 = sum1 - rand_vector[t]
            end
        end
        return sum1
    end
    
    function test_1_set(test_vector)
    rand_vector = randn(3333)
    
    test_set = Set(test_vector)
    sum2 = 0.0
    for t in 1:1000
        if t in test_set
            sum2 += rand_vector[t]
        else 
            sum2 -= rand_vector[t]
        end
    end
        return sum2
    end
    
    @btime test_1(test_vector)
    @btime test_1_set(test_vector)
    
    677.818 ms (3 allocations: 26.12 KiB)
    8.795 ms (10 allocations: 18.03 MiB)