Julia: Easiest and most efficient way to check if each element of a vector is in another list? .∈ doesn't seem to work


In Julia, assuming I have a vector and a list of candidates:

x = [1, 2, 3, 4, 5]
targetlist = [1, 2]

I would like to check, for each element of the vector, whether it is contained in the target list.

Using ∈ (typed as \in followed by Tab) works for a single element:

x[1] ∈ targetlist
# true

But the broadcast version .∈ fails:

x .∈ targetlist
ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 5 and 2

Solution

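  • Broadcasting tries to pair the elements of x with the elements of targetlist one to one, which fails because the lengths differ. Wrap targetlist in a Ref so that broadcasting treats it as a single scalar value: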

    julia> x .∈ Ref(targetlist)
    5-element BitVector:
     1
     1
     0
     0
     0
    
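    A few other spellings should give the same result. This is a minimal sketch that reuses x and targetlist from the question; the names a to e are only illustrative:

    ```julia
    x = [1, 2, 3, 4, 5]
    targetlist = [1, 2]

    a = x .∈ Ref(targetlist)             # Ref keeps targetlist from being broadcast over
    b = in.(x, Ref(targetlist))          # same call, written with the function name `in`
    c = x .∈ (targetlist,)               # a one-element tuple also acts as a single value
    d = in(targetlist).(x)               # in(collection) returns a predicate, broadcast over x
    e = [xi in targetlist for xi in x]   # plain comprehension, gives a Vector{Bool}

    # All five agree on the membership mask
    @assert a == b == c == d == e == [true, true, false, false, false]
    ```

    Which one to use is mostly a matter of taste; the Ref form is probably the most common.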

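    Additionally, if your lists are large, you might want to convert targetlist into a Set. Each lookup then takes O(1) time on average instead of a linear scan, so for n elements in x and m in targetlist the whole operation costs roughly O(n + m) instead of O(n·m), i.e. O(n) instead of O(n^2) when both have length n: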

    julia> x .∈ Ref(Set(targetlist))
    5-element BitVector:
     1
     1
     0
     0
     0
    

    Time comparison for 10k Int32 elements in x and y each:

    julia> x = rand(Int32, 10000); y = rand(Int32, 10000);
    
    julia> @time x .∈ Ref(y);
      0.118258 seconds (5 allocations: 5.594 KiB)
    
    julia> @time x .∈ Ref(Set(y));
      0.000830 seconds (13 allocations: 86.133 KiB)
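
    If the same target list is queried more than once, it is probably worth building the Set a single time and reusing it. A minimal sketch, assuming the x and targetlist from the question (the names targets, mask, idx and matches are illustrative):

    ```julia
    x = [1, 2, 3, 4, 5]
    targetlist = [1, 2]

    targets = Set(targetlist)            # build the hash set once, reuse it for every query

    mask    = x .∈ Ref(targets)          # BitVector membership mask
    idx     = findall(in(targets), x)    # indices of x whose value is in targets
    matches = filter(in(targets), x)     # the matching values themselves
    ```

    Here in(targets) is the one-argument form of in, which returns a predicate that can be passed to findall or filter directly.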