Demo (I expect result [3]
):
[1,2] - [1,2,3] => [] # Hmm
[1,2,3] - [1,2] => [3] # I see
a = [1,2].to_set => #<Set: {1, 2}>
b = [1,2,3].to_set => #<Set: {1, 2, 3}>
a - b => #<Set: {}> WTF!
And:
[1,2,9] - [1,2,3] => [9] # Hmm. Would like [[9],[3]]
How is one to perform a real set difference regardless of order of the inputs?
Ps. As an aside, I need to do this for two 2000-element arrays. Usually, array #1 will have fewer elements than array #2, but this is not guaranteed.
The -
operator applied to two arrays a
and b
gives the relative complement of b
in a
(items that are in a
but not in b
).
What you are looking for is the symmetric difference of two sets (the union of both relative complements between the two). This will do the trick:
a = [1, 2, 9]
b = [1, 2, 3]
a - b | b - a # => [3, 9]
If you are operating on Set
objects, you may use the overloaded ^
operator:
c = Set[1, 2, 9]
d = Set[1, 2, 3]
c ^ d # => #<Set: {3, 9}>
For extra fun, you could also find the relative complement of the intersection in the union of the two sets:
( a | b ) - ( a & b ) # => #<Set: {3, 9}>
WIDTH_OF_LABELS = 10.freeze
REPLICATION = 10000.freeze
SIZE_OF_ARRAY = 2000.freeze
a = (1..SIZE_OF_ARRAY).to_a
b = (SIZE_OF_ARRAY/2..SIZE_OF_ARRAY*1.5).to_a # Ensures half overlapping.
a_set = Set.new( a )
b_set = Set.new( b )
Benchmark.bm( WIDTH_OF_LABELS ) do |benchmark|
benchmark.report( "a - b | b - a" ) do
REPLICATION.times do
a - b | b - a
end
end
benchmark.report( "Converting to Sets" ) do
REPLICATION.times do
c = Set.new( a )
d = Set.new( b )
c ^ d
end
end
benchmark.report( "Already Sets" ) do
REPLICATION.times do
a_set ^ b_set
end
end
benchmark.report( "( a | b ) - ( a & b )" ) do
REPLICATION.times do
( a | b ) - ( a & b )
end
end
end
user system total real
a - b | b - a 2.289062 0.009412 2.298474 ( 2.299243)
Converting to Sets 7.185954 0.052709 7.238663 ( 7.247614)
Already Sets 3.372465 0.060009 3.432474 ( 3.433157)
( a | b ) - ( a & b ) 2.728917 0.035183 2.764100 ( 2.764778)
Fastest approach is a - b | b - a
.
You can see I ran one where the Sets were already created versus one where we needed to convert the Arrays into Sets first.
When you already are working with Sets, it is more than 2x faster than converting to Sets but it is still slower than just using a - b | b - a
.