pythonsetset-operations

Most efficient way to compute differences and intersection of two sets in Python


Let's say we have two sets s1 and s2.

I require three different sets based on these two sets:

  1. Set of elements that exist in s1 but not in s2.
  2. Set of elements that exist in s2 but not in s1.
  3. Set of elements that exist in both s1 and s2.

These could be easily computed as follows:

s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7}

o1 = s1 - s2
o2 = s2 - s1
o3 = s1 & s2

Is there a way to compute these sets more efficiently? I imagine the different set operations have multiple internal processing steps in common, so there might be redundancy.


Solution

  • Given that these operations are implemented in C in the core part of the language, I think there's pretty much nothing you can do to speed these operations up in custom-written code.

    But...

    Since s1 - s2 is the same as s1 - (s1 & s2), you could calculate o3 first and use this smaller set in the difference operations:

    s1 = {1, 2, 3, 4, 5}
    s2 = {3, 4, 5, 6, 7}
    
    o3 = s1 & s2
    o1 = s1 - o3
    o2 = s2 - o3
    

    I doubt it will make much of a difference, since set lookup and insertion are O(1), but it's conceivable that operations over smaller sets are still slightly faster.