The goal is to find the intersection of 2 arrays. I know you're supposed to use sets to get a faster runtime. However, in my notebook I was surprised to find that my runtime is not faster with the sets method. (89 ms (array) vs. 220 ms (sets)). Is there something wrong with my code or something I'm misunderstanding?
A = range(50,1000000)
B = range(-1000000,60)
%%time
# O(m*n)?
def intersection(A, B):
array = []
for i in A:
if i in B:
array.append(i)
return array
intersection(A,B)
%%time
# O(1)?
def intersection(A, B):
set_a = set(A)
set_b = set(B)
return [i for i in set_a if i in set_b]
# return set_a.intersection(set_b)
intersection(A,B)
You are not comparing arrays to sets.
You are comparing ranges to sets.
Checking the membership in a range is almost instantaneous (two comparisons: one with the lower, one with the upper bound).
Checking the membership in an array would take millions of iterations in this case, which would be substantially longer than membership checks for sets.
If you convert the range
s to list
s first, you see a huge difference:
import time
A = range(50,100000)
B = range(-100000,60)
def intersectionArrays(A, B):
array = []
arr_a = list(A)
arr_b = list(B)
for i in arr_a:
if i in arr_b:
array.append(i)
return array
def intersectionSets(A, B):
set_a = set(A)
set_b = set(B)
return [i for i in set_a if i in set_b]
# return set_a.intersection(set_b)
startArr = time.process_time()
intersectionArrays(A,B)
endArr = time.process_time()
startSet = time.process_time()
intersectionSets(A,B)
endSet = time.process_time()
print(endArr - startArr)
print(endSet - startSet)
It prints something like
Arrays: 39.871636528
Sets: 0.008161553000000765
There is no point trying it with 1000000
- it would take hours.
Furthermore, in order to measure something vaguely realistic, you should shuffle the numbers, otherwise branch prediction might distort the results.