I have two python functions for sorting a list of integers contained as strings. They are follows:
import random
n = 10000000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)
def sort_alg1(unsorted):
unsorted.sort(key = lambda x: int(x))
return unsorted
def sort_alg2(unsorted):
l=list(map(int,unsorted))
l.sort()
s=list(map(str,l))
return s
print(sort_alg1(unsorted))
print(sort_alg2(unsorted))
Both work as expected. However, according to my profiler (I am using the ever-popular line_profiler by Robert Kern), the first function i.e. sort_alg1
is more than 3 times slower in execution as sort_alg2
. Now that wouldn't be a big problem if I could pinpoint the reason for this, but I am unable to. I have tried looking up differences in the built-in sort()
method and map()
function, lambda, etc. all to no avail. If someone could educate me as to why this is happening, that would be really great!
Doing some benchmark:
from timeit import timeit
import random
n = 1_000_000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)
def sort_alg1(unsorted):
unsorted.sort(key=lambda x: int(x))
return unsorted
def sort_alg2(unsorted):
l = list(map(int, unsorted))
l.sort()
s = list(map(str, l))
return s
t1 = timeit(lambda: sort_alg1(unsorted), number=1)
t2 = timeit(lambda: sort_alg2(unsorted), number=1)
print(t1)
print(t2)
Prints:
0.4568827738985419
0.2486396620515734
So it seems that sort_alg2
is faster. But the reason is that sort_alg2
receives already sorted array from sort_alg1
. If you slightly change the benchmark:
t1 = timeit(lambda: sort_alg1(unsorted), number=1)
random.shuffle(unsorted) # <---- shufle again the array
t2 = timeit(lambda: sort_alg2(unsorted), number=1)
print(t1)
print(t2)
Prints:
0.456114097032696
0.5958397497888654
So first function is faster.