I wrote this solution to Project Euler #5.
import time
start_time = time.time()
def ProjectEulerFive (m = 20):
a = m
start = 2
while (m % start) == 0:
start += 1
b = start
while b < m:
if ( a % b) != 0:
a += m
b = start
continue
else:
b += 1
return a
import sys
if (len(sys.argv)) > 2:
print "error: this function takes a max of 1 argument"
elif (len(sys.argv)) == 2:
print ProjectEulerFive(int(sys.argv[1]))
else:
print ProjectEulerFive();
print "took " + str(time.time() - start_time ) + " seconds"
Takes my system about 8.5 seconds.
Then I decided to compare with other peoples solutions. I found this Project Euler 5 in Python - How can I optimize my solution?.
I hadn't thought of unique prime factorization.
But anyways, one supposedly optimized non-prime factorization based solution posted there:
import time
start_time = time.time()
check_list = [11, 13, 14, 16, 17, 18, 19, 20]
def find_solution(step):
for num in xrange(step, 999999999, step):
if all(num % n == 0 for n in check_list):
return num
return None
if __name__ == '__main__':
solution = find_solution(20)
if solution is None:
print "No answer found"
else:
print "found an answer:", solution
print "took " + str(time.time() - start_time ) + " seconds"
Takes my system about 37 seconds
My code is about 4 times faster even though I unnecessarily check for divisibility of 3,4,5,6,7,8,9,10, and 12.
I'm new to python, and having trouble seeing where the inefficiency is coming from.
EDIT:
I did another test.
import time
start_time = time.time()
def ProjectEulerFive (m = 20):
ls = [11, 13, 14, 15, 16, 17, 18, 19]
a = m
i = 0
while i < len(ls):
if ( a % ls[i]) != 0:
a += m
i = 0
continue
else:
i += 1
return a
print ProjectEulerFive();
print "took " + str(time.time() - start_time ) + " seconds"
Takes my system 6 seconds, but this:
import time
start_time = time.time()
def ProjectEulerFive (m = 20):
a = m
start = 11
b = start
while b < m:
if ( a % b) != 0:
a += m
b = start
continue
else:
b += 1
return a
print ProjectEulerFive()
print "took " + str(time.time() - start_time ) + " seconds"
Takes about 3.7 seconds
I see that although a faster solution has been posted, no one has actually answered the question. It's a rather difficult one to answer, in fact! The fundamental explanation is that function calls are relatively expensive. In order to make this conclusion persuasive, though, I'll have to dig fairly deeply into Python internals. Prepare yourself!
First of all, I'm going to disassemble (your third version of) ProjectEulerFive
and find_solution
from the "optimized" solution, using dis.dis
. There's a lot here, but a quick scan is all that's required to confirm that your code calls no functions at all:
>>> dis.dis(ProjectEulerFive)
2 0 LOAD_FAST 0 (m)
3 STORE_FAST 1 (a)
3 6 LOAD_CONST 1 (11)
9 STORE_FAST 2 (start)
4 12 LOAD_FAST 2 (start)
15 STORE_FAST 3 (b)
5 18 SETUP_LOOP 64 (to 85)
>> 21 LOAD_FAST 3 (b)
24 LOAD_FAST 0 (m)
27 COMPARE_OP 0 (<)
30 POP_JUMP_IF_FALSE 84
6 33 LOAD_FAST 1 (a)
36 LOAD_FAST 3 (b)
39 BINARY_MODULO
40 LOAD_CONST 2 (0)
43 COMPARE_OP 3 (!=)
46 POP_JUMP_IF_FALSE 71
7 49 LOAD_FAST 1 (a)
52 LOAD_FAST 0 (m)
55 INPLACE_ADD
56 STORE_FAST 1 (a)
8 59 LOAD_FAST 2 (start)
62 STORE_FAST 3 (b)
9 65 JUMP_ABSOLUTE 21
68 JUMP_ABSOLUTE 21
11 >> 71 LOAD_FAST 3 (b)
74 LOAD_CONST 3 (1)
77 INPLACE_ADD
78 STORE_FAST 3 (b)
81 JUMP_ABSOLUTE 21
>> 84 POP_BLOCK
12 >> 85 LOAD_FAST 1 (a)
88 RETURN_VALUE
Now let's look at find_solution
:
>>> dis.dis(find_solution)
2 0 SETUP_LOOP 58 (to 61)
3 LOAD_GLOBAL 0 (xrange)
6 LOAD_FAST 0 (step)
9 LOAD_CONST 1 (999999999)
12 LOAD_FAST 0 (step)
15 CALL_FUNCTION 3
18 GET_ITER
>> 19 FOR_ITER 38 (to 60)
22 STORE_DEREF 0 (num)
3 25 LOAD_GLOBAL 1 (all)
28 LOAD_CLOSURE 0 (num)
31 BUILD_TUPLE 1
34 LOAD_CONST 2 (<code object <genexpr> at
0x10027eeb0, file "<stdin>",
line 3>)
37 MAKE_CLOSURE 0
40 LOAD_GLOBAL 2 (check_list)
43 GET_ITER
44 CALL_FUNCTION 1
47 CALL_FUNCTION 1
50 POP_JUMP_IF_FALSE 19
4 53 LOAD_DEREF 0 (num)
56 RETURN_VALUE
57 JUMP_ABSOLUTE 19
>> 60 POP_BLOCK
5 >> 61 LOAD_CONST 0 (None)
64 RETURN_VALUE
Immediately it becomes clear that (a) this code is much less complex, but (b) it also calls three different functions. The first one is simply a single call to xrange
, but the other two calls appear inside the outermost for loop. The first call is the call to all
; the second, I suspect, is the generator expression's next
method being called. But it doesn't really matter what the functions are; what matters is that they are called inside the loop.
Now, you might think "What's the big deal?" here. It's just a function call; a few nanoseconds here or there -- right? But in fact, those nanoseconds add up. Since the outermost loop proceeds through a total of 232792560 / 20 == 11639628
loops, any overhead gets multiplied by more than eleven million. A quick timing using the %timeit
command in ipython
shows that a function call -- all by itself -- costs about 120 nanoseconds on my machine:
>>> def no_call():
... pass
...
>>> def calling():
... no_call()
...
>>> %timeit no_call()
10000000 loops, best of 3: 107 ns per loop
>>> %timeit calling()
1000000 loops, best of 3: 235 ns per loop
So for every function call that appears inside the while loop, that's 120 nanoseconds * 11000000 = 1.32 seconds
more time spent. And if I'm right that the second function call is a call to the generator expression's next
method, then that function is called even more times, once for every iteration through the genex -- probably 3-4 times per loop on average.
Now to test this hypothesis. If function calls are the problem, then eliminating function calls is the solution. Let's see...
def find_solution(step):
for num in xrange(step, 999999999, step):
for n in check_list:
if num % n != 0:
break
else:
return num
return None
Here's a version of find_solution
that does almost exactly what the original does using for/else
syntax. The only function call is the outer one, to xrange
, which shouldn't cause any problems. Now, when I timed the original version, it took 11 seconds:
found an answer: 232792560
took 11.2349967957 seconds
Let's see what this new, improved version manages:
found an answer: 232792560
took 2.12648200989 seconds
That's a hair faster than the performance of your fastest version of ProjectEulerFive
on my machine:
232792560
took 2.40848493576 seconds
And everything makes sense again.