from tqdm import tqdm
from joblib import Parallel, delayed
import numpy as np
import time
def func(x,a,b,c):
"""x3+ax2+bx+c"""
return x**3+a*x**2+b*x+c
def func_prime(x,a,b):
"""3x2+2ax+b"""
return 3*x**2+2*a*x+b
def check(i):
a,b,c = i
if func_prime(-1/4, a,b) != -1/4:
return False
if func_prime(1/4,a,b) >=0:
return False
for i in range(-300,300):
if func(i-1, a,b,c)*func(i+1,a,b,c)<0:
return False
return a,b,c
def gen():
a=np.arange(-1, 1, 1/128)
b=np.arange(-1, 1, 1/128)
c=np.arange(-1, 1, 1/128)
for i_a in a:
for i_b in b:
for i_c in c:
yield i_a, i_b, i_c
# single thread. it spend 11 seconds.
result = set()
for i in tqdm(gen()):
result.update([check(i)])
# using joblib. it spend 313 seconds.
start_time = time.time()
_ = Parallel(n_jobs=10)(delayed(check)(i) for i in gen())
end_time = time.time()
print(end_time - start_time)
I'm tring to solve a math quiz for fun, and solution is "fx=x3-0.375x2-0.625x + 0".
My code test all combinations of values, so I thought parallel programming make it faster. But this is not so, it is much slower than with a single thread.
My questions:
tl;dr: You're iterating in the wrong place.
gen()
produces many values (triples),
which are serialized and sent down pipes to worker sub-processes.
def check(i):
a, b, c = i
...
for i in range(-300, 300):
First, that second i
is the wrong identifier, it should be x
.
Starting out with a tuple and then switching to a scalar is just bizarre,
avoid doing that.
Second, the for
loop may exit early.
Now, if we typically do six hundred iterations,
this could possibly be a sensible parent-child work balance.
But I get the sense that we're spending a lot of
time serializing those 3-tuples, sending them
down a pipe, deserializing them, and quickly
doing an early exit long before hitting six hundred.
You have nested a, b, c loops and ten cores.
I recommend that the parent only loops over a
values,
and lets the child worry about the b, c
iteration.
That way you spend a greater fraction of CPU cycles
on evaluating func
than on serializing / deserializing.
Separate issue:
if func_prime(-0.25, a, b) != -0.25:
It's not usually recommended to test a float for exact equality
(or in this case, inequality).
When comparing x == y
,
prefer to compute whether abs(x - y)
is less than some small epsilon
(or in this case, greater than ϵ ).
We have a special case, here, reciprocal of a power of 2, so IEEE-754 can exactly represent the quantity of interest. But in general, you should develop some healthy level of paranoia about FP rounding issues.