I saw this code posted somewhere and was having trouble understanding how it could possibly work properly:
    out_q = Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []

    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    resultdict = {}
    for i in range(nprocs):
        resultdict.update(out_q.get())

    time.sleep(5)

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    print resultdict

    time.sleep(15)
It seems to me that it would make sense to wait for all processes to terminate before querying the Queue for their output. How can one be certain that, in querying the Queue immediately after starting all processes, the Queue will contain all the outputs? (That is, what happens if a worker takes much longer to complete than it takes to start all the processes and begin reading from the Queue?)
Another slightly related question: the Python documentation says that "A process can be joined many times." Why would one want to join the process multiple times? If it has already terminated, what would be the purpose of checking that it has terminated again?
It seems to me that it would make sense to wait for all processes to terminate before querying the Queue for their output.
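In effect it already does, just through the queue rather than through join(): the collecting loop cannot finish until every worker has delivered its result. Joining first is actually the riskier order. The multiprocessing documentation warns that a process which has put items on a queue may not terminate until those items have been consumed, so calling join() before draining the queue can deadlock.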
How can one be certain that, in querying the Queue immediately after starting all processes, the Queue will contain all the outputs?
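The Queue does not need to contain them at that moment. out_q.get() blocks until an item is available, and the loop calls it nprocs times, once per worker, so it keeps waiting until the slowest worker has put its result on the queue. A slow worker only makes the loop wait longer; no result is missed.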
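To make the blocking behaviour concrete, here is a minimal sketch in Python 3 syntax (the question's snippet is Python 2). The worker function, its delay parameter, and the collect wrapper are hypothetical names added for illustration; the question does not show the real worker. Each worker sleeps for a different time before putting its dict on the queue, and the parent still receives every result because get() waits.

```python
import math
import multiprocessing
import time


def worker(nums, out_q, delay):
    """Hypothetical worker: wait a bit, then put one dict of squares on the queue."""
    time.sleep(delay)  # simulate work that takes a while
    out_q.put({n: n * n for n in nums})


def collect(nums, nprocs, delay=0.2):
    """Start nprocs workers and gather exactly one result dict from each."""
    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        chunk = nums[chunksize * i:chunksize * (i + 1)]
        # Later workers sleep longer, so they finish well after start() returns.
        p = multiprocessing.Process(target=worker, args=(chunk, out_q, delay * i))
        procs.append(p)
        p.start()

    resultdict = {}
    for _ in range(nprocs):
        # get() blocks until some worker has put a result on the queue.
        resultdict.update(out_q.get())

    # The queue is drained by now, so joining cannot hang on unsent data.
    for p in procs:
        p.join()
    return resultdict


if __name__ == "__main__":
    print(collect([1, 2, 3, 4, 5, 6], 3))
```

The order in which the dicts arrive depends on which worker finishes first, but the merged result holds every key because the loop performs one blocking get() per worker.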
Why would one want to join the process multiple times?
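Joining again does not rerun anything; a Process object can only be started once. Joining more than once matters mainly when join() is given a timeout: the call may return while the process is still running, so you check is_alive() (or exitcode) and call join() again later. It also means cleanup code can call join() on a process without tracking whether something else has already joined it. On a process that has already terminated, join() simply returns immediately.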
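As a rough illustration of joining the same process several times, the sketch below polls with join(timeout) until the process has exited and then joins once more. It uses Python 3 syntax; slow_task and wait_with_polling are hypothetical helpers, not part of the question's code.

```python
import multiprocessing
import time


def slow_task(seconds):
    """Hypothetical task that just sleeps for the given time."""
    time.sleep(seconds)


def wait_with_polling(proc, poll_interval):
    """Call join() repeatedly with a timeout; return how many calls were made."""
    calls = 0
    while True:
        # Returns when the timeout expires or the process exits, whichever is first.
        proc.join(poll_interval)
        calls += 1
        if not proc.is_alive():
            return calls


if __name__ == "__main__":
    p = multiprocessing.Process(target=slow_task, args=(0.5,))
    p.start()
    n = wait_with_polling(p, 0.1)
    print("join() was called", n, "times")
    p.join()  # joining an already-terminated process returns immediately
    print("exit code:", p.exitcode)
```

Between the timed join() calls the parent is free to do other work, such as updating a progress display, which is the usual reason for polling this way.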