I currently have a sequential code with the following parts (all parts are properly encapsulated and isolated in class methods and so on):
VideoCapture
Because this will need to run in real-time (some little constant latency is allowed, but not a growing one), something needs to be done since it currently runs at 15 fp (already ignoring odd frames). Time profiling shows that the most time consuming processes are (2) and (4) (no surprise).
When I started looking up info for (1) I learned about the threading
module which seemed promising and popped up in a lot of stackoverflow answers to increase code speed when image capture from cameras was involved. This led me to see salvation in this module (because of missconceptions of parallelization) until I have just learnt that it still executes one thread thanks to the GIL thing. I am also aware of the existance of asyncio
, multithreading
and concurrent.futures.ProcessPoolExecutor
. I have also read this post on threading. Multicore processing is available.
Aim is to have (1) capture frames into a queue
. (2) takes the frame when there's one available and processes it. As soon as it finishes the processing, give the output to (3) and read the next available frame in the queue
to keep processing while (3) and (4) are executed too. (3) takes (2) output and processes it super fast and waits until there's output from (2) and repeat. Finally, every X frames, (4) will read the outputs generated by (3) until that point and perform some heavy calculations.
If I have understood well, I should use multiprocessing instead of multithreading since it's a rather intensive calculation problem (apart from the I/O on (1)).
threading
combined with queue
a good way to go?I just need someone who knows about all this mess to point me in the right direction, so don't hesitate to add some references. Thank you very much in advance!
It is difficult to provide an authoritative answer in a question like this. I can provide with some experience when working with a similar problem (image processing on a Raspberry Pi on a Drone).
In my experience, multiprocessing is the way to go. Because of the PIL limitations, if you are pushing the limits of a single core, threading will do very little. You would use threading in an environment where you have blocking, but not resource intensive operations, such as IO/network. Also, you want to avoid mixing Threads
and Processes
, especially if you are not confident you know what you are doing.
You essentially want a pipeline of processes, multiple stages, each handling a specific task, running in parallel. In this case, your per frame processing time goes from the sum of your individual stage processing times to the max of your per-stage processing times.
Practically, you want a Process
per stage, with a multiprocessing Queue
between. You want to test each stage separate, passing in/out items in a controlled manner first, then try to integrate them. It is not trivial, but you can do it.
As to having two processes for (2) in parallel, while possible, there are some things to consider:
Queues
published to by (1), and two Queues
for (3) to consume from. You need to make sure (3) is consuming items for the right queue, and emmits them ordered. You may want to pass along some metadata to help with this.