
Python IPC with matplotlib


Project description:

Connect an existing C program (main control) to a Python GUI/widget. For this I'm using a FIFO. The C program is designed to look at frame-based telemetry.
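
As a rough illustration of the FIFO side, here is a minimal sketch of a Python reader. It assumes the C program writes each frame number as a newline-terminated ASCII integer, which the post does not specify; `read_frames`, `demo`, and the FIFO file name are hypothetical, and a writer thread stands in for the C program so the sketch runs on its own.

```python
import os
import tempfile
import threading


def read_frames(fifo_path, on_frame):
    """Block on the FIFO and pass each frame number to on_frame.

    Assumption: one ASCII integer per line (the real wire format is not
    given in the post).
    """
    # open() blocks until a writer opens the other end of the FIFO.
    with open(fifo_path, "r") as fifo:
        for line in fifo:  # the loop ends when the writer closes the pipe
            line = line.strip()
            if line:
                on_frame(int(line))


def demo():
    """Stand-in for the C program: write three frame numbers to a temp FIFO."""
    tmp_dir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmp_dir, "frames.fifo")  # hypothetical name
    os.mkfifo(fifo_path)  # POSIX only
    received = []
    reader = threading.Thread(target=read_frames,
                              args=(fifo_path, received.append))
    reader.start()
    # Opening for write blocks until the reader thread has opened the FIFO.
    with open(fifo_path, "w") as writer:
        for frame in (10, 11, 12):
            writer.write(f"{frame}\n")
    reader.join()
    os.remove(fifo_path)
    os.rmdir(tmp_dir)
    return received


if __name__ == "__main__":
    print(demo())
```

In a GUI, `on_frame` would typically hand the number to the GUI thread (for example by emitting a signal) rather than touch any plot directly.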

The Python GUI performs two functions:

  1. Creates and runs plots (probably made with matplotlib) through a GUI widget as the user desires. The plots are individual .py files, scripts written by different users.
  2. Relays the frame number from the master program to the Python plotting scripts after they have been created, so they can "update" themselves.

I have several questions. I understand the pros and cons of multiprocessing versus multithreading as described here: Multiprocessing vs Threading Python

Implementation Considerations:

  1. I'm guessing that having too many plots created via threads in a signal-based architecture becomes laggy when updating them. I'm not sure when they become CPU-bound. Most plots will update a few line series; some may update images. Perhaps it will be laggy regardless of how the plots are created.

  2. I'm not sure what opening 30 Python processes, each making a plot or two with matplotlib, does to a machine or its resources. A single simple matplotlib plot on my system has an RSS (resident memory) of 117 MB, so I don't think a single user plotting 30 plots in separate processes would exhaust system memory. (The machines are 16 GB, 32-core Linux boxes with several simultaneous users.)

Questions:

  1. Should I open the plots via threads or processes, and will one be less laggy than the other?
  2. If I use threads, does anyone have any idea how many matplotlib figures a single thread can update before it gets laggy?
  3. If I create the plots as processes, should I use the multiprocessing package? I'm guessing this API makes it straightforward to communicate the frame number between processes?
  4. Given that I have multiprocessing available, it's probably silly to open processes through Popen, right? I'm guessing so, because I would then have to set up all of the piping/IPC myself, which would be more work.

Solution

  • I was able to implement this project in two ways: with and without multiprocessing.

    1. I have a main process in a PyQt GUI with a thread which reads the frame number from the controlling C program's pipe.
    2. When the user selects plots (.py scripts), they have the option to press the "execute" button on a batch of plots, which keeps them in the main process. From this point, if the frame is updated, the plots are updated serially. Slowdown begins almost immediately past a handful of plots but is not prohibitive for 10-20 simple time-series plots.
    3. There is an alternative button which runs a batch of plots in another process. I was able to do this either with Popen and a named pipe, or with multiprocessing and a multiprocessing Queue. The cleanest way was to make the objects that create the plots in the other processes QObjects and use PyQt signals, with each of the other processes ending by creating its own QApplication. For this I had to use ctx = mp.get_context('spawn') on Linux, because by default Linux uses fork, and when I created the QApplication in the child it believed a QApplication was already running in the main process. This was the only way I was able to get predictable multiprocessing behavior where all of the matplotlib plots would update in the alternative process.
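
The multiprocessing variant in step 3 can be sketched roughly as follows. This is a minimal sketch, not the code from the project: it leaves out PyQt and matplotlib entirely, and `plot_worker`, `run_demo`, the `None` sentinel, and the result strings are hypothetical names chosen for illustration. Only `mp.get_context('spawn')` and the use of a multiprocessing Queue come from the description above.

```python
import multiprocessing as mp


def plot_worker(frame_queue, result_queue):
    """Runs in the child process.

    A real worker would create its QApplication and matplotlib figures
    here; this placeholder only reports which frame it received.
    """
    while True:
        frame = frame_queue.get()
        if frame is None:  # hypothetical sentinel: parent asks worker to exit
            break
        # Placeholder for redrawing the figures for this frame.
        result_queue.put(f"updated to frame {frame}")


def run_demo(frames):
    """Send frame numbers to a spawned worker and collect its replies."""
    # 'spawn' starts a fresh interpreter instead of forking the parent,
    # so the child does not inherit the parent's in-memory state.
    ctx = mp.get_context("spawn")
    frame_queue, result_queue = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=plot_worker,
                         args=(frame_queue, result_queue))
    worker.start()
    for frame in frames:
        frame_queue.put(frame)
    frame_queue.put(None)
    # The timeout keeps the parent from hanging if the child failed to start.
    results = [result_queue.get(timeout=30) for _ in frames]
    worker.join()
    return results
```

Because spawn re-imports the main module in the child, `run_demo([1, 2, 3])` should be called from a script file under an `if __name__ == "__main__":` guard.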

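    I read on the web that matplotlib is not thread-safe; however, with PyQt signals emitted from the threads that wait on the queue reads, this seems to be fine, presumably because the connected slots (and therefore the matplotlib calls) run in the GUI thread.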

    I chose this implementation to give the user the flexibility to open plots in the same process or to open batches of plots in another process, rather than fixing the number of plots per process in advance. The thinking was that certain plots with complex updates deserve their own process and can be selected as such. This is also less wasteful than one process per plot for simple plots: each process costs at least 100 MB, while each additional plot in the same process needs only about 3 MB more.
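
To make the memory trade-off concrete, here is a small back-of-the-envelope calculation using the approximate figures above (about 100 MB per process, about 3 MB per additional plot). The function name `memory_mb` is hypothetical, and the model assumes each process's base cost already covers its first plot.

```python
BASE_MB_PER_PROCESS = 100  # approximate per-process floor quoted in the post
EXTRA_MB_PER_PLOT = 3      # approximate cost of each additional plot


def memory_mb(plots, processes):
    """Estimate total memory for `plots` figures spread over `processes`."""
    if not 1 <= processes <= plots:
        raise ValueError("need between 1 and `plots` processes")
    # Every process pays the base cost, assumed to include its first plot;
    # the remaining plots each add the smaller incremental cost.
    return (processes * BASE_MB_PER_PROCESS
            + (plots - processes) * EXTRA_MB_PER_PLOT)


if __name__ == "__main__":
    print(memory_mb(30, 30))  # one process per plot: 3000
    print(memory_mb(30, 1))   # all plots in one process: 187
    print(memory_mb(30, 3))   # three batches of plots: 381
```

Under these rough numbers, 30 plots cost about 3 GB as separate processes but under 200 MB in a single process, which is why batching simple plots is attractive on a shared machine.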

    One last detail: the user may switch the frame quite rapidly. I had the receiving process read and empty the queue in a non-blocking daemon thread, keeping only the latest frame number. Once a signal was sent to update the plots, the plot update loop grabbed a thread lock, and the read daemon could emit updates again only after the update method released the lock.
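
The "keep only the latest frame" idea can be sketched as below. This is a simplified illustration using the standard-library `queue.Queue` and a plain `threading.Lock`; `drain_latest` and `process_pending` are hypothetical names, `None` is assumed never to be a valid frame number, and the redraw is passed in as a callback instead of being triggered through a PyQt signal.

```python
import queue
import threading


def drain_latest(q):
    """Empty the queue without blocking; return the newest item or None."""
    latest = None
    while True:
        try:
            latest = q.get_nowait()
        except queue.Empty:
            return latest


def process_pending(q, lock, redraw):
    """Redraw once for the newest queued frame; return False if none queued."""
    frame = drain_latest(q)
    if frame is None:
        return False
    # Hold the lock for the whole redraw so another update cannot start
    # until this one has finished.
    with lock:
        redraw(frame)
    return True


if __name__ == "__main__":
    frames = queue.Queue()
    for n in (5, 6, 7):  # the user scrubbed quickly through three frames
        frames.put(n)
    process_pending(frames, threading.Lock(), print)  # prints 7
```

Frames 5 and 6 are dropped on purpose: redrawing for stale frames would only add lag.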

    Some sample code of the basic idea of the implementation: https://stackoverflow.com/a/49226785/8209352