I have a python script which uses multiprocessing and subprocess to launch multiple external commands in parallel with different arguments. The code can be found here.
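The linked code is not reproduced here, so what follows is only a minimal sketch of the general pattern under stated assumptions. Each job is an argument list, at most 12 run at once, and each child reads stdin from /dev/null and writes its output to its own log file instead of the screen terminal. It uses multiprocessing.pool.ThreadPool because each worker only waits on an external process; a process-based multiprocessing.Pool would follow the same structure. The names run_job and run_all and the job_N.log file naming are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing.pool import ThreadPool


def run_job(job):
    """Run one external command, sending its output to a per-job log file."""
    index, argv, log_dir = job
    log_path = os.path.join(log_dir, f"job_{index}.log")
    with open(log_path, "wb") as log:
        # stdin from /dev/null and file-backed stdout/stderr mean the child
        # never reads from or writes to the (possibly detached) terminal.
        completed = subprocess.run(argv, stdin=subprocess.DEVNULL,
                                   stdout=log, stderr=subprocess.STDOUT)
    return index, completed.returncode


def run_all(commands, log_dir, workers=12):
    """Run every command with at most `workers` running at the same time.

    Returns a list of (job index, exit code) sorted by job index.
    """
    jobs = [(i, argv, log_dir) for i, argv in enumerate(commands)]
    with ThreadPool(processes=workers) as pool:
        # Each worker takes the next job as soon as its current one ends;
        # imap_unordered yields results in completion order.
        return sorted(pool.imap_unordered(run_job, jobs))


if __name__ == "__main__":
    # Hypothetical example: the same program run with different arguments.
    commands = [[sys.executable, "-c", f"print({n} * {n})"] for n in range(4)]
    log_dir = tempfile.mkdtemp(prefix="jobs_")
    for index, code in run_all(commands, log_dir, workers=2):
        print(f"job {index} exited with {code}; log in {log_dir}")
```

Keeping the children off the terminal is a design choice for this sketch, not a confirmed fix for the behavior described below.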
For convenience I launch this script inside a GNU Screen session. The machine it runs on has 12 processors, which sit idle until the processes start.
Each of the processes takes between a few hours and a couple of days to run, so I often disconnect from the machine and detach the screen session.
However, I've recently noticed behavior I had never seen before. On several occasions I have returned to the machine to find it idle with a load of zero, yet if I list the active processes, either via ps ux or top, I can still find the script (and the subprocesses) among them.
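When processes are listed but the load is zero, they are sleeping, and it can help to see what they are waiting on. The following is a Linux-only diagnostic sketch that does not come from the original post. It reads the State line from /proc/PID/status and the kernel wait channel from /proc/PID/wchan for a given PID (for example the script's PID as shown by ps ux) and for its direct children. The function names are hypothetical, and how to interpret a particular wchan value is left to the reader.

```python
import os
import sys


def process_state(pid):
    """Return (state, wchan) for a Linux process, read from /proc.

    state is the value of the 'State:' line, e.g. 'S (sleeping)'.
    wchan is the kernel function the process is waiting in; it may be
    '0' or empty if the process is running or the kernel hides it.
    """
    state = None
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("State:"):
                state = line.split(":", 1)[1].strip()
                break
    try:
        with open(f"/proc/{pid}/wchan") as wchan_file:
            wchan = wchan_file.read().strip()
    except OSError:
        wchan = ""  # not readable (permissions or kernel configuration)
    return state, wchan


def child_pids(parent):
    """Return the PIDs whose parent is `parent`, by scanning /proc."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as status:
                for line in status:
                    if line.startswith("PPid:"):
                        if int(line.split()[1]) == parent:
                            found.append(int(entry))
                        break
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(found)


if __name__ == "__main__":
    # Pass the script's PID on the command line; defaults to this process.
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    root = int(arg) if arg.isdigit() else os.getpid()
    for pid in [root] + child_pids(root):
        state, wchan = process_state(pid)
        print(pid, state, wchan or "-")
```

Running it while the machine is idle and again right after reattaching would show whether the workers change state when screen gets a display back.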
I then reattach the screen session to check the state of the program, and immediately a new batch of processes is sent to the queue and the system load goes back to 12 within seconds. Note that I did absolutely nothing to the script other than reattach the screen session.
I've installed a monitoring tool on the system, and what it shows is that some processes finish after a certain time and no new processes are launched. So the system stays active as long as subprocesses are busy and becomes idle as soon as no more jobs are released from the queue.
So my question is: does anyone know what could explain this behavior?
EDIT: After a year or so, this problem is no longer reproducible, presumably because of a patch to either screen or Python itself. I'm accepting the answer because it provided good directions for testing.
I can't explain the reason for what you are seeing. However, I do have an idea of what you can try next.
Please comment back with the results of these tests. That will give me more to go on.