python linux zombie-process defunct

How to cleanly kill subprocesses in Python


We are using a Python process to manage long-running Python subprocesses. These subprocesses occasionally need to be killed. The kill command does not completely remove the process; it only makes it defunct.

Running the following script demonstrates this behaviour.

import subprocess
p = subprocess.Popen(['sleep', '400'], stdout=subprocess.PIPE, shell=False)

or

p = subprocess.Popen('sleep 400', stdout=subprocess.PIPE, shell=True)

Either call creates a subprocess.

p.terminate() 
p.kill()

Calling these does nothing to the process, as ps aux | grep sleep shows:

$ ps aux| grep 'sleep'
User       8062  0.0  0.0   7292   764 pts/7    S    14:53   0:00 sleep 400

The process has been neither killed nor made defunct. Calling subprocess.call() with 'kill' and the pid as arguments issues the kill command:

subprocess.call(['kill', str(p.pid)])

This kills the process, but it is now defunct:

$ ps aux | grep 'sleep'
User       8062  0.0  0.0      0     0 pts/7    Z+   14:51   0:00 [sleep] <defunct>

If the queue runs long enough, will it eventually hit the maximum number of processes, or will the defunct processes eventually be reaped so that everything is fine?

If the answer is the former, how can I handle defunct processes in Python without killing the parent process?

Is there a better way of killing processes?


Solution

  • There are 2 main issues here:

    First issue: if you're using shell=True, you're killing the shell that runs the process, not the process itself. Killing the shell does not kill its child, so the child process isn't stopped and may keep running.

    In your case, you're using sleep, which is not a shell built-in, so you can drop shell=True. Popen then yields the actual process id, and p.terminate() works.

    You can (and should) avoid shell=True most of the time, even if it requires extra Python coding effort. Piping 2 commands together and redirecting input/output can all be handled nicely by one or several Popen calls without shell=True.

    And (second issue) if the process is still defunct when terminating after that fix, call p.wait() (from this question). Calling terminate() alone isn't enough: a terminated child stays in the process table as defunct until its parent collects its exit status, which is what wait() does.
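Both points can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names stop_process and pipe_without_shell and the 5-second default timeout are assumptions made for this example, and it assumes Python 3.3+ on a POSIX system where sleep, echo and tr are available.

```python
import subprocess


def stop_process(proc, timeout=5.0):
    """Terminate proc, escalate to kill if needed, and reap it.

    Hypothetical helper: the name and the default timeout are illustrative.
    Returns the child's exit code (negative signal number on POSIX).
    """
    if proc.poll() is None:  # child is still running
        proc.terminate()  # sends SIGTERM on POSIX
        try:
            # wait() collects the exit status, so no defunct entry is left
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # child ignored SIGTERM; send SIGKILL
            proc.wait()  # reap it after the kill as well
    return proc.returncode


def pipe_without_shell(first_cmd, second_cmd):
    """Rough equivalent of `first_cmd | second_cmd` without shell=True."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(
        second_cmd, stdin=first.stdout, stdout=subprocess.PIPE
    )
    # Close our copy so `first` sees a broken pipe if `second` exits early
    first.stdout.close()
    output, _ = second.communicate()  # reads output and reaps `second`
    first.wait()  # reap `first` too
    return output


if __name__ == "__main__":
    p = subprocess.Popen(["sleep", "400"])  # no shell, so p.pid is sleep itself
    print(stop_process(p))
    print(pipe_without_shell(["echo", "hello"], ["tr", "a-z", "A-Z"]))
```

Because every code path ends in wait(), the parent always collects the exit status, so a long-running manager that stops its children this way should not accumulate defunct entries.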