python linux subprocess freebsd

How to kill a subprocess and all of its descendants on timeout in Python?


I have a Python program which launches subprocesses like this:

subprocess.run(cmdline, shell=True)

A subprocess can be anything that can be executed from a shell: Python programs, binaries or scripts in any other language.

cmdline is something like export PYTHONPATH=$PYTHONPATH:/path/to/more/libs; command --foo bar --baz 42.

Some of the subprocesses launch child processes of their own, and theoretically these could again launch child processes. There is no hard-and-fast limit on how many generations of descendant processes there can be. Children of subprocesses are third-party tools and I have no control over how they launch additional processes.

The program needs to run on Linux (currently only Debian-based distros) as well as some FreeBSD derivatives – thus it needs to be portable across various Unix-like OSes, while Windows compatibility will probably not be needed in the foreseeable future. It is meant to be installed via the OS package manager, as it comes with OS configuration files and since everything else on the target system uses that as well. That means I’d prefer not having to use PyPI for my program or for any dependencies.

If a subprocess hangs (possibly because it is waiting for a hung descendant), I want to implement a timeout that kills the subprocess, along with anything that has the subprocess as an ancestor.

Specifying a timeout on subprocess.run() does not work: only the immediate subprocess gets killed, while its children get adopted by PID 1 and continue to run. Also, with shell=True, the subprocess is the shell, and the actual command, being a child of the shell, will happily continue. The latter could be solved by passing proper args and env and skipping the shell, which would kill the actual process for the command, but still not any of its child processes.
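
The orphaning can be reproduced with a small sketch. The command line and the PID file are made up for the demonstration, and it assumes a POSIX sh and a sleep binary: the shell starts sleep in the background, records its PID, and waits. After the timeout, the shell is gone but sleep is still there.

```python
import os
import shlex
import signal
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical PID file, used only so that we can find the grandchild later
    pidfile = os.path.join(tmp, "child.pid")
    cmdline = "sleep 30 >/dev/null 2>&1 & echo $! > {}; wait".format(
        shlex.quote(pidfile))

    timed_out = False
    try:
        subprocess.run(cmdline, shell=True, timeout=1)
    except subprocess.TimeoutExpired:
        timed_out = True    # run() has killed and reaped the shell by now

    with open(pidfile) as f:
        orphan_pid = int(f.read())

# Signal 0 only checks whether the process exists; nothing is delivered
try:
    os.kill(orphan_pid, 0)
    orphan_survived = True
except ProcessLookupError:
    orphan_survived = False

# Clean up the leftover sleep if it is still around
if orphan_survived:
    os.kill(orphan_pid, signal.SIGKILL)
```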

I then tried to use Popen directly:

with subprocess.Popen(cmdline, shell=True) as process:
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        killtree(process)    # custom function to kill the whole process tree

However, in order to write killtree(), I need to list all child processes of process, recursively do the same for each child process, and then kill these processes one by one. os.kill() lets me kill any process, given its PID, or send it a signal of my choice. Popen, on the other hand, does not provide any way to enumerate child processes, nor am I aware of any other way.

Some other answers suggest psutil, but that requires me to install PyPI packages, which I would like to avoid.

TL;DR: Given the above constraints, is there any way to launch a process from Python, with a timeout that kills the entire tree of processes, from said process down to its last descendant?


Solution

  • A quick-and-dirty solution which works on Linux and FreeBSD:

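    import os
    import signal
    import subprocess

    def killtree(pid):
        """Kill pid and all of its descendants, children first."""
        args = ['pgrep', '-P', str(pid)]
        try:
            # pgrep prints one child PID per line
            for child_pid in subprocess.check_output(args).decode("utf8").splitlines():
                killtree(child_pid)
        except subprocess.CalledProcessError as e:
            # Exit code 1 just means "no children"; anything else is an error
            if e.returncode != 1:
                print("Error: pgrep exited with code {0}".format(e.returncode))
        try:
            os.kill(int(pid), signal.SIGKILL)
        except ProcessLookupError:
            pass    # the process exited on its own in the meantime

    # cmdline and timeout are defined elsewhere
    with subprocess.Popen(cmdline, shell=True) as process:
        try:
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            killtree(process.pid)
            process.wait()    # reap the killed process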
    

    We just rely on the command line tools of the OS – both Linux and FreeBSD provide pgrep -P PID to enumerate all children of a given process. If there are no child processes, pgrep exits with return code 1, raising a CalledProcessError which we need to handle.

    Relying on an external tool to enumerate child processes may not be the nicest solution, but it works. The codebase in question already relies on external command line tools to do its job, so relying on pgrep doesn’t make it significantly worse than it is.

    As in subprocess.run(), we need to call process.wait() on the child process after we kill it. (That goes for POSIX – on Windows we would call process.communicate() instead, but this snippet is likely not compatible with Windows anyway.)
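
To illustrate why the wait() call matters, here is a minimal sketch, assuming a POSIX system with a sleep binary. It starts a child without a shell, kills it, and then reaps it. Until wait() returns, the killed child lingers as a zombie and its return code is not available.

```python
import os
import signal
import subprocess

# No shell here, so proc.pid is the sleep process itself
proc = subprocess.Popen(["sleep", "30"])

os.kill(proc.pid, signal.SIGKILL)

# wait() reaps the child and returns its exit status
returncode = proc.wait()

# On POSIX, a negative return code -N means "terminated by signal N"
killed_by_sigkill = (returncode == -signal.SIGKILL)
```

After this, proc.returncode holds the same value, which is how the caller can tell a timeout kill apart from a normal exit.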