python subprocess tty control-characters python-module-unicodedata

Capture a subprocess's output, including control characters


I have the following simple program that runs a subprocess and tees its output to both stdout and a buffer:

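import subprocess
import sys
import unicodedata

# The child's stdout is a pipe, not a terminal. stderr is merged into stdout
# so that an unread stderr pipe cannot fill up and block the child.
p = subprocess.Popen(
    "top",
    shell=True,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)

stdout_parts = []
# readline() returns b'' only at EOF (the child closed its output), so this
# single loop replaces the original poll()/sleep() outer loop.
for line in iter(p.stdout.readline, b''):
    stdout_parts.append(line)  # keep the raw bytes in the buffer
    text = line.decode("utf-8", errors="replace")
    sys.stdout.write(text)  # tee to our own stdout
    sys.stdout.flush()
    for ch in text:
        # Unicode categories "C*" cover control and other non-printing
        # characters; the newline (10) is allowed.
        if unicodedata.category(ch)[0] == "C" and ord(ch) != 10:
            raise Exception(f"control character! {ord(ch)}")
p.wait()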

When running a program that updates the terminal in place, such as top or docker pull, I want to be able to capture its entire output as well, even if that output is not immediately human-readable.

Reading around (for example, the question "How do commands like top update output without appending in the console?"), it seems this is done with control characters. However, I don't receive any of them when reading lines from the process's output streams (stdout/stderr). Or is the technique they use different, so that I cannot capture it from the subprocess?


Solution

  • Many tools adapt their output depending on whether or not they are connected to a terminal. If you want to receive exactly the output you see when running the tool interactively, use a wrapper such as pexpect, which runs the tool inside a pseudo-terminal (pty) so that it behaves as if it were interactive. (Python also ships a low-level pty module, but it is tricky to use, especially if you are new to the problem space.)

    Some tools also offer a batch mode for scripting; for top, look into top -b (though this is not available on, e.g., macOS).

    For the record, many screen control sequences do not consist entirely, or even mainly, of control characters. For example, the sequence curses emits to move the cursor to a particular position (ESC [ 10 ; 5 H on ANSI/VT100-style terminals) starts with an escape character (0x1B) but otherwise consists of regular printable characters. If you really want to process these sequences, look into a curses / ANSI escape-code parsing library. For most purposes, though, a better approach is to use a machine-readable API and disable screen updates entirely. On Linux, much of this information is available from the /proc pseudo-filesystem.
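The terminal check described above can be observed directly. This sketch (the helper name `child_reports_tty` is hypothetical) starts a tiny Python child with its stdout connected to a pipe, just like the question's `Popen` call, and asks it whether `sys.stdout.isatty()` is true. With a pipe it is not, which is why a terminal-aware tool falls back to plain output and the question's loop never sees an escape character.

```python
import subprocess
import sys

# Child program: print whether its stdout is a terminal. Many tools make a
# similar isatty() check before emitting cursor movement or colour codes.
CHILD_CODE = "import sys; print(sys.stdout.isatty())"


def child_reports_tty() -> bool:
    """Run the child with stdout=PIPE and return what it reported."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip() == b"True"


if __name__ == "__main__":
    print(child_reports_tty())  # False: a pipe is not a terminal
```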
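To actually receive the escape sequences, the child has to be given a terminal. pexpect is the convenient route; as a standard-library-only alternative, here is a minimal sketch using the low-level `pty` module, assuming a POSIX system. The helper `run_in_pty` is hypothetical: it attaches the child's stdin/stdout/stderr to a new pseudo-terminal, reads until the child closes it, and returns the raw bytes, escape sequences included. The terminal driver typically translates "\n" into "\r\n", so expect carriage returns in the result.

```python
import errno
import os
import pty
import subprocess
import sys
from typing import List


def run_in_pty(argv: List[str]) -> bytes:
    """Run argv on a new pseudo-terminal and return all raw bytes it wrote."""
    master_fd, slave_fd = pty.openpty()
    # All three standard streams point at the terminal side, so isatty() is
    # true in the child and terminal-aware tools keep their escape codes.
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd)
    os.close(slave_fd)  # the child has its own copies; keep only the master end
    chunks = []
    try:
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError as exc:
                # Linux raises EIO once no process holds the terminal side open.
                if exc.errno != errno.EIO:
                    raise
                break
            if not data:  # other platforms may report EOF as b"" instead
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks)


if __name__ == "__main__":
    out = run_in_pty([sys.executable, "-c", "import sys; print(sys.stdout.isatty())"])
    print(repr(out))  # the child now reports True
```

Full-screen programs such as top may additionally expect a window size (rows/columns) on the terminal, which this sketch does not set.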
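Where a batch mode exists, no terminal emulation is needed at all. A minimal sketch for `top -b`, assuming a Linux top (procps or BusyBox) where `-n 1` limits it to a single iteration; the helper `top_snapshot` is hypothetical and returns None when not on Linux or when top is not installed, since `-b` is not available everywhere (e.g. macOS).

```python
import shutil
import subprocess
import sys
from typing import Optional


def top_snapshot() -> Optional[str]:
    """Return one batch-mode iteration of top as text, or None if unsupported."""
    # Assumption: only the Linux top implementations are expected to support -b.
    if not sys.platform.startswith("linux") or shutil.which("top") is None:
        return None
    result = subprocess.run(
        ["top", "-b", "-n", "1"],  # batch mode, one iteration, then exit
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=30,
    )
    return result.stdout


if __name__ == "__main__":
    print(top_snapshot())
```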
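The point about escape sequences being mostly printable can be checked with the question's own `unicodedata` test. The sketch below uses a hypothetical helper, `split_csi`, that separates ECMA-48 CSI sequences (ESC "[", parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E) from plain text. It deliberately handles only CSI sequences, not OSC or other escape forms, so it is a stand-in for a real ANSI parsing library rather than a replacement.

```python
import re
import unicodedata
from typing import Iterator, Tuple

# ESC '[' , parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final 0x40-0x7E
CSI_RE = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")


def split_csi(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield ("text", chunk) and ("csi", sequence) pieces in their original order."""
    pos = 0
    for m in CSI_RE.finditer(data):
        if m.start() > pos:
            yield ("text", data[pos:m.start()])
        yield ("csi", m.group())
        pos = m.end()
    if pos < len(data):
        yield ("text", data[pos:])


if __name__ == "__main__":
    for kind, chunk in split_csi(b"\x1b[2J\x1b[10;5HHello"):
        print(kind, chunk)
    # Only ESC itself is a Unicode control character in "ESC [ 10 ; 5 H".
    seq = "\x1b[10;5H"
    print([hex(ord(c)) for c in seq if unicodedata.category(c)[0] == "C"])  # ['0x1b']
```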
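As an example of the machine-readable route on Linux, the memory figures top displays can be read from /proc/meminfo instead of scraped from a redrawn screen. A minimal sketch; the helper `read_meminfo` is hypothetical and assumes the usual "Field:  value [kB]" line layout of that file.

```python
import os
from pathlib import Path
from typing import Dict

MEMINFO = "/proc/meminfo"  # Linux-only pseudo-file


def read_meminfo(path: str = MEMINFO) -> Dict[str, int]:
    """Parse /proc/meminfo into {field: number}.

    Most fields are in kB (lines of the form "MemTotal:  <n> kB"); a few, such
    as the HugePages_* counters, are plain counts without a unit.
    """
    info: Dict[str, int] = {}
    for line in Path(path).read_text().splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    if os.path.exists(MEMINFO):
        mem = read_meminfo()
        print(mem["MemTotal"], mem.get("MemAvailable"))
```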