As part of a test suite written in Python 3[.4-.6] on Linux, I have to run a number 3rd-party tests. The 3rd-party tests are bash scripts. They are designed to run with Perl's prove
TAP harness. One bash script can contain up to a few thousands of individual tests - and some of them can hang indefinitely. After a timeout, I want to kill the test script and gather some information on where it got stuck.
Because the bash scripts create processes of their own, I try to isolate the entire prove
process tree into a new process group, so I can eventually kill the entire process group as a whole if things go wrong. Because the tests must run with root privileges, I use sudo -b
for creating a new process group with root privileges. This strategy (as opposed to using setsid
in one way or the other) is a result of the comments I received on this question at SE Unix&Linux
The problem is that I loose all output from the prove
TAP harness if I kill it 'prematurely' when launched with sudo -b
through Python's subprocess.Popen
.
I isolated it into a simple test case. The following is a bash test script named job.t
:
#!/bin/bash
MAXCOUNT=20
echo "1..$MAXCOUNT"
for (( i=1; i<=$MAXCOUNT; i++ ))
do
echo "ok $i"
sleep 1
done
Just for comparison, I also wrote a Python script named job.py
producing more or less the same output and exhibiting the same behavior:
import sys
import time
if __name__ == '__main__':
maxcount = 20
print('1..%d' % maxcount)
for i in range(1, maxcount + 1):
sys.stdout.write('ok %d\n' % i)
time.sleep(1)
Last but not least, the following is my stripped-down "Python test infrastructure" named demo.py
:
import psutil # get it with "pip install psutil"
import os
import signal
import subprocess
def run_demo(cmd, timeout_after_seconds, signal_code):
print('DEMO: %s' % ' '.join(cmd))
proc = subprocess.Popen(cmd, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
try:
outs, errs = proc.communicate(timeout = timeout_after_seconds)
except subprocess.TimeoutExpired:
print('KILLED!')
kill_pid = _get_pid(cmd)
subprocess.Popen(['sudo', 'kill', '-%d' % signal_code, '--', '-%d' % os.getpgid(kill_pid)]).wait()
outs, errs = proc.communicate()
print('Got our/err:', outs.decode('utf-8'), errs.decode('utf-8'))
def _get_pid(cmd_line_list):
for pid in psutil.pids():
proc = psutil.Process(pid)
if cmd_line_list == proc.cmdline():
return proc.pid
raise # TODO some error ...
if __name__ == '__main__':
timeout_sec = 5
# Works, output is captured and eventually printed
run_demo(['sudo', '-b', 'python', 'job.py'], timeout_sec, signal.SIGINT)
# Failes, output is NOT captured (i.e. printed) and therefore lost
run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
When demo.py
is launched, it runs the routine run_demo
twice - with different configurations. Both times, a new process group with root privileges is started. Both times, a "test job" prints a new line (ok [line number]
) once a second - theoretically for 20 seconds / 20 lines. However, there is a timeout of 5 seconds for both scripts and the entire process group is killed after this timeout.
When run_demo
runs for the first time with my little Python script job.py
, all output of that script all the way up to the point when it is killed is captured and printed successfully. When run_demo
runs for the second time with the demo bash test script job.t
on top of prove
, no output is captured and only empty strings are printed.
user@computer:~> python demo.py
DEMO: sudo -b python job.py
KILLED!
Got our/err: 1..20
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
Traceback (most recent call last):
File "job.py", line 11, in <module>
time.sleep(1)
KeyboardInterrupt
DEMO: sudo -b prove -v /full/path/to/job.t
KILLED!
Got our/err:
user@computer:~>
What is happening here and how can I fix it?
I.e. how can I interrupt / terminate the bash test script running with prove
(and its entire process group) in such a manner that I can capture its output?
EDIT: It was suggested in an answer that the observed behavior happens because of Perl buffering its output. Within an individual Perl script, this can be turned off. However, there is no apparent option allowing to switch off buffering for prove
[-v]. How can I achieve this?
I can work around this issue by running my test job with bash
directly. The following command has to be changed from
run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
to
run_demo(['sudo', '-b', 'bash', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
This way, I do not get the test statistics printed by prove
, but I can generate them myself.
By default, the STDOUT of many programs (including perl
) is line-buffered (flushed on newline) when the STDOUT is connected to a terminal, and block-buffered (flushed when the file's buffer is full) otherwise (e.g. when it's connected to a pipe).
You can fool such programs into using line-buffering by using a pseudo-tty (ptty) instead of a pipe. To that end, unbuffer
is your friend. On Ubuntu, this is part of the expect
package (sudo apt install expect
).
From the docs:
unbuffer disables the output buffering that occurs when program output is redirected from non-interactive programs. For example, suppose you are watching the output from a fifo by running it through od and then more.
od -c /tmp/fifo | more
You will not see anything until a full page of output has been produced.
You can disable this automatic buffering as follows:
unbuffer od -c /tmp/fifo | more
I tried your example code and got the same result as you describe (thanks to your Minimal, Complete, and Verifiable example!).
I then changed
run_demo(['sudo', '-b', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
to
run_demo(['sudo', '-b', 'unbuffer', 'prove', '-v', os.path.join(os.getcwd(), 'job.t')], timeout_sec, signal.SIGINT)
That is: I simply prepended unbuffer
to the prove
command. The output then was:
DEMO: sudo -b python job.py
KILLED!
Got our/err: 1..20
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
Traceback (most recent call last):
File "job.py", line 8, in <module>
time.sleep(1)
KeyboardInterrupt
DEMO: sudo -b unbuffer prove -v /home/dirk/w/sam/p/job.t
KILLED!
Got our/err: /home/dirk/w/sam/p/job.t ..
1..20
ok 1
ok 2
ok 3
ok 4
ok 5