pythonredirectstandardoutput

Weird python behavior when redirecting output to file (when using os.system)


I got a problem that I think related to the low level of how the redirecting output (operator >) and os.system work

PROBLEM
This is my code. I want to get the 2 last line of bunch of text file. It works fine, but how the output is organized confused me

import os

# Get list directory within the root directory
root_list_dir = [directory for directory in os.listdir(".") if  os.path.isdir(directory)]

for directory in root_list_dir:
    log_path = directory + "/acoustic_model/log/"
    log_list = os.listdir(log_path)

    for i in log_list:
        print("==============================")
        os.system("tail -n 2 " + log_path + "" + i)

Standard output when I called python my_script.py (It worked as my expectation. I cut off most of the output since it's redundant):

==============================
2018-03-09 00:06:46,594     INFO           main: Develop: DNN -- MCD: 4.891 dB; BAP: 0.185 dB; F0:- RMSE: 26.193 Hz; CORR: 0.794; VUV: 6.372%
2018-03-09 00:06:46,594     INFO           main: Test   : DNN -- MCD: 4.721 dB; BAP: 0.163 dB; F0:- RMSE: 22.119 Hz; CORR: 0.828; VUV: 6.052%
==============================
2018-03-23 17:08:31,564     INFO         labels: loaded /data/thinhnv20/merlin-master/egs/build_your_own_voice/s1/experiments/doanngocle_cutaudio_3k5/test_synthesis/gen-lab/TTS8.lab, 84552 labels
2018-03-23 17:08:32,276     INFO         labels: loaded /data/thinhnv20/merlin-master/egs/build_your_own_voice/s1/experiments/doanngocle_cutaudio_3k5/test_synthesis/gen-lab/TTS7.lab, 82456 labels
==============================
2018-05-18 21:05:26,605     INFO           main: reconstructing waveform(s)
2018-05-18 21:05:26,605     INFO wav_generation: creating waveform for    1 of    1: labspit1
==============================
2018-05-18 21:57:58,600     INFO wav_generation: creating waveform for    1 of    2: labspit1
2018-05-18 21:58:25,779     INFO wav_generation: creating waveform for    2 of    2: labspit2
==============================
2018-05-05 03:06:25,523     INFO           main: Develop: DNN -- MCD: 4.981 dB; BAP: 0.187 dB; F0:- RMSE: 27.926 Hz; CORR: 0.766; VUV: 6.610%
2018-05-05 03:06:25,524     INFO           main: Test   : DNN -- MCD: 4.832 dB; BAP: 0.166 dB; F0:- RMSE: 23.796 Hz; CORR: 0.797; VUV: 6.385%
==============================
2018-05-18 22:04:01,253     INFO wav_generation: creating waveform for    2 of    3: labspit1
2018-05-18 22:04:28,330     INFO wav_generation: creating waveform for    3 of    3: labspit2
==============================
2018-03-10 20:50:44,003     INFO           main: Develop: DNN -- MCD: 4.892 dB; BAP: 0.182 dB; F0:- RMSE: 26.413 Hz; CORR: 0.790; VUV: 6.383%
2018-03-10 20:50:44,003     INFO           main: Test   : DNN -- MCD: 4.717 dB; BAP: 0.161 dB; F0:- RMSE: 21.953 Hz; CORR: 0.832; VUV: 6.134%
==============================
2018-03-13 09:23:05,873     INFO wav_generation: creating waveform for   31 of   32: TTS8
2018-03-13 09:23:08,908     INFO wav_generation: creating waveform for   32 of   32: TTS9

But when I redirected the output to a file (by calling python my_script.py > results), the =============== part is always print at the end of the results:

2018-03-13 14:57:24,852     INFO wav_generation: creating waveform for   31 of   32: TTS8
2018-03-13 14:57:27,999     INFO wav_generation: creating waveform for   32 of   32: TTS9
2018-03-13 15:09:45,300     INFO wav_generation: creating waveform for   31 of   32: TTS8
2018-03-13 15:09:48,499     INFO wav_generation: creating waveform for   32 of   32: TTS9
2018-03-13 14:25:32,699     INFO           main: Develop: DNN -- MCD: 4.875 dB; BAP: 0.183 dB; F0:- RMSE: 26.291 Hz; CORR: 0.792; VUV: 6.487%
2018-03-13 14:25:32,700     INFO           main: Test   : DNN -- MCD: 4.716 dB; BAP: 0.162 dB; F0:- RMSE: 22.129 Hz; CORR: 0.830; VUV: 6.132%
==============================
==============================
==============================
==============================
==============================
==============================
==============================
==============================
==============================
==============================

Can anyone explain to me something wrong here? I would love to hear any information! Thanks in advance


Solution

  • You've got two things writing to stdout, both with their own buffers.

    This is one of the many reasons you usually don't want to use os.system.

    You can force Python to flush it output after each print just by using the flush=True argument. Or you can even open(sys.stdout.fileno(), 'wb', buffering=0) and then write your own bytes to that to disable buffering.

    But that may or may not be sufficient, because you still won't have any control over what os.system writes. If you want to make sure its output is either unbuffered, or shares the same buffers as Python, there's no way around that except using subprocess. As the docs for os.system said to do in the first place.

    So:

    for i in log_list:
        print("==============================")
        proc = subprocess.run(['tail', '-n', '2', log_path + i], stdout=subprocess.PIPE)
        print(proc.stdout)