I ran the following program through a Slurm job:
import datetime
print(f'Program started at {datetime.datetime.now()}.')
# More imports here.
print(f'Modules finished loading at {datetime.datetime.now()}.')
seed_everything(7253, workers=True)
seed_everything
is defind at PyTorch Lightning and prints using the logging
module.
The program produced the following log file:
[rank: 0] Seed set to 7253
Program started at 2024-08-15 16:20:52.034715.
Modules finished loading at 2024-08-15 16:21:01.970554.
All lines got printed to the log file at around the same time, at 16:21:04.
Why wasn't the "Program started" line printed at 16:20:52, when print
was called?
And how did the "Seed set" line manage to get printed first?
The logging module writes to the log file immediately. Since PyTorch is using the logging module for its messages, these are written at the time they're logged.
On the other hand, when standard output of the script is piped to Slurm, this output is fully buffered. Slurm will only write it to the log file when the buffer is flushed. If the script only produces a small amount of output, this won't happen until the script exits.
On Linux you may be able to use the unbuffer
command to run the script from Slurm. This redirects the script's output through a pty, so it will be line-buffered instead of fully-buffered.