In our numerical software I encountered a strange bug after upgrading our cluster, namely:
At line 501 of file /home/weser/code/neci/src/fcimc_helper.F90 (unit = 6, file = '/dev/null')
Fortran runtime error: End of record
In this line there is a print * statement that prints to stdout.
In our program the STDOUT of all non-root MPI processes is closed and reopened to write to /dev/null.
(Except in Debug mode, where the STDOUT of every non-root MPI process is redirected to a separate file.)
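For the Debug case the reopen looks roughly like this (a sketch with a made-up file-name pattern and without the MPI calls; rank stands for the value returned by MPI_COMM_RANK, not for anything in our actual code):

program debug_redirect_sketch
    use iso_fortran_env, only: stdout => output_unit
    implicit none(type, external)
    integer, parameter :: rank = 1   ! placeholder for the MPI rank
    character(len=32) :: fname

    ! Non-root ranks: send their STDOUT to a per-rank file instead of /dev/null.
    close(stdout, status="keep")
    write(fname, '(A, I0, A)') 'rank_', rank, '.stdout'
    open(stdout, file=trim(fname))

    write(stdout, *) 'node', rank, ': Hello world'
end program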
I tried to create a minimal example for this problem, which looks like this:
program stdout_to_dev_null
    use iso_fortran_env, only: stdout => output_unit
    use mpi_f08  ! also works with plain mpi
    implicit none(type, external)
    integer :: rank, n_procs, ierror
    integer, parameter :: root = 0

    call MPI_INIT(ierror)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, n_procs, ierror)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)

    if (rank /= root) then
        close(stdout, status="keep")
        open(stdout, file="/dev/null", recl=8192)
    end if

    write(stdout, *) 'Size is ', n_procs
    write(stdout, *) 'node', rank, ': Hello world'

    block
        integer :: i
        character(:), allocatable :: large_string

        allocate(character(len=5000) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        write(stdout, *) large_string
    end block

    call MPI_FINALIZE(ierror)
end program
The problem is that this minimal example works completely as expected, both when run manually using mpirun and when actually submitted to the cluster like our other heavy calculations.
Now I have three questions: Do I have undefined behaviour in such code when closing and reopening STDOUT, and am I simply lucky in the minimal example? How can there be an End of record in /dev/null? How can I properly fix this problem?
The problem has nothing to do with MPI, and also nothing to do with the difference between the clusters.¹
It is problematic code that fails with gfortran but works under ifort by pure luck.
If the file is opened with a fixed record length (recl=...), a write statement must not exceed this length, even if the output goes to /dev/null.
The fix is simply to not open with a fixed record length, i.e. to omit the recl=... argument.
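Applied to the reopen in the question, only the recl= argument changes; a sketch of the corrected fragment:

if (rank /= root) then
    close(stdout, status="keep")
    ! No recl= here: the record length is then only limited by a
    ! processor-dependent maximum, so long list-directed writes go through.
    open(stdout, file="/dev/null")
end if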
Apparently the runtime library of ifort is more permissive and even works if the byte length of the written object is larger than the record length specified in the open statement.
In the following example the last write statement fails under gfortran.
program stdout_to_dev_null
    use iso_fortran_env, only: stdout => output_unit
    implicit none(type, external)
    integer, parameter :: rec_length = 10

    write(stdout, *) 'asdf'

    ! Reopen stdout on /dev/null without a fixed record length:
    ! both the short and the long write succeed.
    close(stdout, status="keep")
    open(stdout, file="/dev/null")
    block
        integer :: i
        character(:), allocatable :: large_string

        allocate(character(len=rec_length - 1) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        write(stdout, *) large_string

        deallocate(large_string)
        allocate(character(len=rec_length + 1) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        write(stdout, *) large_string
    end block

    ! Reopen stdout on /dev/null with a fixed record length:
    ! now a record longer than rec_length is an error.
    close(stdout, status="keep")
    open(stdout, file="/dev/null", recl=rec_length)
    block
        integer :: i
        character(:), allocatable :: large_string

        allocate(character(len=rec_length - 1) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        write(stdout, *) large_string

        deallocate(large_string)
        allocate(character(len=rec_length + 1) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        ! The following statement fails under gfortran ("End of record").
        write(stdout, *) large_string
    end block

    close(stdout, status="keep")
end program
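If you want the failure to be reported instead of aborting the program, the standard iostat= and iomsg= specifiers on the write can be used. A minimal sketch (the names ios and msg are mine, and the exact message text is runtime-library dependent):

program trap_end_of_record
    use iso_fortran_env, only: stdout => output_unit, stderr => error_unit
    implicit none(type, external)
    integer, parameter :: rec_length = 10
    integer :: ios
    character(len=256) :: msg
    character(len=rec_length + 1) :: large_string

    large_string = repeat('A', rec_length + 1)

    close(stdout, status="keep")
    open(stdout, file="/dev/null", recl=rec_length)

    ! With iostat=/iomsg= the record overflow does not terminate the
    ! program; ios becomes nonzero and msg describes the condition.
    write(stdout, *, iostat=ios, iomsg=msg) large_string
    if (ios /= 0) write(stderr, *) 'write failed: ', trim(msg)
end program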
¹ The relevant difference between the old and the new cluster for this problem is that we use gfortran + OpenMPI on the new one and ifort + IntelMPI on the old one.