fortran openmp intel-fortran

Weird omp_get_wtime() behavior on cluster (ifort)


I am running the same Fortran code on my local Windows machine and on a Linux cluster.

A minimal working example (MWE) of the code is:

program mwe

use omp_lib    
implicit none
    


integer :: indx
integer :: indy
integer :: indz
integer :: total_z = 1000000
integer :: total_x = 200
integer :: total_y = 100

real, allocatable :: output(:,:), grid_x(:), grid_y(:)
real :: startTime, midTime, totalTime

allocate(output(total_x, total_y))
allocate(grid_x(total_x))
allocate(grid_y(total_y))



do indx = 1, total_x
    grid_x(indx) = indx * 2
end do 

do indy = 1, total_y
    grid_y(indy) = indy / 3
end do 


output = 0.0

startTime = omp_get_wtime()
do indz = 1,total_z
do indy = 1,total_y
!$omp parallel do default(private) shared(indy, total_y, indz, total_z, total_x, output, startTime)    
do indx = 1,total_x  
    
    output(indx,indy) = output(indx,indy) + grid_x(indx)/grid_y(indy) 
    if ((mod(indz, 100000) .eq. 0) .and. (indx .eq. 1) .and. (indy .eq. 1)) then 
            midTime = omp_get_wtime()
            write(*,"(A,F15.1,A)") "Total time elapsed",midTime-startTime," seconds." 
    end if 
    
    
    
end do
!$omp end parallel do  
end do
end do

    
end program

On Windows I compile and run the code as:

ifort /O2 /c /Qopenmp main.f90
ifort /o main.out /O2 /Qopenmp main.f90 /link /STACK:999999999,999999999

main.out

And the output is:

Total time elapsed 119.4 seconds.

Total time elapsed 237.8 seconds.

Total time elapsed 357.7 seconds.

Total time elapsed 474.1 seconds.

Total time elapsed 588.7 seconds.

Total time elapsed 730.1 seconds.

Total time elapsed 873.0 seconds.

Total time elapsed 1015.4 seconds.

Total time elapsed 1159.6 seconds.

On the cluster (Linux) I compile and run the code with this batch script:

#!/bin/bash
#SBATCH -e error_file.err
#SBATCH -p common
#SBATCH -c 40
#SBATCH --job-name==mwe
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OMP_STACKSIZE=512mb
source /opt/apps/rhel8/intel-2020/compilers_and_libraries/linux/bin/compilervars.sh intel64

ifort -O2 -c -qopenmp main.f90
ifort -o main.out -O2 -qopenmp main.f90

./main.out

and the output is:

Total time elapsed 0.0 seconds.

Total time elapsed 0.0 seconds.

Total time elapsed 128.0 seconds.

Total time elapsed 128.0 seconds.

Total time elapsed 128.0 seconds.

Total time elapsed 256.0 seconds.

Total time elapsed 256.0 seconds.

Total time elapsed 384.0 seconds.

Total time elapsed 384.0 seconds.

Total time elapsed 384.0 seconds.


Solution

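  • You are computing the difference of two large numbers, but you store them in default real variables, which do not have enough precision. omp_get_wtime() returns a double precision value, and assigning it to a default (single precision) real rounds it before the subtraction takes place.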

    Compare

    Total time elapsed   1717144796.8   1717144796.7 seconds.
    Total time elapsed   1717144797.0   1717144796.7 seconds.
    Total time elapsed   1717144797.1   1717144796.7 seconds.
    Total time elapsed   1717144797.2   1717144796.7 seconds.
    Total time elapsed   1717144797.4   1717144796.7 seconds.
    Total time elapsed   1717144797.5   1717144796.7 seconds.
    Total time elapsed   1717144797.6   1717144796.7 seconds.
    Total time elapsed   1717144797.8   1717144796.7 seconds.
    Total time elapsed   1717144797.9   1717144796.7 seconds.
    Total time elapsed   1717144798.1   1717144796.7 seconds.
    Total time elapsed   1717144798.2   1717144796.7 seconds.
    Total time elapsed   1717144798.3   1717144796.7 seconds.
    Total time elapsed   1717144798.5   1717144796.7 seconds.
    Total time elapsed   1717144798.6   1717144796.7 seconds.
    Total time elapsed   1717144798.7   1717144796.7 seconds.
    Total time elapsed   1717144798.9   1717144796.7 seconds.
    

    with double precision and

    Total time elapsed   1717144832.0   1717144832.0 seconds.
    Total time elapsed   1717144832.0   1717144832.0 seconds.
    Total time elapsed   1717144832.0   1717144832.0 seconds.
    Total time elapsed   1717144832.0   1717144832.0 seconds.
    Total time elapsed   1717144832.0   1717144832.0 seconds.
    Total time elapsed   1717144832.0   1717144832.0 seconds.
    

    with real, using this modified write statement that prints both time stamps:

    write(*,"(A,F15.1,F15.1,A)") "Total time elapsed",midTime,startTime," seconds." 
    

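    The variables must have enough precision to hold the time stamps themselves, not just their difference. A default real has a 24-bit significand, so near 1.7e9 adjacent representable values are 128 seconds apart, which is exactly the step size in your cluster output. The timers are specific to the compiler suite and the operating system, so their starting values differ; I get different results with Intel and GCC. The larger the value the timer starts from, the coarser the effective resolution when it is stored in a real, and the smaller the starting value, the finer the resolution.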
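
    The rounding can be checked without OpenMP at all. The following is only a small sketch: the program name and the variables t and t_single are made up for illustration, the value of t is copied from the double precision output above, and it assumes that default real is the 32-bit kind (real32), as it is for ifort and gfortran without promotion flags.

    ```fortran
    program real_spacing
        use, intrinsic :: iso_fortran_env, only: real32, real64
        implicit none

        ! Illustrative time stamp, copied from the double precision output above.
        real(real64), parameter :: t = 1717144796.7_real64
        real(real32) :: t_single

        ! Converting to single precision rounds to the nearest representable value.
        t_single = real(t, kind=real32)

        ! spacing(x) is the distance between x and the next representable value.
        write(*,"(A,F15.1)")  "stored in real32:  ", t_single
        write(*,"(A,F15.1)")  "spacing in real32: ", spacing(t_single)
        write(*,"(A,ES15.3)") "spacing in real64: ", spacing(t)
    end program real_spacing
    ```

    With these assumptions the single precision copy becomes 1717144832.0, the same value as in the real output above, and its spacing is 128.0 seconds, while the double precision spacing is far below a microsecond.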

    The actual resolution of the timer is given by omp_get_wtick(), but you only benefit from it if the variables that store the time are of a real kind with enough precision, such as double precision, which is the type omp_get_wtime() returns.
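
    A possible way to apply this to the timing code is sketched below. It is not your full program: the program name, the acc variable and the loop are placeholders that only make some time pass. The relevant part is that startTime and midTime are declared double precision, so the subtraction happens before any precision is lost.

    ```fortran
    program timing_fix
        use omp_lib
        implicit none

        ! omp_get_wtime() and omp_get_wtick() return double precision,
        ! so the variables that receive them are double precision too.
        double precision :: startTime, midTime
        double precision :: acc
        integer :: i

        write(*,"(A,ES10.3,A)") "Timer resolution: ", omp_get_wtick(), " seconds."

        startTime = omp_get_wtime()

        ! Placeholder workload, only here so that some time passes.
        acc = 0.0d0
        do i = 1, 50000000
            acc = acc + sqrt(dble(i))
        end do

        midTime = omp_get_wtime()

        ! The difference is computed in double precision; only the small
        ! result is formatted for output.
        write(*,"(A,F15.6,A)") "Total time elapsed", midTime - startTime, " seconds."

        ! Printing the sum keeps the compiler from removing the loop.
        write(*,"(A,ES12.4)") "Checksum: ", acc
    end program timing_fix
    ```

    It should build with the same OpenMP flags as in the question (/Qopenmp or -qopenmp with ifort), or with -fopenmp with gfortran. In your original program the only change needed is the declaration of startTime, midTime and totalTime.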