[SOLVED] Different result when passing an entry and the same entry which is inside a matrix in Fortran

The TLDR:

I have a Fortran function, shiftxe_pos that acts on an element fe(i,j,:,:) with

    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nx
    integer, parameter :: nv
    integer, parameter :: dim
    real(dp), dimension(dim, dim, -1:nx, nv) :: fe

It does not behave identically if I input to it fe(1,1,:,:) when dim=1 than when dim>1, although fe(1,1,:,:) is identical. Do you know anything that could imply such a behavior, especially in FORTRAN ?

More details for non-Fortran users

dp is the precision (double precision here), and fe is a 4D array. The notation fe(i,j,:,:) represents a 2D array for all fe(i,j,k,l) for all k between -1 and nx and all l between 1 and nv (included). An important fact is that when dim=1 fe itself collapses to a 2D array since the two first dimensions are just 1.

Now some further details

on a minimal problem I worked on. The main where shiftxe_pos is called is simply

program main
    use commonblock
    use Poisson_bracket
    implicit none
    call inicon
    fe_temp(1,1,:,:) = fe(1,1,:,:)
    call shiftxe_pos(fe(1,1,:,:), fe_temp(1,1,:,:))
end program main

it calls a commonblock initiating the parameters and distributions fe(i,j,:,:) as some standard normal distribution in x and v

module commonblock
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
    integer :: i, j, k, l
    ! space time resolution
    integer, parameter :: nx = 200
    real(dp), parameter :: xl = 5.0_dp ! box size (-xl < x < xl)
    integer, parameter :: nv = 512
    real(dp), parameter :: vmax = 10.0_dp ! maximum momentum (-vmax < v < vmax)
    integer, parameter :: nt = 1000000
    real(dp), parameter :: dt= 0.001_dp ! time-step
    real(dp) :: dx, dv, dtsdv, dtsdx ! Derived grid spacings
    real(dp), dimension(-1:nx) :: x
    real(dp), dimension(nv) :: v

    integer, parameter :: dim = 2 
    real(dp), dimension(dim, dim, -1:nx, nv) :: fe = 0.0_dp
    real(dp), dimension(dim, dim, -1:nx, nv) :: fe_temp = 0.0_dp
    real(dp) :: femax, femin ! maximum and minimum of fe
    integer, parameter :: part_num = 1 ! normalization factor for fe

contains


subroutine inicon      
    implicit none
    real(dp) :: vshift = 0.0_dp
    real(dp) :: xshift = 0.0_dp
    real(dp) :: sve    = 1.0_dp
    real(dp) :: sxe    = 1.0_dp
!    Define the mesh
    dv = 2.0_dp*vmax/float(nv)
    dx = 2.0_dp*xl/float(nx) 
    dtsdv = dt/dv
    dtsdx = dt/dx
    do i=-1,nx
        x(i) = -xl + float(i)*dx 
    enddo
    do j=1,nv
        v(j) = -vmax + float(j)*dv -0.5_dp*dv
    enddo
    do i = 0, nx
        do j = 1, nv
            fe(1, 1, i, j) = exp(-0.5_dp*((v(j)-vshift)/sve)**2) * exp(-0.5_dp*((x(i)-xshift)/sxe)**2)
        enddo
    enddo

end subroutine inicon
end module commonblock

the Poisson_bracket module, used in main to call a shiftxe_pos routine (a simple advection) is simplified as such

module Poisson_bracket
    use commonblock
    implicit none
contains

subroutine shiftxe_pos(fe_out, fe_local)
! Advances the electron distribution in x space using flux balance method
    use commonblock
    implicit none
    real(dp), dimension(-1:nx, nv), intent(in) :: fe_local
    real(dp), dimension(-1:nx, nv), intent(inout) :: fe_out
    real(dp), dimension(-2:nx+2) :: y = 0.0_dp
    real(dp), dimension(-1:nx+2) :: dm = 0.0_dp
    real(dp) :: sh, coeff
    integer :: ii, locali, localj

!...................some things..........................!

    write(*,*) abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2))
end subroutine shiftxe_pos


end module Poisson_bracket

I choose to share it as a blackbox to not overwhelm the reader. A full version is given below.

The problem

The point is that abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2)), the ratio for an entry of the 2D array fe(1,1,:,:) at the point (nx/8,nv/2) between what was inputed to the function and how it modified it, is 1.0000000000 when you pass fe(1,1,:,:) in argument and dim=1 but 1.0000075...etc... when dim>1. Although the 2D arrays fe(1,1,:,:) are identical in both cases, and thus those ratios should be identitcal. The point (nx/8,nv/2) is chosen as a random one and has no particular feature. I only focus on fe(1,1,:,:) because it exists also when dim=1. It is important for my work that the ratio abs(fe_out(a point)/fe_local(the same point)) is a trustworthy quantity.

More details for non-Fortran users

An intent(in) argument is an argument passed in entry of a subroutine that won't be modified, while an intent(inout) is passed as an argument and modified inside the subroutine. You may wonder why using an intent(in) in here; one can do without it in this minimal exemple but I need this in my work as the second and first entry of shifxe_pos could be different.

Now the full shifxe_pos function if you are interested in it.

subroutine shiftxe_pos(fe_out, fe_local)
! Advances the electron distribution in x space using flux balance method
    use commonblock
    implicit none
    real(dp), dimension(-1:nx, nv), intent(in) :: fe_local
    real(dp), dimension(-1:nx, nv), intent(inout) :: fe_out
    real(dp), dimension(-2:nx+2) :: y = 0.0_dp
    real(dp), dimension(-1:nx+2) :: dm = 0.0_dp
    real(dp) :: sh, coeff
    integer :: ii, locali, localj
    femax = maxval(fe_out) 
    femin = minval(fe_out) 

    ! Forward shift (localj = nv/2+1 to nv)
    do localj = nv/2+1, nv
        sh = v(localj) * dtsdx
        ii = int(sh)
        sh = sh - float(ii)
        coeff = 0.5_dp * (1.0_dp - sh)
        do locali=-2, ii-1
            y(i)=0.0_dp
        end do
        do locali = ii, nx
            y(locali) = fe_out(locali - ii, localj)
        end do
        if (ii /= 0) then
            y(nx + 1) = fe_out(nx + 1 - ii, localj)
        else
            y(nx + 1) = fe_out(nx, localj)
        end if

        do locali = 0, nx
        if (y(locali + 1) - y(locali) > 0.0_dp) then
            if (y(locali + 1) - 3.0_dp * y(locali) < 0.0_dp) then
                dm(locali) = sh * (y(locali) + coeff * (y(locali + 1) - y(locali)))
            else
                dm(locali) = sh * (y(locali) + 2.0_dp * coeff * y(locali))
            end if
        else
            if (y(locali + 1) - y(locali) > 2.0_dp * (y(locali) - femax)) then
                dm(locali) = sh * (y(locali) + coeff * (y(locali + 1) - y(locali)))
            else
                dm(locali) = sh * (y(locali) + 2.0_dp * coeff * (y(locali) - femax))
            end if
        end if
        end do
        dm(-1) = 0.0_dp 
        do locali = 0, nx
            fe_out(locali, localj) = y(locali) + dm(locali - 1) - dm(locali)
        end do
    end do
    ! Backward shift (localj = 1 to nv/2)
    do localj = 1, nv / 2
        sh = -v(localj) * dtsdx
        ii = int(sh)
        sh = sh - float(ii)
        coeff = -0.5_dp * (1.0_dp - sh)
        do locali = 0, nx - ii
            y(locali) = fe_out(locali + ii, localj)
        end do
        if (ii /= 0) then
            y(-1) = fe_out(-1 + ii, localj)
        else
            y(-1) = fe_out(0, localj)
        end if
        do locali=nx+1, nx+2
            y(locali) = 0.0_dp
        end do

        do locali = 0, nx
        if (y(locali) - y(locali - 1) > 0.0_dp) then
            if (y(locali) - y(locali - 1) < 2.0_dp * (femax - y(locali))) then
                dm(locali) = sh * (y(locali) + coeff * (y(locali) - y(locali - 1)))
            else
                dm(locali) = sh * (y(locali) + 2.0_dp * coeff * (femax - y(locali)))
            end if
        else
            if (3.0_dp * y(locali) - y(locali - 1) > 0.0_dp) then
                dm(locali) = sh * (y(locali) + coeff * (y(locali) - y(locali - 1)))
            else
                dm(locali) = sh * (y(locali) - 2.0_dp * coeff * y(locali))
            end if
        end if
        end do
        dm(nx+1) = 0.0_dp 
        do locali = 0, nx
            fe_out(locali, localj) = y(locali) + dm(locali + 1) - dm(locali)
        end do
    end do
    fe_out(-1, 1:nv) = 0.0_dp
    write(*,*) abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2))
end subroutine shiftxe_pos

What I tried

I expected to not have this problem by changing integer, parameter :: dp = kind(1.0d0) to integer, parameter :: dp = selected_real_kind(p=33, r=4931) increasing the precision. It does not change the problem, and abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2)) is kept exatly the same (with more decimals of course). I tried reducing dt and dx, dv the grid parameter. smaller dt changes nothing to the problem, and although smaller dx and dv change the ratio a little bit, it is clearly not sufficient. I tried compiling it without approximations with a command like

gfortran -O2 -fno-fast-math -fno-unsafe-math-optimizations -ffloat-store -ffp-contract=off -frounding-math -fno-tree-vectorize -fno-inline .\commonblock.f90 .\Poisson_bracket.f90 .\main.f90

Nothing changed. I used to call

shiftxe_pos(fe(m,n,:,:), fe(m,n,:,:))

this possibly created undefined behavior because shiftxe_pos would use the same argument for intent(in) and intent(inout) (see above)

    real(dp), dimension(-1:nx, nv), intent(in) :: fe_local
    real(dp), dimension(-1:nx, nv), intent(inout) :: fe_out

I solved this by using an auxiliary variable fe_temp to copy the content of fe and pass it as intent(in).

It did not help in changing the ratio

abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2))

back to exactly one when dim > 1 thus it was not the cause of the problem.

I tried a simpler version of the advection:

subroutine shiftxe(fe_out, fe_local)
    use commonblock
    implicit none
    real(dp), dimension(-1:nx, nv), intent(in) :: fe_local
    real(dp), dimension(-1:nx, nv), intent(inout) :: fe_out
    real(dp), dimension(-2:nx+2) :: y = 0.0_dp
    real(dp), dimension(-1:nx+2) :: dm = 0.0_dp
    real(dp) :: sh, a
    integer :: ii, locali, localj
    do localj=nv/2+1,nv
        sh = v(localj)*dtsdx
        ii = int(sh)
        sh = sh-float(ii)
        a = 0.25_dp*(1.0_dp-sh)
        do locali=-2,ii-1
            y(locali) = 0.0_dp       
        end do
        do locali=ii,nx 
            y(locali) = fe_out(locali-ii,localj)
        end do
        locali = nx+1 
        if (ii.ne.0) y(locali) = fe_out(locali-ii,localj) 
        if (ii.eq.0) y(locali) = fe_out(nx,localj)      
        do i=0,nx 
            dm(locali) = sh*( y(locali) + a*(y(locali+1)-y(locali-1))) 
        end do
            dm(-1) = 0.0_dp  
        do i=0,nx
            fe_out(locali,localj) =  y(locali) + dm(locali-1) - dm(locali)  
        end do
    end do

    do localj=1,nv/2
        sh = -v(localj)*dtsdx
        ii = int(sh)
        sh = sh - float(ii)
        a = -0.25_dp*(1.0_dp-sh)
        do locali=0,nx-ii
            y(locali) = fe_out(locali+ii,localj)
        end do
        locali = -1  
        if (ii.ne.0) y(locali) = fe_out(locali+ii,localj) 
        if (ii.eq.0) y(locali) = fe_out(0,localj) 
        do locali=nx+1-ii,nx+2
            y(locali) = 0.0_dp
        end do
        do i=0,nx
            dm(locali) = sh*(y(locali) + a*(y(locali+1)-y(locali-1)))
        end do
        dm(nx+1) = 0.0_dp
        do locali=0,nx
            fe_out(locali,localj) =  y(locali) + dm(locali+1) - dm(locali) 
        end do
    end do
    fe_out(-1,1:nv) =0.0_dp 
    write(*,*) abs(fe_out(nx/8,nv/2)/fe_local(nx/8,nv/2))
end

This worked. The problem is that this scheme is not powerful enough for what I intent to do though... It is not clear to me exactly why. Am I missing on something ? I have been working on this for two entire days now... I would greatly appreciate any insights.