What is the difference between (ulimit -s unlimited) and (export KMP_STACKSIZE=xx)?

I ran my program as shown below after setting (ulimit -s unlimited). It works.

    REAL(DP), DIMENSION(1024,2,1541) :: L_X, TanV
    REAL(DP), DIMENSION(4) :: Val_X, Val_Y
    REAL(DP), dimension(1029) :: E_x
    REAL(DP), dimension(1024) :: E_y
    REAL(DP), DIMENSION(1024,1024) :: E_Fx, E_Fy

    !$OMP SECTIONS PRIVATE(i, j, ii,jj, PSL_X, i_x, i_y, Val_X, Val_Y)
    !$OMP SECTION
    do j=1,LinkPlusBndry
      do i=1,Kmax(j)-1
        PSL_X(1)=modulo(L_X(i,1,j),H*N2); PSL_X(2)=L_X(i,2,j)
        i_x=floor(PSL_X(1)/H)+2; i_y=floor(PSL_X(2)/H)
        call Delta4((E_x(i_x:i_x+3)-PSL_X(1))/H,Val_X)
        call Delta4((E_y(i_y:i_y+3)-PSL_X(2))/H,Val_Y)
        do ii=1,4; do jj=1,4
           EE_Fx(i_y+ii-1,i_x+jj-1)=EE_Fx(i_y+ii-1,i_x+jj-1) &
                                   +tauH2*TanV(i,1,j)*Val_X(jj)*Val_Y(ii)
        end do; end do
      end do
    end do

    ...
    ...
    ...

    !$OMP SECTION
    do j=1,LinkPlusBndry
      do i=1,Kmax(j)-1
        PSL_X(1)=modulo(L_X(i,1,j),H*N2); PSL_X(2)=L_X(i,2,j)
        i_x=floor(PSL_X(1)/H)+2; i_y=floor(PSL_X(2)/H)
        call Delta4((E_x(i_x:i_x+3)-PSL_X(1))/H,Val_X)
        call Delta4((E_y(i_y:i_y+3)-PSL_X(2))/H,Val_Y)
        do ii=1,4; do jj=1,4
           EE_Fy(i_y+ii-1,i_x+jj-1)=EE_Fy(i_y+ii-1,i_x+jj-1) &
                                   +tauH2*TanV(i,2,j)*Val_X(jj)*Val_Y(ii)
        end do; end do
      end do
    end do
    !$OMP END SECTIONS

I don't like using !$OMP SECTIONS here, because with two sections only 2 threads do any work, which limits the speedup.

So I changed my code as shown below.

    !$OMP DO PRIVATE(j, i, PSL_X, i_x, i_y, ii, jj, Val_X, Val_Y) REDUCTION(+:EE_Fx, EE_Fy)
    do j=1,LinkPlusBndry
      do i=1,Kmax(j)-1
        PSL_X(1)=modulo(L_X(i,1,j),H*N2); PSL_X(2)=L_X(i,2,j)
        i_x=floor(PSL_X(1)/H)+2; i_y=floor(PSL_X(2)/H)
        call Delta4((E_x(i_x:i_x+3)-PSL_X(1))/H,Val_X)
        call Delta4((E_y(i_y:i_y+3)-PSL_X(2))/H,Val_Y)
        do ii=1,4; do jj=1,4
           EE_Fx(i_y+ii-1,i_x+jj-1)=EE_Fx(i_y+ii-1,i_x+jj-1) &
                                   +tauH2*TanV(i,1,j)*Val_X(jj)*Val_Y(ii)
           EE_Fy(i_y+ii-1,i_x+jj-1)=EE_Fy(i_y+ii-1,i_x+jj-1) &
                                   +tauH2*TanV(i,2,j)*Val_X(jj)*Val_Y(ii)
        end do; end do

        PSL_X(1)=modulo(L_X(i+1,1,j),H*N2); PSL_X(2)=L_X(i+1,2,j)
        i_x=floor(PSL_X(1)/H)+2; i_y=floor(PSL_X(2)/H)
        call Delta4((E_x(i_x:i_x+3)-PSL_X(1))/H,Val_X)
        call Delta4((E_y(i_y:i_y+3)-PSL_X(2))/H,Val_Y)
        do ii=1,4; do jj=1,4
           EE_Fx(i_y+ii-1,i_x+jj-1)=EE_Fx(i_y+ii-1,i_x+jj-1) &
                                   -tauH2*TanV(i,1,j)*Val_X(jj)*Val_Y(ii)
           EE_Fy(i_y+ii-1,i_x+jj-1)=EE_Fy(i_y+ii-1,i_x+jj-1) &
                                   -tauH2*TanV(i,2,j)*Val_X(jj)*Val_Y(ii)
        end do; end do
      end do
    end do
    !$OMP END DO

When I run this code, I get a segmentation fault.

I thought it was related to memory size, so after searching I found this solution:

    export KMP_STACKSIZE=value

Now I use two different commands:

    ulimit -s unlimited

and

    export KMP_STACKSIZE=value

It works well, but I don't know the difference between the two commands. What is the difference?

Solution

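ulimit -s sets the OS limit on stack size for the current shell and for every program started from it. That limit governs the stack of the program's initial (main) thread.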

KMP_STACKSIZE tells the OpenMP runtime how much stack to allocate for each thread it creates. The two settings therefore cover different stacks, and depending on your OS defaults you might need both. By the way, you should prefer OMP_STACKSIZE: KMP_STACKSIZE is the environment variable used by the Intel and clang compilers, whereas OMP_STACKSIZE is the standard way of setting the stack size of the OpenMP threads.
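As a rough sketch of how the two settings might be combined before launching the program: the value 64M and the executable name my_program are placeholders chosen for illustration, not values from the question.

```shell
# Raise the soft stack limit for this shell and its children (main thread).
# This can fail if the hard limit is lower; in that case the old limit stays.
ulimit -s unlimited 2>/dev/null || echo "stack limit unchanged" >&2

# Stack size for each thread created by the OpenMP runtime.
# 64M is an illustrative value (assumption), to be tuned for the real program.
export OMP_STACKSIZE=64M

# Show what the program would inherit.
echo "main thread stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE: $OMP_STACKSIZE"

# ./my_program   # hypothetical executable name
```

Both settings are inherited only by processes started from the same shell, so they need to go into the job script or the same terminal session that launches the program.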

Note that this problem shows up more often in Fortran, because Fortran compilers tend to keep more data on the stack, especially arrays. Some compilers can move such arrays to the heap automatically; see, for instance, -heap-arrays for the Intel compiler.
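This would also explain why the REDUCTION version crashes while the SECTIONS version does not: a reduction gives every thread its own private copy of the reduction arrays, and those copies typically end up on the thread's stack. A back-of-the-envelope estimate, assuming EE_Fx and EE_Fy are 1024 x 1024 like the declared E_Fx and E_Fy, and that REAL(DP) takes 8 bytes (neither is stated in the question):

```shell
# Assumed sizes (not confirmed by the question):
nx=1024            # first extent of each reduction array
ny=1024            # second extent of each reduction array
bytes_per_elem=8   # REAL(DP) taken to be 8-byte double precision
n_arrays=2         # EE_Fx and EE_Fy

# Stack needed per thread just for the private reduction copies.
per_thread_bytes=$((nx * ny * bytes_per_elem * n_arrays))
per_thread_mib=$((per_thread_bytes / 1024 / 1024))

echo "private reduction copies per thread: ${per_thread_mib} MiB"
```

Under these assumptions each thread needs about 16 MiB of stack for the reduction alone, so the per-thread stack size has to be set comfortably above that.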