upc

How to solve UPC Runtime error: out of shared memory


I am trying to run a Berkeley UPC code on a computer with 64 cores and 256 GB of RAM. However, the run fails because the runtime cannot allocate enough shared memory. I expected the following command to work, because 51 x 5 GB = 255 GB < 256 GB:

upcrun -n 51 -shared-heap=5GB xcorupc_sac inputpgas_sac{$rc1}.txt
..
UPCR: UPC thread  3 of 51 on range (pshm node 0 of 1, process  3 of 51, pid=191914)
UPCR: UPC thread 16 of 51 on range (pshm node 0 of 1, process 16 of 51, pid=191927)
UPC Runtime warning: Requested shared memory (5120 MB) > available (2515 MB) on node 0 (range): using 2515 MB per thread instead

UPC Runtime error: out of shared memory
  Local shared memory in use:  1594 MB per-thread,  81340 MB total
  Global shared memory in use:    0 MB per-thread,     1 MB total
  Total shared memory limit:   2515 MB per-thread,  128281 MB total
upc_alloc unable to service request from thread 0 for 1672245248 more bytes

NOTICE: Before reporting bugs, run with GASNET_BACKTRACE=1 in the environment to generate a backtrace. 
NOTICE: We recommend linking the debug version of GASNet to assist you in resolving this application issue.

I don't understand why the total shared memory limit is 128 GB, which is half of the physical memory present. I cannot override it even with the -shared-heap flag, where I am clearly asking for 5 GB per thread. Any suggestions?

cat /proc/meminfo 
MemTotal:       263378836 kB

The UPC build was configured with --with-sptr-packed-bits=20,9,35, which allows up to 2^35 bytes = 32 GB of shared memory per thread.

EDIT1: The following is the output of upcc --version:

[avinash@range jointinvsurf5_cajoint_compile]$ upcc --version
This is upcc (the Berkeley Unified Parallel C compiler), v. 2019.4.4
  (getting remote translator settings...)
----------------------+---------------------------------------------------------
 UPC Runtime          | v. 2019.4.4, built on Feb 11 2020 at 23:31:40
----------------------+---------------------------------------------------------
 UPC-to-C translator  | v. 2.28.0, built on Jul 19 2018 at 20:29:47
                      | host aphid linux-x86_64/64
                      | gcc v4.2.4 (Ubuntu 4.2.4-1ubuntu4)
----------------------+---------------------------------------------------------
 Translator location  | http://upc-translator.lbl.gov/upcc-2019.4.0.cgi
----------------------+---------------------------------------------------------
 networks supported   | smp udp mpi ibv
----------------------+---------------------------------------------------------
 default network      | ibv
----------------------+---------------------------------------------------------
 pthreads support     | available (if used, default is 2 pthreads per process)
----------------------+---------------------------------------------------------
 Configured with      | '--with-translator=http://upc-translator.lbl.gov/upcc-2
                      | 019.4.0.cgi' '--with-sptr-packed-bits=20,9,35'
                      | '--prefix=/usr/local/berkeley_upc/opt'
                      | '--with-multiconf-magic=opt'
----------------------+---------------------------------------------------------
 Configure features   | trans_bupc,pragma_upc_code,driver_upcc,runtime_upcr,
                      | gasnet,upc_collective,upc_io,upc_memcpy_async,
                      | upc_memcpy_vis,upc_ptradd,upc_thread_distance,upc_tick,
                      | upc_sem,upc_dump_shared,upc_trace_printf,
                      | upc_trace_mask,upc_local_to_shared,upc_all_free,
                      | upc_atomics,pupc,upc_types,upc_castable,upc_nb,nodebug,
                      | notrace,nostats,nodebugmalloc,nogasp,nothrille,
                      | segment_fast,os_linux,cpu_x86_64,cpu_64,cc_gnu,
                      | packedsptr,upc_io_64
----------------------+---------------------------------------------------------
 Configure id         | range Tue Feb 11 23:18:39 PST 2020 gnome-initial-setup
----------------------+---------------------------------------------------------
 Binary interface     | 64-bit x86_64-unknown-linux-gnu
----------------------+---------------------------------------------------------
 Runtime interface #  | Runtime supports 3.0 -> 3.13: Translator uses 3.6
----------------------+---------------------------------------------------------
                      |  --- BACKEND SETTINGS (for ibv network) ---
----------------------+---------------------------------------------------------
 C compiler           | /usr/bin/gcc
                      |   GNU/4.8.5/4.8.5 20150623 (Red Hat 4.8.5-39)
                      |   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) Copyright
                      |   (C) 2015 Free Software Foundation, Inc.
----------------------+---------------------------------------------------------
 C compiler flags     | -O3 --param max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Wno-unused
                      | -Wunused-result -Wno-unused-parameter -Wno-address
                      | -std=gnu99
----------------------+---------------------------------------------------------
 linker               | /data/seismo82/avinash/Programs/openmpiinstall/bin/mpic
                      | c
                      |   GNU/4.8.5/4.8.5 20150623 (Red Hat 4.8.5-39)
                      |   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) Copyright
                      |   (C) 2015 Free Software Foundation, Inc.
----------------------+---------------------------------------------------------
 linker flags         | -D_GNU_SOURCE=1 -O3 --param
                      | max-inline-insns-single=35000 --param
                      | inline-unit-growth=10000 --param
                      | large-function-growth=200000 -Wno-unused
                      | -Wunused-result -Wno-unused-parameter -Wno-address
                      | -std=gnu99 -L/data/seismo82/avinash/Programs/myupc/opt
                      | -L/data/seismo82/avinash/Programs/myupc/opt/umalloc
                      | -lupcr-ibv-seq -lumalloc
                      | -L/data/seismo82/avinash/Programs/myupc/opt/gasnet/ibv-
                      | conduit -lgasnet-ibv-seq -libverbs -lpthread -lrt
                      | -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -lgcc -lm
----------------------+---------------------------------------------------------

EDIT2: The following is the output of df -h /dev/shm:

[avinash@range jointinvsurf5_cajoint_compile]$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           126G   21M  126G   1% /dev/shm

Solution

  • By default, Berkeley UPC uses kernel shared memory services to cross-map the UPC shared segments between co-located processes. For smp-conduit, this is the only mode of operation.

    Assuming this is a Linux system with configure defaults, the most likely explanation is exhaustion of the kernel-provided POSIX shared memory space. You can confirm this by looking at the virtual file system where that resides. Here's an example from a system configured for up to 20G of shared memory:

    $ df -h /dev/shm /var/shm /run/shm
    df: '/var/shm': No such file or directory
    df: '/run/shm': No such file or directory
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs            20G  504K   20G   1% /dev/shm
    

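    This value limits the total per-node shared memory segment space. In your case, the 126G size reported for /dev/shm in EDIT2 is consistent with the total shared memory limit of 128281 MB (about 125 GB) in the error message. This limit can usually be raised by an administrator adjusting kernel settings, although the details vary with the distribution.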

    For more info, see the section 'System Settings for POSIX Shared Memory' in https://gasnet.lbl.gov/dist-ex/README

    Finally, note that even once the above issue is addressed, asking for 255 GB of shared memory heap on a system with 256 GB of physical DRAM (99.6%) may be inadvisable. This leaves very little space for the non-shared portions of application memory (stack, static data, malloc heap) and for the memory overheads of the kernel and daemon processes. Depending on your kernel settings, this may cause the kernel's out-of-memory handling to start killing processes. We generally recommend a safe rule-of-thumb limit of 85% of physical memory (assuming the system is otherwise idle), and "proceed with caution" beyond that.
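
As a rough way to see how the two limits above interact, the following Python sketch estimates the largest per-thread shared heap that fits under both the size of the POSIX shared memory filesystem and the 85% rule of thumb. The function names are hypothetical and not part of Berkeley UPC or GASNet, and the runtime's own calculation may differ somewhat (it reported 2515 MB per thread in the question), so treat the result as an estimate only.

```python
import os

MIB = 1024 * 1024


def shm_capacity_bytes(path="/dev/shm"):
    """Total size in bytes of the filesystem mounted at `path`.

    /dev/shm is the usual Linux location for POSIX shared memory;
    other systems may use a different path (assumption).
    """
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks


def max_heap_per_thread_mib(threads, shm_bytes, phys_bytes, phys_fraction=0.85):
    """Estimate the largest per-thread shared heap, in MiB.

    The budget is the smaller of the shared memory filesystem size and
    `phys_fraction` of physical memory (0.85 is the rule of thumb from
    the answer), divided evenly among the UPC threads on the node.
    """
    budget = min(shm_bytes, int(phys_bytes * phys_fraction))
    return budget // threads // MIB


if __name__ == "__main__":
    # Values taken from the question; df -h rounds, so 126G is approximate.
    shm = 126 * 1024**3          # /dev/shm size from EDIT2
    phys = 263378836 * 1024      # MemTotal from /proc/meminfo, in bytes
    print(max_heap_per_thread_mib(51, shm, phys))   # limited by /dev/shm
    print(max_heap_per_thread_mib(51, phys, phys))  # /dev/shm raised to all of RAM
```

With the numbers from the question (51 threads, a 126G /dev/shm), this gives about 2.5 GB per thread, in the same range as the value the runtime fell back to. If /dev/shm were enlarged to cover all of physical memory, the 85% guideline would become the binding limit at roughly 4.2 GB per thread, so -shared-heap=5GB with 51 threads would still exceed it. On a live system, shm_capacity_bytes() can supply the first value instead of the hard-coded figure.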