
Why is `total_num_virtual_procs` not equal to the number of MPI processes?


In the NEST simulator there is the concept of virtual processes. From the information on virtual processes, I would expect every MPI process to contain at least one virtual process; otherwise, wouldn't that MPI process sit idle?

However, when I start 4 MPI processes, the kernel status attribute `total_num_virtual_procs` is 1:

mpiexec -n 4 python -c "import nest; import mpi4py.MPI; print(nest.GetKernelStatus()['total_num_virtual_procs'], mpi4py.MPI.COMM_WORLD.Get_size());"

This prints the NEST startup banner followed by `1 4`, four times (once per process). Does this mean that three of the processes won't be used for the simulation until I call `nest.SetKernelStatus({'total_num_virtual_procs': 4})`?


Solution

  • EDIT: TL;DR: The value returned by nest.GetKernelStatus('total_num_virtual_procs') was buggy in older NEST versions. Recent versions report the correct number, which by default (one thread per process) equals the number of MPI processes.

    The number of virtual processes is a free parameter in NEST because it uses a hybrid MPI + OpenMP parallelization scheme. Each process may run multiple threads, and each thread is its own virtual process (VP). For example, two processes and four VPs give two threads per process:

    Process  Thread  VP
    -------  ------  --
    0        0       0
    1        0       1
    0        1       2
    1        1       3
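The table follows a round-robin pattern: consecutive VPs alternate between processes, so VP = thread × num_processes + rank. The plain-Python sketch below reproduces the table from that formula. The helper names are hypothetical, and the formula is inferred from the table; as far as I know it also matches how NEST assigns VPs internally, but treat that as an assumption.

```python
def vp_of(rank, thread, num_processes):
    """Global VP index of an (MPI rank, local thread) pair (assumed round-robin)."""
    return thread * num_processes + rank


def rank_and_thread_of(vp, num_processes):
    """Inverse mapping: which MPI rank and which local thread own a given VP."""
    return vp % num_processes, vp // num_processes


def layout_table(total_vps, num_processes):
    """Rows of (process, thread, vp), ordered by VP like the table above."""
    return [(*rank_and_thread_of(vp, num_processes), vp) for vp in range(total_vps)]


if __name__ == "__main__":
    print("Process  Thread  VP")
    for process, thread, vp in layout_table(4, 2):
        print(f"{process:<9}{thread:<8}{vp}")
```

Calling `layout_table(8, 2)` shows the four-threads-per-process case in the same format.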
    

    Setting total_num_virtual_procs to eight would produce four threads per process, and so on. Your example above also works without mpi4py:

    mpiexec -n 2 python -c "
    import nest
    # Request 4 VPs in total; with 2 MPI processes this gives 2 threads each.
    nest.SetKernelStatus({'total_num_virtual_procs': 4})
    print('>>> this is process %d of %d with %d threads <<<'
          % (nest.Rank(),
             nest.NumProcesses(),
             nest.GetKernelStatus()['total_num_virtual_procs'] // nest.NumProcesses()))
    nest.Simulate(10)
    "
    

    Its output includes the following lines:

    …
    
    >>> this is process 1 of 2 with 2 threads <<<
    >>> this is process 0 of 2 with 2 threads <<<
    …
    
    Sep 09 15:49:39 SimulationManager::start_updating_ [Info]: 
        Number of local nodes: 0
        Simulation time (ms): 10
        Number of OpenMP threads: 2
        Number of MPI processes: 2
    
    Sep 09 15:49:39 SimulationManager::start_updating_ [Info]: 
        Number of local nodes: 0
        Simulation time (ms): 10
        Number of OpenMP threads: 2
        Number of MPI processes: 2
    

    You can see that total_num_virtual_procs is split over all processes, such that the number of OpenMP threads times the number of MPI processes equals total_num_virtual_procs. Note also that the thread parallelization is not visible at the Python level: the processes only enter a multi-threaded context inside the Create(), Connect() and Simulate() calls, which run in the underlying C++ kernel.
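The split described above is plain integer division. A minimal sketch using a hypothetical helper `threads_per_process` (not part of NEST); it also assumes, based on my understanding of NEST, that a VP count which is not a multiple of the number of MPI processes is rejected, so the helper raises in that case.

```python
def threads_per_process(total_num_virtual_procs, num_processes):
    """Threads each MPI process runs for a given total VP count (hypothetical helper)."""
    if total_num_virtual_procs % num_processes != 0:
        # Assumption: NEST refuses VP counts that cannot be split evenly over processes.
        raise ValueError(
            f"{total_num_virtual_procs} VPs cannot be split evenly "
            f"over {num_processes} processes"
        )
    return total_num_virtual_procs // num_processes


if __name__ == "__main__":
    for vps in (2, 4, 8):
        print(f"{vps} VPs on 2 processes -> {threads_per_process(vps, 2)} thread(s) each")
```

With 4 VPs on 2 processes this gives 2 threads each, matching the "Number of OpenMP threads: 2" lines in the log above.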

    A good general starting point when experimenting with different job geometries is one MPI process per NUMA domain (e.g. one process per physical CPU socket) and one thread per physical core (hyper-threading may cause contention for cache lines, which can even degrade performance).
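The rule of thumb can be turned into a back-of-the-envelope calculation. The sketch below assumes one NUMA domain per socket and a made-up machine (2 nodes, 2 sockets each, 16 physical cores per socket); `suggest_layout`, `JobLayout` and `my_script.py` are hypothetical names. It only computes the numbers to pass to `mpiexec -n` and to `total_num_virtual_procs`; it does not pin processes or threads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobLayout:
    mpi_processes: int
    threads_per_process: int

    @property
    def total_num_virtual_procs(self) -> int:
        return self.mpi_processes * self.threads_per_process


def suggest_layout(nodes, numa_domains_per_node, physical_cores_per_numa_domain):
    """One MPI process per NUMA domain, one thread per physical core (hypothetical helper).

    Hyper-threads are deliberately ignored: only physical cores are counted.
    """
    return JobLayout(
        mpi_processes=nodes * numa_domains_per_node,
        threads_per_process=physical_cores_per_numa_domain,
    )


if __name__ == "__main__":
    # Assumed example machine: 2 nodes, 2 sockets per node, 16 physical cores per socket.
    layout = suggest_layout(nodes=2, numa_domains_per_node=2, physical_cores_per_numa_domain=16)
    print(f"mpiexec -n {layout.mpi_processes} python my_script.py")
    print(f"nest.SetKernelStatus({{'total_num_virtual_procs': {layout.total_num_virtual_procs}}})")
```

Binding each rank to its socket and each thread to a core is done through the MPI launcher and the OpenMP environment, whose options differ between MPI implementations, so check your launcher's documentation for that part.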