mpi python-rq py-redis

Specifying which processes run RQ workers using mpirun


I am using RQ to implement a job queue on a cluster, where the jobs are managed and run with Python. The RQ workers are launched via mpirun, followed by a Python program which adds jobs to the queue.

I have noticed that when I only have a single process, so the sole RQ worker is on the same process as the program, there is a significant delay. This may be because I have a large amount of data on the redis-server they share access to.

In a test case with a single job, using 2 processes reduces the overall run time. Therefore I think it would be best to reserve a single process for the program (the master), which just places jobs on the queue for the workers.

Currently I have

mpirun -np $NUM_WORKERS -machinefile $confile rq worker $WORKER_ID -u $REDIS_URL
python3 master_program.py

My main question is: how can I modify the mpirun command to launch RQ workers on the 2nd through Nth processes, ensuring that master_program.py has sole use of the first?

A secondary question: why is it so much slower when an RQ worker shares the process with the master program? While waiting on the result from the RQ worker, the master isn't doing anything else.


Solution

  • To answer your main question, you can use the MPI launcher to launch multiple executables as part of the same job. The exact syntax will depend on your job scheduler and MPI software.

    According to the Open MPI mpirun man page (https://www.open-mpi.org/doc/v4.0/man1/mpirun.1.php), colons separate the executables on the command line:

    Multiple Instruction Multiple Data (MIMD) Model:

    mpirun [ global_options ]
           [ local_options1 ] <program1> [ <args1> ] :
           [ local_options2 ] <program2> [ <args2> ] :
           ... :
           [ local_optionsN ] <programN> [ <argsN> ]
    

    An example job script for Torque and OpenMPI might look like

    #!/bin/bash
    #PBS -l nodes=2:ppn=16,walltime=00:10:00
    
    module load openmpi
    
    OMPI_DEBUGGING_OPTS="--display-map --tag-output"
    
    # Reserve one task for the master_program
    NUM_WORKERS=$(($PBS_NP - 1))
    
    # Application-specific setup (replace the <whatever> placeholders with real values)
    REDIS_URL=<whatever>
    WORKER_ID=<whatever>
    
    # Change to submission dir
    cd "${PBS_O_WORKDIR}"
    
    # Launch one master process and NUM_WORKERS RQ workers in a single MPI job
    mpirun ${OMPI_DEBUGGING_OPTS} \
        -np 1 \
        python3 master_program.py \
        : \
        -np ${NUM_WORKERS} \
        rq worker ${WORKER_ID} -u ${REDIS_URL}
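
If the colon syntax is awkward with your scheduler, another possible approach is a small wrapper script that picks its role from the rank Open MPI assigns to each process. The following is only a sketch under some assumptions: the script name rank_dispatch.sh, the LAUNCH dry-run switch, and the fallback values for WORKER_ID and REDIS_URL are all hypothetical and not part of the original setup. It relies on OMPI_COMM_WORLD_RANK, which Open MPI exports to the processes it launches; other MPI implementations use different variable names.

```shell
#!/bin/bash
# rank_dispatch.sh (hypothetical name): pick a role from the Open MPI rank.

# Rank 0 is the master; every other rank is an RQ worker.
role_for_rank() {
    if [ "$1" -eq 0 ]; then
        echo "master"
    else
        echo "worker"
    fi
}

# Open MPI exports OMPI_COMM_WORLD_RANK to each launched process.
# Falling back to 0 when it is unset is an assumption, so that the
# script can be tried outside mpirun.
rank="${OMPI_COMM_WORLD_RANK:-0}"

# LAUNCH=echo (the default here) only prints the command, as a dry run.
# Set LAUNCH=exec to replace this shell with the real program.
LAUNCH="${LAUNCH:-echo}"

if [ "$(role_for_rank "$rank")" = "master" ]; then
    $LAUNCH python3 master_program.py
else
    # The fallback values below are placeholders, not real settings.
    $LAUNCH rq worker "${WORKER_ID:-worker}" -u "${REDIS_URL:-redis://localhost:6379}"
fi
```

Run directly, the script prints the command it would start for rank 0. Inside a job it could be launched with something like `mpirun -np ${PBS_NP} -x LAUNCH=exec ./rank_dispatch.sh`, where -x exports the variable to every rank. The colon form above is still the more direct answer; the wrapper just keeps the role decision in one place.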