c pthreads shared-memory multiprocess futex

Shared pthread_cond_broadcast stuck in futex_wait


I have one "server" process a and potentially multiple "client" processes b. The server creates a shared memory file (shm_open) containing a pthread_mutex_t and a pthread_cond_t, which it uses to broadcast to the clients that something has happened (see the minimal example below).

At first this works as expected and supports an arbitrary number of clients. But after the first client gets killed (e.g. using CTRL+C) while waiting for the broadcast, the server sometimes gets stuck in pthread_cond_broadcast, or, to be more precise, inside futex_wait according to gdb.

Why? And how should this be done correctly?

After finding some discussions about this, I've tried broadcasting with and without holding the mutex, and also without using a mutex at all. The behaviour is the same in every case.

The code to reproduce:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <pthread.h>

struct {
    pthread_cond_t cond;
    pthread_mutex_t mutex;
} *shm;

void a() {
    // create shm and broadcast every second
    int shm_fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(*shm));
    shm = mmap(0, sizeof(*shm), PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    close(shm_fd);

    pthread_mutexattr_t mutexattr;
    pthread_mutexattr_init(&mutexattr);
    pthread_mutexattr_setpshared(&mutexattr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(&shm->mutex, &mutexattr);
    pthread_mutex_consistent(&shm->mutex);

    pthread_condattr_t condattr;
    pthread_condattr_init(&condattr);
    pthread_condattr_setpshared(&condattr, PTHREAD_PROCESS_SHARED);
    pthread_cond_init(&shm->cond, &condattr);

    for (int i = 0; 1; ++i) {
        pthread_mutex_lock(&shm->mutex);
        pthread_cond_broadcast(&shm->cond);
        pthread_mutex_unlock(&shm->mutex);
        sleep(1);
        printf("broadcast %d\n", i);
    }
}

void b() {
    // open shm and listen for events
    int shm_fd = shm_open("/my_shm", O_RDWR, 0666);
    shm = mmap(0, sizeof(*shm), PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    close(shm_fd);
    for (int i = 0; 1; ++i) {
        pthread_mutex_lock(&shm->mutex);
        pthread_cond_wait(&shm->cond, &shm->mutex);
        pthread_mutex_unlock(&shm->mutex);
        printf("receive %d\n", i);
    }
}

int main(int argc, char** argv) {
    if (argc != 2)
        return -1;
    switch (argv[1][0]) {
    case 'a':
        a();
        break;
    case 'b':
        b();
        break;
    default:
        return -1;
    }
    return 0;
}

Compile with gcc ab.c -o ab -lpthread -lrt, then run

./ab a &
./ab b
CTRL+C
./ab b

Sometime between the CTRL+C and ./ab b the server will stop outputting broadcast.


Solution

  • [...] after the first client gets killed (e.g. using CTRL+C) while waiting for the broadcast, the server sometimes gets stuck in pthread_cond_broadcast [...]

    Why?

    Because killing the process may leave the CV and / or mutex in an inconsistent state. The same general thing can happen when one thread of a multithreaded process is forcibly killed, or when a multithreaded process forks. Indeed, given that the b processes spend most of their time waiting on the CV, it is pretty likely that they leave that inconsistent when they get terminated by a signal.

    And how should this be done correctly?

    To prevent the CV becoming inconsistent under such circumstances, you should ensure -- to the extent that it is possible -- that the b processes do not terminate while waiting on the CV. To protect them against that happening as a result of receiving a signal, set up a handler for the signal that raises a flag (of type volatile sig_atomic_t). The process would then check that flag after returning from the wait to determine whether it needs to terminate. Conceivably, you could also broadcast to the CV to ensure that the process proceeds with the termination as soon as possible, though not from inside the signal handler itself, because pthread_cond_broadcast is not async-signal-safe.

    Do note, however, that some signals cannot be caught or blocked, and the above approach cannot do anything about those. Some other signals can be caught, but obligate the handler to terminate the program to avoid undefined behavior, and the above approach doesn't help with those, either.

    Additionally, there are other issues with your code, including