cconcurrencylinked-listreadwritelock

Safe removal of a node in a concurrent linked list


I'm currently reading the book APUE. When I read the chapter about pthread reader/writer-lock, I have a question about its implementation of concurrent queue using reader/writer-lock.

struct queue {
    struct job *q_head;
    struct job *q_tail;
    pthread_rwlock_t q_lock;
};

/*
* Remove the given job from a queue.
*/
void
job_remove(struct queue *qp, struct job *jp)
{
    pthread_rwlock_wrlock(&qp->q_lock);
    if (jp == qp->q_head) {
        qp->q_head = jp->j_next;
        if (qp->q_tail == jp)
            qp->q_tail = NULL;
        else
            jp->j_next->j_prev = jp->j_prev;
    } else if (jp == qp->q_tail) {
        qp->q_tail = jp->j_prev;
        jp->j_prev->j_next = jp->j_next;
    } else {
        jp->j_prev->j_next = jp->j_next;
        jp->j_next->j_prev = jp->j_prev;
    }
    pthread_rwlock_unlock(&qp->q_lock);
}

My question is that how this implementation can ensure that a struct job is removed from the linked list only once. From my understanding, two threads can be scheduled so that they are just before the line pthread_rwlock_wrlock. Then the struct job *jp might be freed twice. If struct job * is a dynamically allocated data structure, this might lead to a double-free bug. Any advice?


Solution

  • Your code has a race condition in it. Other changes can potentially happen to the queue between two threads obtaining a write lock on the queue and then trying to remove the same node. Other nodes can be removed, other nodes can be added. So if thread A removes a node, changes happen, and then thread B tries to remove the same node again, your queue can be corrupted.

    The code needs information to let it know that the node has already been removed.

    See the added comments, and the code to fix the race condition:

    struct queue {
        struct job *q_head;
        struct job *q_tail;
        pthread_rwlock_t q_lock;
    };
    
    /*
    * Remove the given job from a queue.
    */
    void
    job_remove(struct queue *qp, struct job *jp)
    {
        // we assume here that jp is actually in the queue
        pthread_rwlock_wrlock(&qp->q_lock);
    
        // at this point, jp may no longer be in the queue,
        // and in fact, the queue may be completely different.
        // thus, any modification to the queue based on the
        // assumption that jp is still part of the queue
        // can lead to corruption of the queue
    
        // so check if jp has been removed - we'll later set
        // both j_next and j_prev to NULL after jp is
        // removed from the queue - and if they're both
        // NULL here that means another thread already
        // removed jp from the queue
        if ( !(jp->j_next) && !(jp->j_prev) ) {
            // empty statement - jp is already removed
            ;
        }
        else if (jp == qp->q_head) {
            qp->q_head = jp->j_next;
            if (qp->q_tail == jp)
                qp->q_tail = NULL;
            else
                jp->j_next->j_prev = jp->j_prev;
        } // and this brace was missing in your posted code...
        else if (jp == qp->q_tail) {
            qp->q_tail = jp->j_prev;
            jp->j_prev->j_next = jp->j_next;
        } else {
            jp->j_prev->j_next = jp->j_next;
            jp->j_next->j_prev = jp->j_prev;
        }
    
        // make sure data in jp no longer refers to
        // the queue - this will also tell any other
        // thread that accesses jp that it's already
        // been removed
        jp->j_next = NULL;
        jp->j_prev = NULL;
    
        pthread_rwlock_unlock(&qp->q_lock);
    }
    

    You also need to check how jp gets free()d or deleted. Multiple threads can't be allowed to do that.