c++ multithreading synchronization readwritelock

How to make a multiple-read/single-write lock from more basic synchronization primitives?

We have found several spots in our code where concurrent reads of data protected by a mutex are rather common, while writes are rare. Our measurements seem to say that using a simple mutex seriously hinders the performance of the code reading that data. So what we need is a multiple-read/single-write mutex. I know that this can be built atop simpler primitives, but before I try my hand at this, I'd rather ask for existing knowledge:

What is an approved way to build a multiple-read/single-write lock out of simpler synchronization primitives?

I do have an idea how to make it, but I'd rather have answers unbiased by what I (probably wrongly) came up with. (Note: What I expect is an explanation of how to do it, probably in pseudo code, not a full-fledged implementation. I can certainly write the code myself.)

Caveats:

This needs to have reasonable performance. (What I have in mind would require two lock/unlock operations per access. Now that might not be good enough, but needing many of them instead seems unreasonable.)
Commonly, reads are more numerous, but writes are more important and performance-sensitive than reads. Readers must not starve writers.
We are stuck on a rather old embedded platform (proprietary variant of VxWorks 5.5), with a rather old compiler (GCC 4.1.2), and boost 1.52 – except for most of boost's parts relying on POSIX, because POSIX isn't fully implemented on that platform. The locking primitives available basically are several kinds of semaphores (binary, counting etc.), on top of which we have already created mutexes, condition variables, and monitors.
This is IA32, single-core.

Solution

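It seems that you only have mutexes and condition variables as synchronization primitives. Therefore, I wrote a reader-writer lock here that prefers writers and can starve readers. It uses one mutex, two condition variables, and three integers. The code uses the C++11 std:: names for brevity; substitute your own mutex and condition variable types.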

active_readers - the readers currently reading (readers blocked on the cv readerQ are not counted)
waiting_writers - the writers blocked on the cv writerQ plus the writer currently writing
active_writers - the writer currently writing. Can only be 1 or 0.

It starves readers in this way: if several writers want to write, readers never get the chance to read until all writers have finished writing. This is because newly arriving readers must check the waiting_writers variable and wait while it is nonzero. At the same time, the active_writers variable guarantees that only one writer can write at a time.

#include <condition_variable>
#include <mutex>

class RWLock {
public:
    RWLock()
    : shared()
    , readerQ(), writerQ()
    , active_readers(0), waiting_writers(0), active_writers(0)
    {}

    // Block while any writer is waiting or writing, so writers take priority.
    void ReadLock() {
        std::unique_lock<std::mutex> lk(shared);
        while( waiting_writers != 0 )
            readerQ.wait(lk);
        ++active_readers;
        lk.unlock();
    }

    // Wake one writer; it re-checks active_readers and sleeps again if readers remain.
    void ReadUnlock() {
        std::unique_lock<std::mutex> lk(shared);
        --active_readers;
        lk.unlock();
        writerQ.notify_one();
    }

    // Register as waiting first so new readers back off, then wait for exclusive access.
    void WriteLock() {
        std::unique_lock<std::mutex> lk(shared);
        ++waiting_writers;
        while( active_readers != 0 || active_writers != 0 )
            writerQ.wait(lk);
        ++active_writers;
        lk.unlock();
    }

    // Hand over to the next writer if there is one; otherwise release all waiting readers.
    void WriteUnlock() {
        std::unique_lock<std::mutex> lk(shared);
        --waiting_writers;
        --active_writers;
        if(waiting_writers > 0)
            writerQ.notify_one();
        else
            readerQ.notify_all();
        lk.unlock();
    }

private:
    std::mutex              shared;
    std::condition_variable readerQ;
    std::condition_variable writerQ;
    int                     active_readers;
    int                     waiting_writers;
    int                     active_writers;
};
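
Calling the Lock/Unlock pairs by hand is easy to get wrong when the protected code returns early or throws. A minimal sketch of RAII guards for this interface follows. The names ReadGuard, WriteGuard, CountingLock and GuardDemo are made up for this illustration and are not part of the answer's code; CountingLock is only a stand-in that counts calls so the snippet compiles on its own, and in real use you would pass the RWLock above instead. The guards stick to C++03 so they should also work with an old compiler.

```cpp
#include <cassert>
#include <stdexcept>

// RAII guard for the shared (read) side. Works with any lock type that
// provides ReadLock()/ReadUnlock(), such as the RWLock above.
template <class Lock>
class ReadGuard {
public:
    explicit ReadGuard(Lock& l) : lock_(l) { lock_.ReadLock(); }
    ~ReadGuard() { lock_.ReadUnlock(); }
private:
    ReadGuard(const ReadGuard&);             // non-copyable, C++03 style
    ReadGuard& operator=(const ReadGuard&);
    Lock& lock_;
};

// RAII guard for the exclusive (write) side.
template <class Lock>
class WriteGuard {
public:
    explicit WriteGuard(Lock& l) : lock_(l) { lock_.WriteLock(); }
    ~WriteGuard() { lock_.WriteUnlock(); }
private:
    WriteGuard(const WriteGuard&);           // non-copyable, C++03 style
    WriteGuard& operator=(const WriteGuard&);
    Lock& lock_;
};

// Hypothetical stand-in with the same four methods as RWLock. It does no
// real locking; it only counts how many read/write locks are held.
struct CountingLock {
    int reads;
    int writes;
    CountingLock() : reads(0), writes(0) {}
    void ReadLock()    { ++reads; }
    void ReadUnlock()  { --reads; }
    void WriteLock()   { ++writes; }
    void WriteUnlock() { --writes; }
};

// Shows that the guard releases the lock even when the body throws.
inline void GuardDemo() {
    CountingLock lock;
    try {
        ReadGuard<CountingLock> guard(lock);
        assert(lock.reads == 1);
        throw std::runtime_error("reader failed");
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~ReadGuard() at this point.
    }
    assert(lock.reads == 0);
}
```

With the real lock, a reader would then be written as `ReadGuard<RWLock> guard(lock);` at the top of the scope that touches the shared data, and a writer the same way with WriteGuard.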