It seems like I should be able to implement a vector-type object that I can insert into and read from simultaneously like so:
In this way if I read from the vector while it's reallocating I just read from the old location, which is still valid. (Of course deletion won't be threadsafe, but that's fine; you'll just have to take that into account if you want to delete things, which is at any rate no worse than what you have with std::vector
right now.)
In turn, I should be able to adapt this into a hashtable without too much trouble -- just use one of these vectors for the buckets, and back each bucket with one of these vectors. (I realize you should back the buckets with some kind of self-balancing binary tree to get optimal asymptotic complexity, but vectors are fine for my application and I don't want to take things too far off course here.)
Two questions:
std::atomic
in a few places, of course, but is there some way to use things like std::vector
or std::unordered_map
here?)Alternatively, is there a book or something I can read on the subject?
The problem with writing thread-safe code is that it's very difficult to cover all possible scenarios that can occur as code is running concurrently. Most problematic of all is that home-grown thread-safe data structures may seem to be working as expected but then often fail at random in production.
Even more complex than lock-based algorithms are lock-free or wait-free algorithms. Lock-free algorithms guarantee that other threads can make progress even if one thread is suspended. Wait-free algorithms (which are lock-free) guarantee that all threads can make progress.
In addition to the actual algorithm, you always have to think about the platform you're implementing the algorithm for. Multi-threaded code depends on the compiler and the memory model of the processor, particularly if locks are not used. std::atomic
provides platform-independent access to atomic primitives required for lock-free/wait-free algorithms. That doesn't make writing correct custom thread-safe data structures much easier, though.
The short answer is: don't do it.
The long answer:
The most important point is the exact scenario you need the data structure for. Based on that, you can derive the requirements and assess if it's feasible to implement it yourself. For the sake of understanding the underlying mechanisms of such implementations, experimenting makes sense. For the sake of production code, this usually comes back to haunt you and thus rarely wins.
As you cannot rely on undefined behavior of standard containers (behavior that cannot be implied by the interface contract), it's difficult to use them as a basis for your implementation. The documentation usually defines the expected behavior from a single-threaded POV. For multi-threading, though, you need to know internals to be able to rely on them - unless, of course, the data structure was implemented with concurrency in mind.
Getting back to your scenario: let's say what you need is a hash table with a fixed number of buckets that can be read from without blocking. Inserts may be serialized, deletions are not required. This is very common for caches.
As a building block, all you need is a single lock and a fixed number of linked lists that represent the hash table buckets and handle collisions.
The lookup algorithm would be as follows (pseudo-code):
node* lookup(key) {
// concurrency issue (see below)
node = buckets[hash(key)].find(key);
if (node) {
return node;
}
lock();
node = buckets[hash(key)].find(key);
if (node) {
return node;
}
node = new node(key);
// concurrency issue (see below)
buckets[hash(key)].add(node);
unlock();
return node;
}
The hash table can be read without blocking, inserts are serialized. This only works if items are never removed from the buckets. Otherwise, you might access deallocated data.
There's a second caveat which is not immediately apparent and illustrates the complexity of writing multi-threaded code. This only works as expected if the newly created node is fully allocated and visible to other threads before its pointer is being inserted into the bucket. If that order is not maintained, readers might trigger a segmentation fault because they access a partially initialized node. The order is influenced by the compiler and CPU, both of which are free to reorder instructions as long as the behavior from the POV of single-threaded code does not change.
In this specific case, the order is highly relevant. Consequently, we need to inform both, the compiler and CPU, that the new
must happen before the add
. Also, the reader (find
) needs to read the pointer before any other data. This is achieved by influencing the memory order of both operations. In C++11, representing the node pointer as a std::atomic<node*>
and using load
and store
to read/write the pointer solves that issue because the default memory order is std::memory_order_seq_cst
, which provides a sequential consistency guarantee. There's a more nuanced approach that might generate more efficient code (using std::memory_order_acquire
for load
and std::memory_order_release
for store
). You can also influence the order by placing so-called memory barriers/fences appropriately (those are implicitly triggered by the memory order arguments mentioned).
The reason why purely lock-based algorithms usually do not have to deal with memory ordering is that the locking primitives already trigger memory barriers/fences implicitly with every lock
and unlock
.
Long story short: if you there's no need to create your own thread-safe data structures, don't do it and instead rely on existing implementations that have been reviewed and tested thoroughly.