Consider this snippet code of implementing a lock-free multi-producer single-consumer queue in Rust
struct Node<T> {
next: AtomicPtr<Node<T>>,
value: Option<T>,
}
impl<T> Node<T> {
unsafe fn new(v: Option<T>) -> *mut Node<T> {
Box::into_raw(box Node { next: AtomicPtr::new(ptr::null_mut()), value: v })
}
}
pub struct Queue<T> {
head: AtomicPtr<Node<T>>,
tail: UnsafeCell<*mut Node<T>>,
}
impl<T> Queue<T> {
pub fn push(&self, t: T) {
unsafe {
let n = Node::new(Some(t)); // #1
let prev = self.head.swap(n, Ordering::AcqRel); // #2
(*prev).next.store(n, Ordering::Release); // #3
}
}
pub fn pop(&self) -> PopResult<T> {
unsafe {
let tail = *self.tail.get();
let next = (*tail).next.load(Ordering::Acquire); // #4
if !next.is_null() {
*self.tail.get() = next;
assert!((*tail).value.is_none());
assert!((*next).value.is_some());
let ret = (*next).value.take().unwrap();
let _: Box<Node<T>> = Box::from_raw(tail);
return Data(ret);
}
if self.head.load(Ordering::Acquire) == tail { Empty } else { Inconsistent }
}
}
}
An equivalent C++ implementation:
template<class T>
struct Node{
std::atomic<T*> next;
T value;
Node(T v):value(v),next(){}
};
template<class T>
struct Queue {
std::atomic<Node<T>*> head;
Node<T>* tail;
Queue(){
auto h = new Node<T>{};
head.store(h);
tail = h;
}
void push(T t){
auto node = new Node<T>{};
auto pre = this->head.exchange(node,std::memory_order_acq_rel);
pre->next.store(node,std::memory_order_release);
}
T pop(){
auto tail = this->tail;
auto next = tail.next.load(std::memory_order_acquire);
if(next){
this->tail = next;
auto ret = next->value;
delete next;
return ret;
}
if this->head == tail{
throw "empty";
}
}
}
Since the Rust atomic object model is the same as that of C++, I will reference C++ standards about atomic operations.
First, look at #2
, since [atomics.order] p10 says:
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
This read-modify-write operation guarantees that the invocation of push
will change self.head
in order and only a single thread will modify the next
field of that object returned by swap
.
#3
is a release store operation while #4
is an acquire load operation, hence, once #4
read the pointer written at #3
, they will synchronize, such that the modification happened at #1
will happen-before those operations that read/write the object after #4
.
If I understand correctly, the happen-before between #1
and the part after #4
is established by the synchronization of #3
and #4
. The read-modify-write operation at #2
merely guarantees that every thread that executes push
has its individual "pre", and they happen in the modification order.
So, I wonder can the Ordering::AcqRel
at #2
be replaced with Relaxed
?
Your analysis focuses on synchronization between producer and consumer. You're correct that the AcqRel at #2 isn't needed for that. But it is needed for synchronization between multiple producers.
Suppose one producer A creates a node and stores its pointer into head
. Another producer B may then load that same pointer from head
and access that node at #3. These need to be synchronized, so you need an acquire release pair, and head
is the only atomic variable to serve that purpose. If you change #2 to Relaxed, then those two accesses would be a data race.