I am seeing unexpected results from RDMA reads that make me doubt my understanding of the RDMA read and write semantics.
I'm trying to implement message passing in RDMA in a manner similar to L5, but am running into issues that look like memory tearing. But that shouldn't be happening.
I have a struct that is a bit more complicated than what L5 has:
struct Header {
std::atomic<uint8_t> mailbox = 1;
std::atomic<uint32_t> length;
char data[128];
};
On the writing side, I do RDMA reads until I see a value of 1 in mailbox. Then I do an RDMA write of length + data, set mailbox to 0, and send mailbox with a second RDMA write. On the reading side, I check for mailbox == 0, read the data, and set length to 0 and mailbox to 1.
When I do my RDMA reads I am occasionally seeing lengths <> 0 along with mailbox values of 0. Since RDMA operations are supposed to happen in order, I do not understand how this is happening.
One possible explanation is that if you do an RDMA read targeting a whole struct Header then there's no guarantee what order the target RDMA adapter will read from memory to satisfy that read. Especially since your struct is not aligned to a cacheline size (I'm guessing you're on x86, where cachelines are 64 bytes), so mailbox and length could be in different cachelines.
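To make the cacheline point concrete, here is a minimal sketch that checks whether mailbox and length land in the same 64-byte line for a given starting address. The 64-byte line size is my guess for x86, and kCacheLine, AlignedHeader and control_fields_share_line are hypothetical names I made up for illustration, not part of your code. It also assumes the usual layout where std::atomic<uint32_t> is 4 bytes with 4-byte alignment.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on typical x86 parts.
constexpr std::size_t kCacheLine = 64;

// Same fields as the struct in the question.
struct Header {
    std::atomic<std::uint8_t> mailbox{1};
    std::atomic<std::uint32_t> length{0};
    char data[128];
};

// Hypothetical variant that always starts on a cache-line boundary,
// so mailbox and length always sit in the first line of the struct.
struct alignas(kCacheLine) AlignedHeader {
    std::atomic<std::uint8_t> mailbox{1};
    std::atomic<std::uint32_t> length{0};
    char data[128];
};

static_assert(alignof(AlignedHeader) == kCacheLine,
              "AlignedHeader must start on a cache-line boundary");

// Returns true if mailbox and every byte of length fall into the same
// cache line when a Header is placed at address `base`.
inline bool control_fields_share_line(std::uintptr_t base) {
    const std::uintptr_t first = base + offsetof(Header, mailbox);
    const std::uintptr_t last =
        base + offsetof(Header, length) + sizeof(Header::length) - 1;
    return first / kCacheLine == last / kCacheLine;
}
```

With this layout, control_fields_share_line(0) is true, while a Header that happens to start 60 bytes into a line has mailbox in one line and length in the next. Note that aligning the struct only removes the straddling case; as far as I know it does not by itself make the adapter read the fields in any particular order.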
I still don't really understand why it's surprising to see length != 0 and mailbox == 0. Isn't that the case where the reading side hasn't processed the mailbox at all? From what you wrote, the final state of the struct after the two RDMA writes from the writing side is exactly length != 0, mailbox == 0.
In any case, since as I described above an RDMA adapter is completely free to read the memory being accessed by RDMA read in any order, it's possible for RDMA read to return any mix of old and new data, no matter what order the CPU updates the fields in. For example, if you had this sequence:
- struct Header starts out with length != 0, mailbox == 0
- the RDMA read fetches the length field
- the CPU sets length to 0 then mailbox to 1
- the RDMA read fetches the mailbox field
then the RDMA read would fetch length != 0, mailbox == 1. This is because the RDMA read operation does not participate in any memory barriers or other ordering, even though you declared your struct members as atomic.
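The interleaving can be reproduced deterministically in ordinary C++ without any RDMA hardware. This is only a simulation under my own assumptions: simulated_rdma_read is a hypothetical stand-in for the adapter that fetches the two fields in separate steps, and the callback stands in for the CPU running in between. No real verbs calls are involved.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Same fields as the struct in the question.
struct Header {
    std::atomic<std::uint8_t> mailbox{1};
    std::atomic<std::uint32_t> length{0};
    char data[128];
};

// What the remote side ends up with after its read completes.
struct Snapshot {
    std::uint8_t mailbox;
    std::uint32_t length;
};

// Hypothetical model of the adapter: it fetches length, then something
// else runs (`between`), then it fetches mailbox. A real adapter may
// pick any order; this is just one order it is allowed to use.
template <typename F>
Snapshot simulated_rdma_read(const Header& h, F between) {
    Snapshot s{};
    s.length = h.length.load();    // adapter fetches length first
    between();                     // CPU updates the struct here
    s.mailbox = h.mailbox.load();  // adapter fetches mailbox afterwards
    return s;
}

// Start from "message pending" (length != 0, mailbox == 0) and let the
// CPU consume the message in the middle of the simulated read.
Snapshot torn_read_demo() {
    Header h;
    h.mailbox.store(0);
    h.length.store(42);
    return simulated_rdma_read(h, [&h] {
        h.length.store(0);   // CPU clears length first...
        h.mailbox.store(1);  // ...then releases the mailbox
    });
}
```

torn_read_demo() returns length == 42 together with mailbox == 1, a combination that never existed in memory at any single point in time. The atomics order the CPU's own stores, but they do nothing to make the two fetches of the simulated read happen as one unit.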