I recently started working with InfiniBand cards, two Mellanox Technologies MT27700 Family [ConnectX-4] adapters to be specific. Eventually, I want to extend an existing framework with interfaces based on the VPI Verbs API/RDMA CM API.
About the research I have already done on RDMA programming: I started by reading Mellanox's RDMA Aware Networks Programming User Manual. Secondly, I read a fairly comprehensive blog on the capabilities of the VPI Verbs/RDMA Verbs. Finally, I read the three papers on RDMA programming published by Tarick Bedeir: [1], [2], [3].
To find out what suits my needs best, I built a testbench that measures, among other things, latency, CPU usage, and throughput. I tested different operations (see table 1 below), different send flags (e.g. IBV_SEND_INLINE), and different ways of retrieving Work Completions (busy polling vs. waiting for an event on a completion channel). My testbench is partly inspired by the findings from this performance study on RDMA programming.
Table 1: Operations supported by each QP transport type

OPCODE                      | IBV_QPT_UD | IBV_QPT_UC | IBV_QPT_RC
----------------------------+------------+------------+-----------
IBV_WR_SEND                 |     X      |     X      |     X
IBV_WR_SEND_WITH_IMM        |     X      |     X      |     X
IBV_WR_RDMA_WRITE           |            |     X      |     X
IBV_WR_RDMA_WRITE_WITH_IMM  |            |     X      |     X
IBV_WR_RDMA_READ            |            |            |     X
IBV_WR_ATOMIC_CMP_AND_SWP   |            |            |     X
IBV_WR_ATOMIC_FETCH_AND_ADD |            |            |     X

Currently, I am still figuring out all possibilities.
One thing I have noticed is that I have to call ibv_post_send every time I want to write to or read from remote memory, using IBV_WR_RDMA_WRITE or IBV_WR_RDMA_READ, respectively. So my question is whether it is possible to map the remote memory addresses into the virtual address space of a host.
Of course, all initialization of VPI components, registration of memory with ibv_reg_mr, and the exchange of remote keys and addresses would still have to be done. Does InfiniBand offer anything to make this possible?
Thanks!
There's no native way to get the functionality you are looking for with RDMA. RDMA is designed as a networking protocol that relies on user applications submitting work requests of the kinds listed in your table. Although it isn't part of the original protocol, I believe it's not entirely impossible to implement a layer that provides remote memory access through the local address space, but I'm not aware of any such system.
The closest thing that comes to mind is a memory disaggregation solution that basically lets you use remote memory when local memory is fully utilized. Here's an example of such a system: https://github.com/SymbioticLab/infiniswap
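To make the idea of such a layer a bit more concrete, here is a rough sketch of what its interface could look like. Everything in it is hypothetical: remote_window, rw_load_u64, and rw_store_u64 are invented names, and the transport is a mock that copies into a local array. In a real implementation the two callbacks would presumably post an IBV_WR_RDMA_READ or IBV_WR_RDMA_WRITE and wait for the completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport hooks; a real version would wrap ibv_post_send. */
typedef int (*remote_read_fn)(void *ctx, uint64_t off, void *dst, size_t len);
typedef int (*remote_write_fn)(void *ctx, uint64_t off, const void *src, size_t len);

/* Hypothetical handle for a window of remote memory. */
struct remote_window {
    void           *ctx;    /* transport state (QP, keys, ...) */
    size_t          size;   /* size of the remote region in bytes */
    remote_read_fn  read;
    remote_write_fn write;
};

/* Load a 64-bit value from the window; returns -1 if out of bounds. */
static int rw_load_u64(struct remote_window *w, uint64_t off, uint64_t *out)
{
    if (off > w->size || w->size - off < sizeof *out)
        return -1;
    return w->read(w->ctx, off, out, sizeof *out);
}

/* Store a 64-bit value into the window; returns -1 if out of bounds. */
static int rw_store_u64(struct remote_window *w, uint64_t off, uint64_t val)
{
    if (off > w->size || w->size - off < sizeof val)
        return -1;
    return w->write(w->ctx, off, &val, sizeof val);
}

/* Mock transport: "remote" memory is just a local array, and ops counts
 * how many transfers a real implementation would have had to post. */
struct mock_remote {
    uint8_t  mem[4096];
    unsigned ops;
};

static int mock_read(void *ctx, uint64_t off, void *dst, size_t len)
{
    struct mock_remote *m = ctx;
    assert(off + len <= sizeof m->mem);
    memcpy(dst, m->mem + off, len);
    m->ops++;
    return 0;
}

static int mock_write(void *ctx, uint64_t off, const void *src, size_t len)
{
    struct mock_remote *m = ctx;
    assert(off + len <= sizeof m->mem);
    memcpy(m->mem + off, src, len);
    m->ops++;
    return 0;
}
```

Note that this only hides the work requests behind accessor functions. Each load or store is still one transfer over the network, so it is not a true mapping into the virtual address space.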