I'm trying to write a PCIe driver to DMA pages from the host memory to an FPGA. My host setup is Cavium ThunderX2 and my FPGAs are Xilinx Alveo U50.
Any DMA to or from the host causes the ARM SMMU v3.4 to raise event 0x10 (F_TRANSLATION, a translation fault). I'm using the dma_map_single(..) and dma_alloc_coherent(..) Linux APIs to map the virtual address of the page to a DMA-capable address.
Further inspecting the event records, Context Descriptor, and Stream Table Entries, I have the following information.
Type of Fault - F_TRANSLATION (Translation Fault)
S2 == 0 (Stage 1 Fault - Virtual Address -> Intermediate Physical Address stage)
Class of Fault = TT/TTD (Translation Table Descriptor Fetch)
PnU == Unprivileged Access
T0SZ == 6'b010000 (16); T1SZ == 6'b000000 (IGNORED because EPD1 == 1)
VAS == 49 bits (Virtual Address Size)
TG0 == 00 (4 kB page granule size)
EPD0 == 0 (Stage 1 page table walk enabled)
EPD1 == 1 (Translation table walks through TTB1 disabled)
TB0/1 == 0 (Top byte ignore disabled)
IPS == 44 bits (Intermediate Physical Address size)
SMMU Config = 3'b101 (Stage 1 translation enabled, Stage 2 bypassed)
Sample Virtual and DMA address of the page obtained -
Virtual Address - 0xFFFF--- (64-bit value)
DMA Address - 0x9F733CA000 (looks within the range defined by T0SZ and compliant with the IPS)
I'm unable to figure out why I'm getting a Stage 1 translation fault when everything looks fine. Technically, I should be getting a Stage 2 fault since it is bypassed and the input address should translate through TTB0.
P.S. I'm a newbie to ARM v8. Let me know if you need additional information in the comments.
Attached is a picture of the fault F_TRANSLATION.
I was able to fix the issue. The IOMMU and the DMA mapping were out of sync: the SMMU had no valid translation table descriptors for the mapped DMA addresses.
I used dma_alloc_coherent(SZ_2M) to get a buffer region and then used the IOMMU domain ops to map its IOVA in the SMMU.
int ret = domain->ops->map(domain, iova, phys_addr, size, prot);
Now the SMMU is able to fetch and translate the IOVA.
For some reason, dma_map_single(..) doesn't work with my current implementation. I still have to investigate why the streaming DMA API fails here.