Am I correct to assume that memory mmap'd with MAP_HUGETLB|MAP_ANONYMOUS is actually 100% physically contiguous, at least at the granularity of the huge page size (2 MB or 1 GB)?
Otherwise I don't see how it could work, or be performant, since the TLB would need more entries...
Yes, huge pages are physically contiguous. Indeed, as you point out, if they weren't, multiple page table entries would be needed for a single huge page, which would defeat the entire purpose of having one.
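For reference, here is a minimal sketch of the kind of mapping the question describes, requesting a single 2 MB huge page from the hugetlb pool. The fallback #defines are only there in case older headers don't ship MAP_HUGE_2MB; the values match the uapi <linux/mman.h> encoding.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks for older headers; values match the uapi <linux/mman.h> encoding. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26
#endif
#ifndef MAP_HUGE_2MB
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)   /* log2(2 MB) = 21 */
#endif

#define LENGTH (2UL * 1024 * 1024)            /* one 2 MB huge page */

int main(void)
{
    /* MAP_HUGETLB asks for pages from the pre-reserved hugetlb pool;
     * MAP_HUGE_2MB selects the 2 MB size explicitly (Linux >= 3.8).
     * mmap() fails with ENOMEM if no huge pages of that size are free. */
    void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB,
                      -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return EXIT_FAILURE;
    }

    memset(addr, 0, LENGTH);   /* touch the mapping so the huge page is faulted in */
    munmap(addr, LENGTH);
    return EXIT_SUCCESS;
}
```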
Here's an excerpt from Documentation/admin-guide/mm/hugetlbpage.rst:
The default for the allowed nodes--when the task has default memory policy--is all on-line nodes with memory. Allowed nodes with insufficient available, contiguous memory for a huge page will be silently skipped when allocating persistent huge pages. See the discussion below of the interaction of task memory policy, cpusets and per node attributes with the allocation and freeing of persistent huge pages.

The success or failure of huge page allocation depends on the amount of physically contiguous memory that is present in system at the time of the allocation attempt. If the kernel is unable to allocate huge pages from some nodes in a NUMA system, it will attempt to make up the difference by allocating extra pages on other nodes with sufficient available contiguous memory, if any.
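Note that MAP_HUGETLB draws from the persistent pool the excerpt describes, so the pages have to be reserved beforehand. A minimal sketch of doing that programmatically, assuming root; the /proc/sys/vm/nr_hugepages knob (equivalently the vm.nr_hugepages sysctl) is documented in the same file, and the count of 16 is just an illustrative value:

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Ask the kernel to reserve 16 persistent huge pages of the default
     * size; per the excerpt above, they are spread over the NUMA nodes
     * that still have enough contiguous memory. Requires root. */
    FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");
    if (f == NULL) {
        perror("fopen(/proc/sys/vm/nr_hugepages)");
        return EXIT_FAILURE;
    }

    int rc = fprintf(f, "16\n");
    if (fclose(f) != 0 || rc < 0) {
        perror("write nr_hugepages");
        return EXIT_FAILURE;
    }

    /* Check the result with: grep -i huge /proc/meminfo */
    return EXIT_SUCCESS;
}
```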
See also: How do I allocate a DMA buffer backed by 1GB HugePages in a linux kernel module?