The "first touch" (a special term used to indicate virtual memory mapping in case of NUMA systems) write-operation causes the mapping of memory pages to the NUMA node associated with the thread which first writes to them. Having read this page, which is fairly difficult to interpret for novices, according to my understanding, this is the case when the default memory mapping policy is being used. Depending on the different policies which may be used, we could expect this to not be true anymore. Please feel free to correct my understanding of the matter.
My question is now this: if my scheduling policy (think #pragma omp for schedule(static, chunk_size)) requires two threads from two distinct NUMA nodes to work on data from the same memory page, will the first-touch write operation cause the memory page to be mapped on both nodes under the default mapping policy on Linux?
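For concreteness, here is a minimal sketch of the pattern I have in mind (the array size and chunk size are mine and arbitrary, chosen so that a chunk boundary can fall inside a single page):

```c
/* Hypothetical sketch of my situation. Two threads, ideally on different
 * NUMA nodes, initialize interleaved chunks of the same array; a chunk of
 * 1000 doubles (8000 bytes) does not line up with the 4 KiB page size, so
 * consecutive chunks can share a page. Compile with: gcc -fopenmp first_touch.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N          (1 << 20)  /* number of doubles, arbitrary */
#define CHUNK_SIZE 1000       /* deliberately not a multiple of the page size */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    if (!a) return 1;

    /* First touch: each thread writes the chunks assigned to it by the
     * static schedule, so the pages it touches first should be mapped
     * on its own node -- except for pages shared between two chunks. */
    #pragma omp parallel for schedule(static, CHUNK_SIZE)
    for (int i = 0; i < N; ++i)
        a[i] = 0.0;

    printf("initialized %d doubles with %d threads\n", N, omp_get_max_threads());
    free(a);
    return 0;
}
```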
The documentation page you link says the default policy is "local allocation", which means that the first task to touch a page causes it to be allocated on the NUMA node local to the CPU where that task is running.
When the system is “up and running”, the system default policy will use “local allocation” described below.
[...]
"Local" allocation policy can be viewed as a Preferred policy that starts at the node containing the cpu where the allocation takes place.
Allocation happens once per page. For example, assume a mapped page is shared by two threads, T0 (on NUMA node 0) and T1 (on NUMA node 1). With the default policy, on first touch by T0 the page is allocated on node 0. When T1 touches the page later, it is neither moved to node 1 nor re-allocated there.
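If you want to see this for yourself, here is a minimal sketch (my own code, not from the documentation) that queries which node backs a page using get_mempolicy() with MPOL_F_NODE | MPOL_F_ADDR. It assumes a machine with at least two NUMA nodes and threads bound to different nodes (e.g. OMP_PROC_BIND=spread OMP_PLACES=sockets), and it links against libnuma:

```c
/* Sketch: T0 touches an untouched page first, then T1 touches it; both
 * report which node backs the page. Compile: gcc -fopenmp demo.c -lnuma */
#include <stdio.h>
#include <sys/mman.h>
#include <numaif.h>   /* get_mempolicy, MPOL_F_NODE, MPOL_F_ADDR */
#include <omp.h>

/* NUMA node that currently backs the page containing addr. */
static int node_of(void *addr)
{
    int node = -1;
    if (get_mempolicy(&node, NULL, 0, addr, MPOL_F_NODE | MPOL_F_ADDR) != 0)
        perror("get_mempolicy");
    return node;
}

int main(void)
{
    /* Fresh anonymous mapping: no physical page is allocated until first touch. */
    char *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    #pragma omp parallel num_threads(2)
    {
        int tid = omp_get_thread_num();

        if (tid == 0) {
            page[0] = 1;   /* first touch: page allocated on T0's node */
            printf("after T0's touch, page is on node %d\n", node_of(page));
        }
        #pragma omp barrier
        if (tid == 1) {
            page[1] = 2;   /* later touch: page is NOT migrated or re-allocated */
            printf("after T1's touch, page is on node %d\n", node_of(page));
        }
    }

    munmap(page, 4096);
    return 0;
}
```

If the two threads really land on different nodes, both prints should report the same node (the one where T0 ran), which is exactly the behaviour described above; numastat -p <pid> gives a coarser, per-process view of the same information.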
For reference, this article by Christoph Lameter also offers a thorough explanation of NUMA shenanigans under Linux.