I'm doing parallel programming on a NUMA machine (I don't have the machine yet; it's scheduled to arrive soon™).
I have a pool of worker threads on each NUMA node (with processor affinity set) and a balancer that spreads work evenly among the pools/nodes. The point is to make sure all memory allocations land in node-local memory. This is all fine and dandy.
During startup, the pool worker threads are created from the main thread, and they have to do some initial setup before they can set their own affinities (a third-party library requirement; nothing I can do about it).
I'm worried there will be a hidden performance penalty if the worker threads' stacks get allocated on the wrong nodes, causing remote (cross-node) memory accesses.
Is this a real issue? I suspect it has already been solved...
Anyway, what I'm looking for is a way to make sure the stack of each thread is allocated on the correct NUMA node.
My dedicated Google boy came up with this: "Allocating a Thread's Stack on a specific NUMA memory", which is kind of what I want to do, but it uses pthreads and I need a Windows solution.
There is a remarkable lack of information on this on MSDN, but given what I've heard Mark Russinovich describe when talking about Windows memory internals, I wouldn't worry about it unless you start to see a noticeable slowdown.
In the grand scheme of things, even cross-node memory accesses are still far faster than, say, swapping to disk. More importantly, because on Windows the mapping of physical memory to active pages has no fixed relationship to the (purely virtual) address space, the kernel will likely remap each thread's stack pages based on its affinity.
I don't really think this is going to affect you; if it did, there would already be a publicly documented solution, because the SQL Server team would have run into it a long time ago.