cuda gpu gpgpu

Duplicate faults on Unified Virtual Memory


The paper "Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis" discusses the trade-off of when to replay a fault in a UVM environment. Specifically, it says:

Furthermore, issuing replays with outstanding faults causes unsatisfied requests to fault again, generating duplicate faults in the fault buffer and more processing for the UVM driver subject to policy. [Section 3.E. Cost of Demand Paging]

What does "unsatisfied request" mean here? And why does it generate the same fault again (a duplicate fault)?


Solution

  • A page fault ("fault") occurs when code execution triggers a read or write to a location in memory whose corresponding "page" (the block of memory that contains that location) is not physically resident in the primary backing store (e.g. system DRAM on a CPU, or GPU DRAM on a GPU for device code).

    In CUDA Unified Memory, page faults trigger the migration (i.e. movement, copying) of pages from the physical resource that currently holds them to the physical resource "local" to the processor where the fault occurred (i.e. where the code in question was executing when the fault occurred). Other steps are involved here as well, such as modification of page tables.

    If the page fault corresponds to (i.e. includes the same page as) a previous fault, then it is a "duplicate". This can happen in ordinary circumstances when two separate paths of execution (say, in a GPU, two different threadblocks on two different SMs) attempt to touch the same page, with some temporal locality.

    When a page fault occurs, the code that triggered it must be halted by a processor mechanism. When the operating system chooses to do so, it may "restart" or "replay" that code, to allow execution to begin again, where it "left off" or was interrupted by the fault.

    If a replay is initiated (for whatever reason, which I believe is a focus of the paper), but not all of the data needed to complete that code or instruction is present yet in the primary backing store of the processor in question, another page fault will occur. This is due to a request (for a page to be migrated) that has not yet been satisfied, i.e. an "unsatisfied request".

    Therefore, if a page fault occurs and the request is still unsatisfied when the code that generated the page fault is restarted or "replayed", then it's logical that the code will fault again, and this fault will be a duplicate of the previous fault for that line of code.

    Note, I'm not intending to (fully) answer the question "why would you do that" or "under what circumstances would you restart code when its page faults/requests are unsatisfied", because you didn't ask that question and furthermore I believe that is a central topic of the paper - not a trivial term to define. Having said that, this excerpt from the paper begins to unbundle this question:

    Deciding when to notify fault replay has some considerations due to trade-offs between latency and replay overhead. It is not necessary for all outstanding faults to be serviced prior to issuing a replay notification, but a notification allows faulting SMs to resume sooner at the cost of additional instances of replay overhead. Furthermore, issuing replays with outstanding faults causes unsatisfied requests to fault again, generating duplicate faults in the fault buffer and more processing for the UVM driver subject to policy. On the other hand, waiting too long to issue replays causes warps to be stalled for a long period of time, which has a negative cascading effect latency hiding

    The general idea is that page faults will accumulate, along with the corresponding stalled warps. At some point, without exact definition/tracking of when all faults for a particular code/warp/instruction are "satisfied" such that a replay would not trigger a fault again, the machine must make a decision about whether to attempt the replay. If the replay is not successful, the code will fault again, generating another migration request for the same page (potentially pages, considered warp-wide), i.e. a "duplicate".
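
To make the trade-off concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions, not how the NVIDIA UVM driver actually works: the `simulate` function, its parameters, the fixed `batch_size`, and the two replay policies are all hypothetical names and simplifications introduced for illustration. Each warp touches one page, the "driver" migrates a fixed number of pages per step, and a replay is either issued after every step or held back until no faults are outstanding.

```python
def simulate(warp_pages, batch_size, wait_for_all):
    """Toy model of fault replay (hypothetical, not the real UVM driver).

    warp_pages[i] is the single page that warp i touches.
    batch_size    is how many distinct pages the driver migrates per step.
    wait_for_all  True: replay only once no faults are outstanding.
                  False: replay after every batch, even with faults outstanding.
    """
    resident = set()   # pages already migrated to GPU memory
    seen = set()       # pages that have faulted at least once
    todo = []          # distinct pages the driver still has to migrate
    stats = {"faults": 0, "duplicates": 0, "replays": 0, "stall_time": 0}

    def run(warps):
        """Let the given warps execute; return the ones that stall on a fault."""
        stalled = []
        for w in warps:
            page = warp_pages[w]
            if page in resident:
                continue                   # access succeeds, warp is done
            stats["faults"] += 1           # one more entry in the fault buffer
            if page in seen:
                stats["duplicates"] += 1   # same page has faulted before
            else:
                seen.add(page)
                todo.append(page)
            stalled.append(w)
        return stalled

    stalled = run(range(len(warp_pages)))
    while stalled:
        stats["stall_time"] += len(stalled)   # warps waiting during this step
        batch = todo[:batch_size]             # driver services one batch
        del todo[:batch_size]
        resident.update(batch)
        if wait_for_all and todo:
            continue                          # hold the replay back
        stats["replays"] += 1
        stalled = run(stalled)                # unsatisfied requests fault again
    return stats


if __name__ == "__main__":
    pages = list(range(8))   # 8 warps, each touching its own page
    print(simulate(pages, batch_size=2, wait_for_all=False))
    # {'faults': 20, 'duplicates': 12, 'replays': 4, 'stall_time': 20}
    print(simulate(pages, batch_size=2, wait_for_all=True))
    # {'faults': 8, 'duplicates': 0, 'replays': 1, 'stall_time': 32}
    print(simulate([0, 0], batch_size=2, wait_for_all=True))
    # {'faults': 2, 'duplicates': 1, 'replays': 1, 'stall_time': 2}
```

In this model, replaying early produces 12 duplicate fault entries and 4 replays but less total stall time, while waiting for all faults produces no replay-induced duplicates at the cost of longer stalls. The last call shows the "ordinary" kind of duplicate: two warps touching the same page both fault on it before it is migrated, regardless of replay policy.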