I get it that if a page has been accessed it'll have the Access bit set, and if has been written to, the Dirty bit will also be set. But it's unclear to me how these bits affect the TLB/TLB caching? Also OSDEV has the following paragraph
When changing the accessed or dirty bits from 1 to 0 while an entry is marked as present, it's recommended to invalidate the associated page. Otherwise, the processor may not set those bits upon subsequent read/writes due to TLB caching.
In what cases would you change the dirty and access bits from 1 to 0?
In what cases would you change the dirty and access bits from 1 to 0?
If the OS is running low on memory it'll want to free some pages (e.g. send the page's data to swap space so it can use the physical memory for something else). Often this will be based on "least recently used", but the CPU only has 1 "accessed" flag for each page, so to work around that the OS checks the accessed flags periodically and clears them back to zero so it knows (roughly) how long ago a page was accessed and not just if it was/wasn't accessed. It's a little like "if(page->accessFlag == 1) { page->accessFlag = 0; page->time_since_access = 0; } else { page->time_since_access++; }
".
For the dirty flag, consider a read/write memory mapped file, or a write-back disk cache. A program modifies data in memory (causing the dirty flag to be set); then when the disk drive has nothing better to do the OS might find pages with the dirty flag set, write them to disk, and clear the dirty flag/s back to zero (so the same page doesn't get written to disk again for no reason next time).
How does the Dirty and Access bits affect the TLB?
It's more the opposite - the TLB effects the flags (and the flags don't effect the TLB).
When a CPU sets the dirty or accessed flags it does an atomic update of memory to guard against race conditions (e.g. other CPUs modifying or checking the same page table entry at the same time), and atomic updates are somewhat expensive. To reduce/avoid these atomic writes a CPU can (and most likely will) cache the page's accessed and dirty flags in the TLB entry so that the atomic write can be skipped if the CPU wants to set the flag/s but the TLB entry says they're already set anyway. If the TLB entry is wrong (e.g. because the OS changed the accessed or dirty flags in memory but didn't invalidate the TLB entry) then the CPU can skip atomic writes that were needed by the OS. This can cause data corruption (e.g. OS assuming that a page's contents in memory don't need to be written to disk because the dirty flag wasn't set).