I have a question while reading the paper, Supporting x86-64 address translation for 100s of GPU lanes.
The x86-64 ISA has extensions for 2 MB and 1 GB pages in addition to the default 4 KB page size.
Normally the page size is 4 KB, so the page offset is 12 bits. If we use a VIPT cache, how can the tag be identified when the system uses variable page sizes (i.e. a variable number of page offset bits)?
The Virtually Indexed Physically Tagged (VIPT) design is common for almost all L1 CPU caches; however, the L2 and L3 (LLC or Last-Level Cache) caches are typically both physically indexed and tagged.
The L1 cache is very small relative to the L2 and L3 caches, and it's sized so that L1 Size <= PageSize * L1 Associativity. This allows the L1 set index to be taken from the offset bits of the standard (i.e. 4KiB) base page, before the page translation completes.
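The sizing rule above can be checked with a few lines of Python. This is only a sketch: the function name `index_fits_in_page_offset` is made up for illustration, and the 4KiB default is the x86-64 base page size.

```python
def index_fits_in_page_offset(cache_size: int, ways: int, page_size: int = 4096) -> bool:
    """Return True when cache_size <= page_size * ways.

    When this holds, the set index bits lie entirely inside the page
    offset, so they are identical in the virtual and physical address.
    """
    return cache_size <= page_size * ways


# 32 KiB, 8-way L1: 32768 <= 4096 * 8, so the rule holds.
print(index_fits_in_page_offset(32 * 1024, 8))
# A hypothetical 64 KiB, 8-way L1 would need one index bit above the page offset.
print(index_fits_in_page_offset(64 * 1024, 8))
```

The second call is a hypothetical configuration, included only to show a case where the rule fails.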
For example, on my system the L1 is 32KiB per core, has 8 ways (associativity), 64 sets, and each cache line is the standard size of 64 bytes. If we do the math, we need log2(64) = 6 bits for the cache line offset, and another log2(64) = 6 bits to index the L1 cache's set. Since the lowest 6 bits of the address select the byte within the cache line, the next 6 bits are available for indexing the L1 cache's set.
This is exactly enough to index my system's L1 cache sets: with the 4KiB base page size, log2(4096) = 12 bits determine the page offset, as you mentioned, and these bits are not affected by the virtual page to physical frame translation.
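The arithmetic above can be reproduced as a short sketch. The helper `cache_bit_layout` and its dictionary keys are hypothetical names chosen for this example, and it assumes all sizes are powers of two.

```python
def cache_bit_layout(cache_size: int, ways: int, line_size: int, page_size: int = 4096) -> dict:
    """Compute the bit widths used to address a set-associative cache.

    Assumes cache_size, ways * line_size, and page_size are powers of two.
    """
    sets = cache_size // (ways * line_size)
    line_offset_bits = line_size.bit_length() - 1   # log2(line_size)
    set_index_bits = sets.bit_length() - 1          # log2(sets)
    page_offset_bits = page_size.bit_length() - 1   # log2(page_size)
    return {
        "sets": sets,
        "line_offset_bits": line_offset_bits,
        "set_index_bits": set_index_bits,
        "page_offset_bits": page_offset_bits,
        # True when the index can be read before translation finishes.
        "index_within_page_offset": line_offset_bits + set_index_bits <= page_offset_bits,
    }


layout = cache_bit_layout(cache_size=32 * 1024, ways=8, line_size=64)
print(layout)
```

For the 32KiB, 8-way, 64-byte-line L1 described above, this gives 64 sets, 6 line offset bits, and 6 set index bits, which together fill the 12-bit page offset exactly.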
See Figure 1 of the paper Theory and Practice of Finding Eviction Sets for a nice visualization on different page sizes' bit layout.
That being clarified,
Then, if we use VIPT cache, how the tag can be identified if computer use variable page size (i.e. variable page offset bit)?
as far as the CPU cache is concerned, the page size makes no difference to how the tag is obtained. To begin with, the MMU does not inherently know whether a virtual address belongs to a base page or a huge page. Only after a TLB lookup, or a page walk (upon a miss in all TLB levels), does the page size become clear. Once the physical address is resolved, the cache's tag comparison determines a hit or miss in the given cache level. With a 2MiB huge page, there are log2(2MiB) = 21 bits of page offset, which again do not cover all the higher-order bits required for a tag comparison. Therefore, the page translation needs to complete the same way regardless of page size (although it could potentially be faster for a huge page) to obtain the tag.
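To make this concrete, here is a toy model in Python. It is a sketch, not how any real MMU is implemented: the addresses, the frame bases, and the helpers `set_index`, `tag`, and `translate` are all invented for illustration, and the cache geometry is the 64-set, 64-byte-line L1 from the example above.

```python
LINE_BITS = 6    # 64-byte cache line
INDEX_BITS = 6   # 64 sets


def set_index(addr: int) -> int:
    """Set index: the 6 bits just above the line offset (bits 6..11)."""
    return (addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)


def tag(paddr: int) -> int:
    """Tag: every physical address bit above the index and line offset."""
    return paddr >> (LINE_BITS + INDEX_BITS)


def translate(vaddr: int, frame_base: int, page_size: int) -> int:
    """Toy translation: keep the page offset, replace the upper bits.

    frame_base is assumed to be aligned to page_size.
    """
    return frame_base | (vaddr & (page_size - 1))


vaddr = 0x7F12_3456_7ABC

# The same virtual address mapped through a 4 KiB page and a 2 MiB page
# (both frame bases are made-up values).
paddr_4k = translate(vaddr, frame_base=0x1_2345_6000, page_size=4 * 1024)
paddr_2m = translate(vaddr, frame_base=0x4020_0000, page_size=2 * 1024 * 1024)

# The set index is the same before and after translation, for both page sizes.
print(set_index(vaddr), set_index(paddr_4k), set_index(paddr_2m))

# The tag only exists once the physical address is known.
print(hex(tag(paddr_4k)), hex(tag(paddr_2m)))
```

In both mappings the set index can be read straight from the virtual address, while the tag depends on the frame the page maps to, so it is only available after translation.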