During my browsing I came across something called huge pages: a mechanism that makes it possible to map 2M and even 1G pages using entries in the second- and third-level page tables. As the kernel docs themselves say:
Usage of huge pages significantly reduces pressure on TLB, improves TLB hit-rate and thus improves overall system performance.
I browsed the kernel source as well and I didn't see any usage of MAP_HUGETLB when it comes to mmap. In fact, /proc/sys/vm/nr_hugepages is set to 0 by default. Why is that? Does it mean the kernel has no need for huge pages at all? What are some examples of scenarios where huge pages are a must?
For the sake of example (the length must be a multiple of the huge page size, and the fd for an anonymous mapping is conventionally -1):

hugepage = mmap(NULL, 2 * 1024 * 1024, PROT_WRITE | PROT_READ,
                MAP_ANON | MAP_HUGETLB | MAP_PRIVATE, -1, 0);
The Linux kernel's approach to huge pages is mainly to let system administrators manage them from userspace. This is mostly because, as cool as they might sound, huge pages also have drawbacks: for example, they cannot be swapped to disk. This LWN series on huge pages gives a lot of information on the topic.
By default there are no huge pages reserved. One can reserve them at boot time through the boot parameters hugepagesz= and hugepages= (specified multiple times for multiple huge page sizes). Huge pages can also be reserved at runtime through /proc/sys/vm/nr_hugepages and /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages. Furthermore, they can be "dynamically" reserved by the kernel if .../nr_overcommit_hugepages is set higher than .../nr_hugepages. These numbers are reflected in /proc/meminfo under the various HugePages_XXX stats, which are for the default huge page size (Hugepagesize).
File-backed mappings only support huge pages if the file resides in a hugetlbfs filesystem, and only of the specific size specified at mount time (mount option pagesize=). The hugeadm command-line tool, among other things, can give info about currently mounted hugetlbfs filesystems with --list-all-mounts. One major reason for wanting a hugetlbfs mounted on your system is to enable huge page support in QEMU/libvirt guests.
All of the above covers "voluntary" huge page allocations done with MAP_HUGETLB.
Linux also supports transparent huge pages (THP). Normal pages can be transparently made huge (or, vice versa, existing transparent huge pages can be broken into normal pages) when needed by the kernel. This happens without the need for MAP_HUGETLB, and regardless of nr_hugepages in sysfs.
There are some sysfs knobs to control THPs too, the most notable being /sys/kernel/mm/transparent_hugepage/enabled: always means that the kernel will try to create THPs even without userspace programs actively suggesting it; madvise means that it will do so only if a userspace program suggests it through madvise(addr, len, MADV_HUGEPAGE); never means they are disabled. You'll probably see this set to always by default in modern Linux distros, e.g. recent releases of Debian or Ubuntu.
As an example, doing mmap(0x123 << 21, 2*1024*1024, 7, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) with /sys/kernel/mm/transparent_hugepage/enabled set to always should result in a 2M transparent huge page, since the requested mapping is aligned to 2M (notice the absence of MAP_HUGETLB).
Does it mean the kernel has no need for huge pages at all? What are some examples of scenarios where huge pages are a must?
In general, you don't really need huge pages of any kind; you can very well live without them. They are just an optimization. Scenarios where they can be useful are, as mentioned by @Mgetz in the comments above, cases where you have a lot of random memory accesses on very large files (common for databases). Minimizing TLB pressure in such cases can result in significant performance improvements.