
vmalloc() allocates from vm_struct list


The kernel document https://www.kernel.org/doc/gorman/html/understand/understand010.html says this about vmalloc-ing:

It searches through a linear linked list of vm_structs and returns a new struct describing the allocated region.

Does that mean the vm_struct list is already created at boot, much like a cache set up with kmem_cache_create(), and vmalloc() just adjusts the page entries? If so, and I have 16GB of RAM in an x86_64 machine, is the whole of ZONE_NORMAL, i.e.

16GB - ZONE_DMA - ZONE_DMA32 - slab-memory(cache/kmalloc)

used to create the vm_struct list?


Solution

  • That document is fairly old. It mainly describes Linux 2.4, with notes on what changed in 2.6. Things have changed quite a bit with those functions since then, from what I can tell. I'll start with code from kernel 2.6.12, since it matches Gorman's explanation and is the oldest non-rc tag in the Linux kernel GitHub repo.

    The vm_struct list that the document refers to is called vmlist. In 2.6.12 it is defined in mm/vmalloc.c as a global pointer:

    struct vm_struct *vmlist;
    

    Working out whether it is populated with any structs during boot took some deduction. The easiest way was to look at the function get_vmalloc_info() (edited for brevity):

    if (!vmlist) {
        vmi->largest_chunk = VMALLOC_TOTAL;
    }
    else {
        vmi->largest_chunk = 0;
        prev_end = VMALLOC_START;
    
        for (vma = vmlist; vma; vma = vma->next) {
            unsigned long addr = (unsigned long) vma->addr;
    
            if (addr >= VMALLOC_END)
                break;
    
            vmi->used += vma->size;
    
            free_area_size = addr - prev_end;
            if (vmi->largest_chunk < free_area_size)
                vmi->largest_chunk = free_area_size;
    
            prev_end = vma->size + addr;
        }
    
        if (VMALLOC_END - prev_end > vmi->largest_chunk)
            vmi->largest_chunk = VMALLOC_END - prev_end;
    }
    

    The logic says that if vmlist is NULL (so !vmlist is true), there are no vm_structs on the list, and the largest free chunk in the vmalloc area is the entire space, VMALLOC_TOTAL. If the list is non-empty, the function walks it in address order and takes the largest gap between the end of the previous vm_struct and the start of the current one (free_area_size = addr - prev_end), finally also checking the gap between the last area and VMALLOC_END.

    What this tells us is that the list may well be empty, and that it only records ranges that have already been handed out. When we vmalloc, the kernel walks vmlist looking for a gap between existing vm_structs that is big enough to accommodate the request. Only once such a gap is found is the new vm_struct linked into vmlist.

    vmalloc will eventually call __get_vm_area(), which is where the action happens:

        for (p = &vmlist; (tmp = *p) != NULL ;p = &tmp->next) {
            if ((unsigned long)tmp->addr < addr) {
                if((unsigned long)tmp->addr + tmp->size >= addr)
                    addr = ALIGN(tmp->size + 
                             (unsigned long)tmp->addr, align);
                continue;
            }
            if ((size + addr) < addr)
                goto out;
            if (size + addr <= (unsigned long)tmp->addr)
                goto found;
            addr = ALIGN(tmp->size + (unsigned long)tmp->addr, align);
            if (addr > end - size)
                goto out;
        }
    
    found:
        area->next = *p;
        *p = area;
    

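    By this point in the function we have already allocated a new vm_struct named area. The for loop only needs to find where to put it in the list, which is kept sorted by address. Areas that start below the candidate address push the candidate past their end; the first area that starts far enough above the candidate to leave room for the request marks the insertion point (goto found). If the vmlist is empty, the loop body never runs and, provided the request fits between the start and end of the vmalloc range, the "found" lines make *p (that is, vmlist itself) point to our struct. Otherwise, the new struct is inserted just before the first struct that lies above it, or at the tail.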

    In summary, the vmlist pointer exists from boot (it is a global that starts out NULL), but the list itself isn't necessarily populated at boot. That changes only if something calls vmalloc during boot or, as in later kernel versions, a boot function explicitly adds vm_structs to the list (see below for kernel 6.0.9).

    One further clarification. You asked whether ZONE_NORMAL is used for the vmlist, but the two describe different address spaces: ZONE_NORMAL is a range of physical memory, whereas vmlist tracks ranges of kernel virtual addresses. vmalloc() does need physical pages to back an allocation, but it takes them from the page allocator one page at a time and maps them into the virtual range; the vm_struct only records the virtual range. There are lots of resources explaining the difference between physical and virtual memory. The virtual address range covered by vmlist goes from VMALLOC_START to VMALLOC_END. On x86_64 in 2.6.12, those were defined as:

    #define VMALLOC_START    0xffffc20000000000UL
    #define VMALLOC_END      0xffffe1ffffffffffUL
    

    For kernel version 6.0.9:

    The vm_struct list is now declared in mm/vmalloc.c as:

    static struct vm_struct *vmlist __initdata;
    

    The __initdata annotation places the list in init memory, which is freed once boot finishes. The list starts out empty, but in this kernel version there are a few boot functions that may add structs to it:

       void __init vm_area_add_early(struct vm_struct *vm)
       void __init vm_area_register_early(struct vm_struct *vm, size_t align)
    

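    As for vmalloc in this version, vmlist is now only used during initialization: vmalloc_init() imports its entries into the newer bookkeeping structures. get_vm_area() now calls __get_vm_area_node(), which is NUMA-aware. From there, the logic goes deeper and is much more complicated than the linear search described above; free ranges are tracked as vmap_area objects in an augmented red-black tree rather than in a linked list.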