Tags: c, linux, linux-kernel, x86, memory-mapping

How to map 1GB (or more) of physical memory under a 32-bit Linux kernel


I have a setup with 2GB of memory and I would like to map 1GB (or more) of physical memory into user space virtual addresses. In theory this is possible, since with a 32-bit setup 3GB of virtual address space is available to user-land apps.

I updated the kernel command line with the following parameters: mem=1G memmap=1G$1G to force the kernel to see 1GB of RAM and to reserve the last 1GB.

I have my custom driver that will handle the user space mmap() call and map the physical address 0x40000000 (1G) to a user space address with the function remap_pfn_range().
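For reference, a minimal sketch of what such an mmap() handler looks like (the mydrv names and the hard-coded base address are illustrative, not the actual driver code):

    #include <linux/module.h>
    #include <linux/fs.h>
    #include <linux/mm.h>

    #define MYDRV_PHYS_BASE 0x40000000UL    /* reserved region starting at 1GB */

    /* Map the whole reserved physical region into the calling process. */
    static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;

            return remap_pfn_range(vma, vma->vm_start,
                                   MYDRV_PHYS_BASE >> PAGE_SHIFT,
                                   size, vma->vm_page_prot);
    }

    static const struct file_operations mydrv_fops = {
            .owner = THIS_MODULE,
            .mmap  = mydrv_mmap,
    };

From user space, the corresponding call is something like mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0) on the device file descriptor.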

But the function triggers a kernel BUG() in remap_pte_range(). The same call used to work with a 300MB remap instead of 1GB.

I usually call ioremap() in my driver to map physical addresses into kernel virtual addresses. In this case I can't, because of the 1G/3G virtual address split (1GB for the kernel, 3GB for apps). So I was wondering: is it possible to map physical addresses into user space virtual addresses without also mapping these physical addresses in the kernel?

This is a 32-bit x86 kernel, i.e. "i386" architecture.


Solution

  • Why does your remap_pfn_range call trigger a kernel BUG()?

    The BUG() comes from the call to the BUG_ON macro in remap_pfn_range, as per here:

    2277 BUG_ON(addr >= end);

    remap_pfn_range calls remap_pud_range which calls remap_pmd_range which calls remap_pte_range.

    Subsequent calls to BUG_ON or VM_BUG_ON come from remap_pmd_range here

    2191 VM_BUG_ON(pmd_trans_huge(*pmd));

    and from remap_pte_range here

    2171 BUG_ON(!pte_none(*pte));

    The BUG_ON macro is defined here

    as

    #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while(0)

    where the BUG macro is defined just above it to print a message and panic.

    The unlikely macro is defined here

    as # define unlikely(x) (__builtin_expect(!!(x), 0)).

    So when the target user start address addr is greater than or equal to end, which is defined as end = addr + PAGE_ALIGN(size);, the condition in BUG_ON evaluates to 1 and BUG is called.
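    To illustrate that first check with plain arithmetic (user-space code, not kernel code; uint32_t mimics 32-bit kernel arithmetic, and the addresses are made up), end can fail to exceed addr either for a zero-length mapping or when addr + PAGE_ALIGN(size) wraps past 4GB:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE     4096u
    #define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

    int main(void)
    {
            /* Case 1: a zero-length mapping gives end == addr. */
            uint32_t addr = 0x40000000u;
            uint32_t end  = addr + PAGE_ALIGN(0u);
            printf("size 0: addr >= end -> %d\n", addr >= end);

            /* Case 2: if addr + PAGE_ALIGN(size) wraps past 4GB, end < addr. */
            addr = 0xC0100000u;
            end  = addr + PAGE_ALIGN(0x40000000u);      /* 1GB */
            printf("wrap:   addr >= end -> %d\n", addr >= end);
            return 0;
    }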

    Or when pmd_trans_huge, as defined here,

    153 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
    154 static inline int pmd_trans_splitting(pmd_t pmd)
    155 {
    156         return pmd_val(pmd) & _PAGE_SPLITTING;
    157 }
    158 
    159 static inline int pmd_trans_huge(pmd_t pmd)
    160 {
    161         return pmd_val(pmd) & _PAGE_PSE;
    162 }
    163 
    164 static inline int has_transparent_hugepage(void)
    165 {
    166         return cpu_has_pse;
    167 }
    

    returns a non-zero value, i.e. when the pmd (Page Middle Directory) entry has the _PAGE_PSE bit set because it already maps a huge page. (If CONFIG_TRANSPARENT_HUGEPAGE isn't configured in the kernel, pmd_trans_huge is defined to return 0 and this VM_BUG_ON can never fire.)

    Or when pte_none returns 0: pte_none returns 1 if the corresponding entry does not exist and 0 if it exists.

    Therefore !pte_none evaluates to 0 when the corresponding page table entry does not exist and to 1 otherwise, and that is the condition passed into BUG_ON.

    If the page table entry already exists, then the BUG macro is called.

    What happens if you specify an amount of memory lower than 1GB but still greater than 300MB, say 500MB or 800MB?

    So either your starting address is greater than or equal to your ending address, or the pmd entry you are remapping over already maps a transparent huge page, or the page table entries you are remapping over already exist.

    Clarifying from the comments, your call to remap_pfn_range walks page table entry pointers (*pte) that already point to an existing page table entry (pte).

    This means that set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); would fail, as the pte pointer already points to a page table entry and hence can't be set to pte_mkspecial(pfn_pte(pfn, prot)).
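    As a sketch of that failure mode (illustrative only; the function name and base address are hypothetical, not the actual driver): remapping the same user range a second time walks PTEs that the first call already populated, which is exactly what BUG_ON(!pte_none(*pte)) catches.

    /* Illustrative only: the second remap_pfn_range() walks PTEs the first
     * call already populated, so BUG_ON(!pte_none(*pte)) fires. */
    static int mydrv_mmap_broken(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long size = vma->vm_end - vma->vm_start;
            unsigned long pfn  = 0x40000000UL >> PAGE_SHIFT;  /* hypothetical base */
            int ret;

            ret = remap_pfn_range(vma, vma->vm_start, pfn, size,
                                  vma->vm_page_prot);
            if (ret)
                    return ret;

            /* Same range again: these PTEs are no longer pte_none(), so BUG(). */
            return remap_pfn_range(vma, vma->vm_start, pfn, size,
                                   vma->vm_page_prot);
    }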

    Bypassing the 1G/3G virtual address split

    See the following article High Memory In The Linux Kernel

    See the following mailing list post, which discusses some additional information about HIGHMEM with a minimum of 1GB of RAM.

    Information on mapping kernel and non-kernel virtual address space to user land

    One way to map kernel virtual addresses and non-kernel virtual addresses (returned by vmalloc()) to userspace is using remap_pfn_range. See Linux Memory Mapping for additional information.
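    A rough sketch of that approach for a vmalloc()ed buffer, remapping it one page at a time with vmalloc_to_pfn() since vmalloc memory is not physically contiguous (the vbuf buffer and function name are hypothetical; error handling is trimmed):

    #include <linux/fs.h>
    #include <linux/mm.h>
    #include <linux/vmalloc.h>

    static void *vbuf;    /* assumed to be allocated elsewhere with vmalloc() */

    static int mydrv_mmap_vmalloc(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long uaddr = vma->vm_start;
            unsigned long size  = vma->vm_end - vma->vm_start;
            char *kaddr = vbuf;
            int ret;

            /* vmalloc pages are physically scattered, so remap page by page. */
            while (size > 0) {
                    ret = remap_pfn_range(vma, uaddr, vmalloc_to_pfn(kaddr),
                                          PAGE_SIZE, vma->vm_page_prot);
                    if (ret)
                            return ret;
                    uaddr += PAGE_SIZE;
                    kaddr += PAGE_SIZE;
                    size  -= PAGE_SIZE;
            }
            return 0;
    }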

    Another way, which replaced the usage of the nopage handler on older kernels, is the vm_insert_page function.
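    A similar sketch of the vm_insert_page() route, inserting individually allocated pages into the user VMA (again illustrative; in a real driver the pages would normally come from a buffer set up at init time rather than allocated inside the mmap handler):

    /* Illustrative only: back each user page with a freshly allocated page. */
    static int mydrv_mmap_pages(struct file *filp, struct vm_area_struct *vma)
    {
            unsigned long uaddr  = vma->vm_start;
            unsigned long npages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
            unsigned long i;
            int ret;

            for (i = 0; i < npages; i++) {
                    struct page *page = alloc_page(GFP_KERNEL);

                    if (!page)
                            return -ENOMEM;
                    ret = vm_insert_page(vma, uaddr, page);
                    if (ret)
                            return ret;
                    uaddr += PAGE_SIZE;
            }
            return 0;
    }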

    Additional Resources include: