
How to allocate large contiguous memory regions in Linux


Yes, I will ultimately be using this for DMA, but let's leave coherency aside for the moment. I have 64-bit BARs; therefore, AFAIK, all of RAM (including memory above 4G) is available for DMA.

I am looking for about 64MB of contiguous RAM. Yes, that's a lot.

Ubuntu 16 and 18 have CONFIG_CMA=y but CONFIG_DMA_CMA is not set at kernel compile time.

I note that if both were set (at kernel build time) I could simply call dma_alloc_coherent; however, for logistical reasons, recompiling the kernel is undesirable.

The machines will always have at least 32GB of RAM, do not run anything RAM intensive, and the kernel module will load shortly after boot before RAM becomes significantly fragmented and, AFAIK, nothing else is using the CMA.

I have set the kernel parameter cma=1G (and have also tried 256M and 512M).

# dmesg | grep cma
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.170 root=UUID=2b25933c-e10c-4833-b5b2-92e9d3a33fec ro cma=1G
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.170 root=UUID=2b25933c-e10c-4833-b5b2-92e9d3a33fec ro cma=1G
[    0.000000] Memory: 65612056K/67073924K available (8604K kernel code, 1332K rwdata, 3972K rodata, 1484K init, 1316K bss, 1461868K reserved, 0K cma-reserved)

I have tried alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, order), no joy.

And finally, the actual question: how does one get large contiguous blocks from the CMA? Everything I have found online suggests using dma_alloc_coherent, but I know this only works with both CONFIG_CMA=y and CONFIG_DMA_CMA=y.

The module source, tim.c

#include <linux/module.h>       /* Needed by all modules */
#include <linux/kernel.h>       /* Needed for KERN_INFO */
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/gfp.h>
unsigned long big;
const int order = 15;
static int __init tim_init(void)
{
    printk(KERN_INFO "Hello Tim!\n");
    big = __get_free_pages(GFP_KERNEL | __GFP_HIGHMEM, order);
    printk(KERN_NOTICE "big = %lx\n", big);
    if (!big)
        return -EIO; // AT&T

    return 0; // success
}

static void __exit tim_exit(void)
{
    free_pages(big, order);
    printk(KERN_INFO "Tim says, Goodbye world\n");
}

module_init(tim_init);
module_exit(tim_exit);
MODULE_LICENSE("GPL");

Inserting the module yields...

# insmod tim.ko
insmod: ERROR: could not insert module tim.ko: Input/output error
# dmesg | tail -n 33

[  176.137053] Hello Tim!
[  176.137056] ------------[ cut here ]------------
[  176.137062] WARNING: CPU: 4 PID: 2829 at mm/page_alloc.c:3198 __alloc_pages_nodemask+0xd14/0xe00()
[  176.137063] Modules linked in: tim(OE+) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables configfs vxlan ip6_udp_tunnel udp_tunnel uio pf_ring(OE) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm mei_me mei irqbypass sb_edac ioatdma edac_core shpchp serio_raw input_leds lpc_ich dca acpi_pad 8250_fintek mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear
[  176.137094]  hid_generic usbhid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel e1000e aesni_intel raid1 aes_x86_64 isci lrw libsas ahci gf128mul ptp glue_helper ablk_helper cryptd psmouse hid libahci scsi_transport_sas pps_core wmi fjes
[  176.137105] CPU: 4 PID: 2829 Comm: insmod Tainted: G           OE   4.4.170 #1
[  176.137106] Hardware name: Supermicro X9SRL-F/X9SRL-F, BIOS 3.3 11/13/2018
[  176.137108]  0000000000000286 8ba89d23429d5749 ffff88100f5cba90 ffffffff8140a061
[  176.137110]  0000000000000000 ffffffff81cd89dd ffff88100f5cbac8 ffffffff810852d2
[  176.137112]  ffffffff821da620 0000000000000000 000000000000000f 000000000000000f
[  176.137113] Call Trace:
[  176.137118]  [<ffffffff8140a061>] dump_stack+0x63/0x82
[  176.137121]  [<ffffffff810852d2>] warn_slowpath_common+0x82/0xc0
[  176.137123]  [<ffffffff8108541a>] warn_slowpath_null+0x1a/0x20
[  176.137125]  [<ffffffff811a2504>] __alloc_pages_nodemask+0xd14/0xe00
[  176.137128]  [<ffffffff810ddaef>] ? msg_print_text+0xdf/0x1a0
[  176.137132]  [<ffffffff8117bc3e>] ? irq_work_queue+0x8e/0xa0
[  176.137133]  [<ffffffff810de04f>] ? console_unlock+0x20f/0x550
[  176.137137]  [<ffffffff811edbdc>] alloc_pages_current+0x8c/0x110
[  176.137139]  [<ffffffffc0024000>] ? 0xffffffffc0024000
[  176.137141]  [<ffffffff8119ca2e>] __get_free_pages+0xe/0x40
[  176.137143]  [<ffffffffc0024020>] tim_init+0x20/0x1000 [tim]
[  176.137146]  [<ffffffff81002125>] do_one_initcall+0xb5/0x200
[  176.137149]  [<ffffffff811f90c5>] ? kmem_cache_alloc_trace+0x185/0x1f0
[  176.137151]  [<ffffffff81196eb5>] do_init_module+0x5f/0x1cf
[  176.137154]  [<ffffffff81111b05>] load_module+0x22e5/0x2960
[  176.137156]  [<ffffffff8110e080>] ? __symbol_put+0x60/0x60
[  176.137159]  [<ffffffff81221710>] ? kernel_read+0x50/0x80
[  176.137161]  [<ffffffff811123c4>] SYSC_finit_module+0xb4/0xe0
[  176.137163]  [<ffffffff8111240e>] SyS_finit_module+0xe/0x10
[  176.137167]  [<ffffffff8186179b>] entry_SYSCALL_64_fastpath+0x22/0xcb
[  176.137169] ---[ end trace 6aa0b905b8418c7b ]---
[  176.137170] big = 0

Curiously, trying it again yields...

# insmod tim.ko
insmod: ERROR: could not insert module tim.ko: Input/output error
...and dmesg just shows:

[  302.068396] Hello Tim!
[  302.068398] big = 0

Why is there no stack dump on the second (and subsequent) tries?
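
As a point of reference, the size requested by an order-N page allocation is PAGE_SIZE << N. The sketch below is plain userspace C, not kernel code. It assumes 4 KiB pages and a MAX_ORDER of 11 (the usual x86-64 default for kernels of this vintage, but an assumption about this particular build), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and MAX_ORDER = 11 (typical x86-64 defaults). */
#define ASSUMED_PAGE_SIZE 4096ULL
#define ASSUMED_MAX_ORDER 11U

/* Bytes requested by an order-N allocation: PAGE_SIZE << order. */
static uint64_t order_to_bytes(unsigned int order)
{
    assert(order < 52); /* keep the shift within 64 bits */
    return ASSUMED_PAGE_SIZE << order;
}

/* Smallest order whose block covers `bytes` (what get_order() computes). */
static unsigned int bytes_to_order(uint64_t bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}

/* The buddy allocator only serves orders 0 .. MAX_ORDER - 1. */
static int buddy_can_serve(unsigned int order)
{
    return order < ASSUMED_MAX_ORDER;
}
```

Under those assumptions, order 15 asks for 128 MiB, 64 MiB needs order 14, and the largest block the page allocator will hand out is order 10 (4 MiB). That would be consistent with the allocation failing no matter how much RAM is free and, if the warning comes from a WARN_ON_ONCE check on the order, with the stack dump appearing only on the first attempt.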


Solution

  • The short version: __GFP_DIRECT_RECLAIM (which is included in __GFP_RECLAIM) is required, because dma_alloc_contiguous is eventually called and it checks, via gfpflags_allow_blocking, that blocking is allowed. I used the usual GFP_KERNEL, which is __GFP_RECLAIM | __GFP_IO | __GFP_FS. Before any of that, though, one must call dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)), using DMA_BIT_MASK(64) rather than DMA_BIT_MASK(32).

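    /* The card can address 64 bits, so do not restrict DMA to the low 4 GiB. */
    err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (err) {
        printk(KERN_INFO "[%s:probe] dma_set_mask returned: %d\n", DRIVER_NAME, err);
        return -EIO;
    }
    /* paddr points to a dma_addr_t that receives the bus address of the buffer. */
    vaddr = dma_alloc_coherent(&pdev->dev, dbsize, paddr, GFP_KERNEL);
    if (!vaddr) {
        printk(KERN_ALERT "[%s:probe] failed to allocate coherent buffer\n", DRIVER_NAME);
        return -EIO;
    }

    /* Tell the card where to DMA from; iowrite32 carries only the low 32 bits. */
    iowrite32(lower_32_bits(*paddr), ctx->bar0_base_addr + 0x140);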
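
One detail worth sketching: once the mask is 64 bits, the bus address returned by dma_alloc_coherent may lie above 4 GiB, so a single 32-bit register write cannot carry all of it. The userspace sketch below mirrors the arithmetic of the kernel's DMA_BIT_MASK(), lower_32_bits() and upper_32_bits() macros under different, made-up names. Whether the card takes the address as a low/high register pair is an assumption; the snippet above shows only a single write to offset 0x140.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel's DMA_BIT_MASK(n). */
static uint64_t dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return (n == 64) ? ~0ULL : ((1ULL << n) - 1);
}

/* True if every byte of [addr, addr + size) is reachable under the mask. */
static int fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
    assert(size > 0);
    return addr <= mask && size - 1 <= mask - addr;
}

/* Userspace equivalents of lower_32_bits() / upper_32_bits(). */
static uint32_t low32(uint64_t addr)
{
    return (uint32_t)(addr & 0xffffffffULL);
}

static uint32_t high32(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}
```

For example, a 64 MiB buffer at bus address 0x840000000 (33 GiB) does not fit under dma_bit_mask(32) but does under dma_bit_mask(64), and splits into 0x40000000 (low) and 0x8 (high). In a driver the two halves would typically go out as two iowrite32() calls, at register offsets that depend on the card.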
    

    Allocating Unreasonably Large DMA Regions Using the CMA with Ubuntu 16.04 & 18.04:

    1. Rebuild Kernel

      1. Use uname -r to ascertain your current kernel version
      2. Issue apt install linux-source-$(uname -r) to fetch the kernel source
      3. Copy /boot/config-$(uname -r) to /usr/src/linux-source-$(uname -r)/.config
      4. Edit .config
        1. Locate # CONFIG_DMA_CMA is not set
        2. Change it to CONFIG_DMA_CMA=y
      5. Build the kernel
        1. make -j[2 × # of cores]
        2. make -j[2 × # of cores] modules
        3. make modules_install
        4. make install
      6. You have rebuilt the kernel
    2. Configure CMA to reserve RAM

      1. Edit /etc/default/grub
        1. Locate GRUB_CMDLINE_LINUX=""
        2. Change to GRUB_CMDLINE_LINUX="cma=33G"
        3. Use your desired CMA reserved RAM in place of 33G
      2. Issue update-grub
      3. Reboot
      4. Issue dmesg | grep cma
        1. Look for Memory: 30788784K/67073924K available (14339K kernel code, 2370K rwdata, 4592K rodata, 2696K init, 5044K bss, 1682132K reserved, 34603008K cma-reserved)
        2. Note: this example reserves 33G
      5. You have configured CMA to hold back RAM from the normal allocation subsystems
    3. Alter your kernel module (driver) source

      1. Inform the kernel that the card can address 64 bits
      2. In your probe function, locate a line like dma_alloc_coherent(…
      3. A few lines before that you may find dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))
      4. Change this to dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))
      5. You have informed the kernel that the card in question is not restricted to low memory
      6. Allocate with dma_alloc_coherent(&pdev->dev, dbsize, paddr, GFP_KERNEL)
      7. dbsize may specify up to 32G
      8. Recompile your kernel module (driver) and test
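
The dmesg check in step 2 can be automated. The sketch below is a userspace helper (cma_reserved_kib is a hypothetical name) that pulls the cma-reserved figure out of a dmesg "Memory:" line. It assumes the line format shown in this post, which may differ between kernel versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the "NNNK cma-reserved" value in KiB from a dmesg "Memory:" line,
 * or -1 if the line does not contain one.
 */
static long long cma_reserved_kib(const char *line)
{
    const char *tag;
    const char *p;
    long long kib;

    assert(line != NULL);
    tag = strstr(line, "K cma-reserved");
    if (tag == NULL)
        return -1;

    /* Walk back over the digits that precede the 'K'. */
    p = tag;
    while (p > line && p[-1] >= '0' && p[-1] <= '9')
        p--;
    if (p == tag)
        return -1; /* no digits before the tag */

    if (sscanf(p, "%lld", &kib) != 1)
        return -1;
    return kib;
}
```

For the two log lines in this post the helper yields 0 KiB on the stock kernel and 34603008 KiB, which is exactly 33 GiB, after the rebuild. A driver test script could compare that figure against dbsize before loading the module.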