
What does a memory allocation failure in the kernel logs mean?


Here is the output of dmesg on a Linux instance that may be running short of memory. What do these logs mean?


dmesg | tail -n 25
[23498.234294]  warn_alloc+0x114/0x1c0
[23498.238447] ena 0000:00:05.0 eth0: refilled rx qid 1 with only 64 buffers (from 131)
[23498.242537]  __alloc_pages_slowpath+0xce2/0xd20
[23498.242541]  ? ___slab_alloc+0xc1/0x4b0
[23498.242544]  ? get_page_from_freelist+0x525/0xba0
[23498.268528]  __alloc_pages_nodemask+0x25d/0x280
[23498.271780]  ena_refill_rx_bufs+0x55/0x2c0 [ena]
[23498.275046]  ena_clean_rx_irq+0x4ac/0x840 [ena]
[23498.278303]  ? netif_receive_skb_internal+0x42/0xe0
[23498.281698]  ena_io_poll+0x2d1/0x720 [ena]
[23498.284738]  net_rx_action+0x156/0x3f0
[23498.287680]  __do_softirq+0xe3/0x2c7
[23498.290553]  irq_exit+0xbd/0xd0
[23498.391684]  do_IRQ+0x89/0xe0
[23498.394364]  common_interrupt+0x85/0x85
[23498.397335]  </IRQ>
[23498.399697] RIP: 0033:0x7fb8c2cc8ad4
[23498.402642] RSP: 002b:00007f98402f3ea0 EFLAGS: 00000216 ORIG_RAX: ffffffffffffff73
[23498.408552] RAX: 00007fb8b5f52815 RBX: 00007f9a190e9678 RCX: 00007f986351ede0
[23498.412754] RDX: 0000000000000164 RSI: 00007fb812079d50 RDI: 00007f99dbfa83d4
[23498.416948] RBP: 0000000000001fae R08: 0000000000000164 R09: 0000000000000075
[23498.421138] R10: 00007fb812079d38 R11: 0000000000000074 R12: 00007faee54d9ba0
[23498.425357] R13: 0000000000000001 R14: 0000000000000007 R15: 00007fb8b5f52800
[23498.429598] ena 0000:00:05.0 eth0: failed to alloc buffer for rx queue 0
[23498.433666] ena 0000:00:05.0 eth0: refilled rx qid 0 with only 62 buffers (from 132)

Also, what are the potential ways to:

  1. do further root cause analysis?
  2. mitigate the problem?

Solution

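  • The first part of the log is a call stack. It shows that the failure originates in the ENA network driver: ena_refill_rx_bufs asked the kernel page allocator (the buddy system) for a page, and the allocation failed because free memory was short. The request is made in softirq context (net_rx_action), where the allocator cannot sleep and wait for memory to be reclaimed, so it fails as soon as no free page is immediately available.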

    The second part gives the exact message: "failed to alloc buffer for rx queue 0".

    After some searching, I found a blog post that may be helpful to you. Here is a digest.

    This message is logged when the NAPI handler fails to refill the Rx descriptors, typically because of a lack of memory. It can lead to a drop in performance, since some requests have to be rescheduled.
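
As a starting point for root cause analysis, it can help to see how often the driver hits this condition. The sketch below counts the two ENA messages that appear in the log above; the function names are hypothetical helpers, and the message wording is taken from this log, so it may differ on other driver versions.

```shell
#!/bin/sh
# Count ENA rx-buffer allocation failures in kernel log text read from stdin.
count_ena_alloc_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'failed to alloc buffer for rx queue' || true
}

# Count partial refills ("refilled rx qid N with only X buffers") from stdin.
count_ena_partial_refills() {
    grep -c 'refilled rx qid [0-9]* with only' || true
}

# Example usage on a live system (reading dmesg may require root):
#   dmesg | count_ena_alloc_failures
#   dmesg | count_ena_partial_refills
```

A count that keeps growing between runs suggests ongoing memory pressure rather than a one-off spike.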

    The fix is to increase the “min_free_kbytes” kernel parameter, which sets how much memory the kernel tries to keep free. For example:

    vm.min_free_kbytes = 1048576
    

    Place the setting above in /etc/sysctl.conf, then load it with:

    sysctl -p
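
To confirm that the change is in place, one option is to compare the value persisted in the configuration file with the value active in the running kernel. This is a minimal sketch; persisted_min_free_kbytes is a hypothetical helper, and it assumes the simple "key = value" layout of a sysctl.conf-style file.

```shell
#!/bin/sh
# Print the last vm.min_free_kbytes value set in a sysctl.conf-style file.
# Prints nothing if the key is not present.
persisted_min_free_kbytes() {
    awk -F= '
        /^[[:space:]]*[#;]/ { next }   # skip comment lines
        {
            key = $1
            gsub(/[[:space:]]/, "", key)
            if (key == "vm.min_free_kbytes") {
                val = $2
                gsub(/[[:space:]]/, "", val)
            }
        }
        END { if (val != "") print val }
    ' "$1"
}

# Example usage:
#   persisted_min_free_kbytes /etc/sysctl.conf
#   cat /proc/sys/vm/min_free_kbytes   # value active in the running kernel
```

If the two values differ, the file was probably edited without reloading it, or another configuration file overrides the setting.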
    

    It is recommended to reserve at least 512 MB, with a minimum of 128 MB for constrained environments. On large instance types running stress jobs (e.g. 64+ vCPUs and 256+ GiB of RAM), this value can typically be set to 1 GB, which is the 1048576 kB used in the example above.
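
The parameter is expressed in kB, so sizes quoted in MB need converting, and it is worth checking how much of the machine's RAM a candidate value would reserve before applying it. The sketch below does both; mib_to_kb and reserve_percent are hypothetical helper names, and the second one assumes the MemTotal line format of /proc/meminfo.

```shell
#!/bin/sh
# Convert a size in MiB to the kB unit that vm.min_free_kbytes expects.
mib_to_kb() {
    echo $(( $1 * 1024 ))
}

# Print the percentage of MemTotal that a candidate value would reserve.
# $1 = candidate value in kB
# $2 = meminfo-style file to read (defaults to /proc/meminfo)
reserve_percent() {
    LC_ALL=C awk -v want="$1" \
        '/^MemTotal:/ { printf "%.1f\n", want * 100 / $2 }' \
        "${2:-/proc/meminfo}"
}

# Example usage:
#   mib_to_kb 512                          # prints 524288
#   reserve_percent "$(mib_to_kb 1024)"    # share of this machine's RAM
```

On a small instance a 1 GiB reserve can be a large share of total RAM, and a reserve that is too high can itself trigger out-of-memory conditions, so the percentage is a useful sanity check.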