linux docker cgroups

How to disable the OOM killer in Linux?


My current configs are:

> cat /proc/sys/vm/panic_on_oom
0
> cat /proc/sys/vm/oom_kill_allocating_task
0
> cat /proc/sys/vm/overcommit_memory
1

but when I run a task, it's killed anyway.

> ./test/mem.sh
Killed
> dmesg | tail -2
[24281.788131] Memory cgroup out of memory: Kill process 10565 (bash) score 1001 or sacrifice child
[24281.788133] Killed process 10565 (bash) total-vm:12601088kB, anon-rss:5242544kB, file-rss:64kB
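
The first dmesg line starts with "Memory cgroup out of memory", which suggests the kill came from a memory cgroup limit rather than from the system-wide OOM killer that the vm.* settings above control. A minimal sketch for telling the two cases apart from a log line follows. oom_source is a hypothetical helper name, and the matched prefixes are an assumption based on the message format shown here; other kernel versions may word them differently.

```shell
#!/bin/sh
# Classify one kernel log line by which OOM path produced it.
# The cgroup pattern is checked first so it takes precedence.
oom_source() {
    case "$1" in
        *"Memory cgroup out of memory"*) echo "cgroup" ;;
        *"Out of memory"*)               echo "global" ;;
        *)                               echo "unknown" ;;
    esac
}

# The log line from this question:
oom_source "[24281.788131] Memory cgroup out of memory: Kill process 10565 (bash) score 1001 or sacrifice child"
# prints "cgroup"
```

If the result is "cgroup", the limit to look at is the one set on the container or cgroup, not the system-wide overcommit settings.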

Update

My tasks are for scientific computing, which uses a lot of memory, so it seems that overcommit_memory=1 may be the best choice.

Update 2

Actually, I'm working on a data analysis project that needs more than 16G of memory, but I was asked to limit it to about 5G. It might be impossible to meet this requirement by optimizing the program itself, because the project uses many sub-commands, and most of them do not have options like Xms or Xmx in Java.

Update 3

My project should be an overcommitted system. Exactly as a3f says, it seems that my apps prefer to crash in xmalloc when a memory allocation fails.

> cat /proc/sys/vm/overcommit_memory
2
> ./test/mem.sh
./test/mem.sh: xmalloc: .././subst.c:3542: cannot allocate 1073741825 bytes (4295237632 bytes allocated)

I don't want to surrender, although so many awful tests have exhausted me. So please show me the way to the light ; )

HACK TO FIX

Here's a hack to make it last as long as possible.

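while true
do
    # pgrep may match several processes; handle each PID separately.
    for pid in $(pgrep programname)
    do
        # -17 exempts the process from the OOM killer (legacy interface;
        # newer kernels use oom_score_adj with -1000). Requires root.
        echo -17 > "/proc/${pid}/oom_adj"
    done
    sleep 1
done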

Solution

  • The OOM killer won't go away. If there is no memory, someone's got to pay. What you can do is set a limit after which memory allocations fail. That's exactly what setting vm.overcommit_memory to 2 achieves.

    From the docs:

    The Linux kernel supports the following overcommit handling modes

    2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate.

    Normally, the kernel will happily hand out virtual memory (overcommit). Only when you reference a page does the kernel have to map it to a real physical frame. If it can't service that request, a process needs to be killed by the OOM killer to make space.

    Disabling overcommit means that e.g. malloc(3) will return NULL if the kernel couldn't commit the amount of memory requested. This makes things a bit more predictable, albeit limited (many applications allocate more than they would ever need).
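
The "swap + a configurable amount of physical RAM" rule quoted above can be worked through with a little arithmetic. The following is only a rough sketch: commit_limit_kb is a hypothetical helper, the 16 GiB / 4 GiB figures are made-up inputs, and the calculation ignores huge pages, which the kernel subtracts from RAM before applying the ratio.

```shell
#!/bin/sh
# Estimate the commit limit that applies when vm.overcommit_memory=2.
# Arguments: RAM in kB, swap in kB, overcommit ratio in percent (default 50).
# Simplifying assumption: no huge pages are reserved.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=${3:-50}
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Example: 16 GiB of RAM, 4 GiB of swap, default ratio of 50%.
commit_limit_kb 16777216 4194304 50
# prints 12582912 (kB, i.e. 12 GiB)
```

On a live system the kernel reports its own figures as CommitLimit and Committed_AS in /proc/meminfo, and the percentage is read from /proc/sys/vm/overcommit_ratio. Allocations that would push Committed_AS past CommitLimit are the ones that fail in mode 2.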