python-2.7 gdb out-of-memory sigkill

Python script terminated by SIGKILL rather than throwing MemoryError


Update Again

I have tried to create some simple way to reproduce this, but have not been successful.

So far, I have tried various simple array allocations and manipulations, but they all raise a MemoryError rather than being killed by SIGKILL.

For example:

import numpy as np
x = np.asarray(range(999999999))

or:

x = np.empty([100,100,100,100,7])

Both simply raise MemoryError, as they should.

I hope to have a simple way to recreate this at some point.

End Update

I have a Python script that uses NumPy/SciPy and some custom C extensions.

On my Ubuntu 14.04 machine under VirtualBox, it runs to completion just fine.

On an Amazon EC2 T2 micro instance, it terminates (after running a while) with the output:

Killed

Running under the python debugger, the signal is not caught and the debugger exits as well.

Running under strace, I get:

munmap(0x7fa5b7fa6000, 67112960)        = 0
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5b7fa6000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5affa4000    
mmap(NULL, 67112960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5abfa3000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a7f22000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa5a3ea1000    
mmap(NULL, 67637248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa59fe20000    
gettimeofday({1406518336, 306209}, NULL) = 0    
gettimeofday({1406518336, 580022}, NULL) = 0    
+++ killed by SIGKILL +++

Running under gdb while trying to catch SIGKILL, I get:

[Thread 0x7fffe7148700 (LWP 28022) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) where
No stack.

Running Python's trace module (python -m trace --trace), I get:

defmatrix.py(292):         if (isinstance(obj, matrix) and obj._getitem): return
defmatrix.py(293):         ndim = self.ndim
defmatrix.py(294):         if (ndim == 2):
defmatrix.py(295):             return
defmatrix.py(336):         return out
 --- modulename: linalg, funcname: norm
linalg.py(2052):     x = asarray(x)
 --- modulename: numeric, funcname: asarray
numeric.py(460):     return array(a, dtype, copy=False, order=order)

I can't think of anything else at the moment to figure out what is going on.

I suspect it is running out of memory (it is an AWS micro instance), but I can't figure out how to confirm or rule that out.

Is there another tool I could use that might help pinpoint exactly where the program is stopping? (Or am I running one of the above tools the wrong way for this problem?)

Update

The Amazon EC2 T2 micro instance has no swap space defined by default, so I added a 4GB swap file and was able to run the program to completion.

However, I am still very interested in a way to run the program so that it terminates with a message closer to "Not Enough Memory" than to "Killed".

If anyone has any suggestions, they would be appreciated.


Solution

  • It sounds like you've run into the dreaded Linux OOM Killer. When the system completely runs out of memory and the kernel absolutely needs to allocate memory, it kills a process rather than crashing the entire system.

    Look in the syslog for confirmation. You should see a line similar to:

    kernel: [884145.344240] mysqld invoked oom-killer:

    followed sometime later by:

    kernel: [884145.344399] Out of memory: Kill process 3318

    (In this example, the log mentions mysqld specifically.)

    You can add these lines to your /etc/sysctl.conf file to effectively disable the OOM killer:

    vm.overcommit_memory = 2
    vm.overcommit_ratio = 100
    

    Then reboot. The original memory-hungry process should now fail to allocate memory and, hopefully, raise the proper exception.

    Setting overcommit_memory to 2 means that Linux won't overcommit memory, so memory allocations will fail if there isn't enough memory for them. See this answer for details on what effect the overcommit_ratio has: https://serverfault.com/a/510857
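
For what it's worth, here is a rough Python sketch of the two checks described above: scanning log text for the OOM-killer messages, and estimating how much memory the kernel will let processes commit once overcommit_memory is 2. The function names (find_oom_lines, commit_limit_kb) are made up for illustration. The patterns are based only on the example lines in this answer, and the exact wording differs between kernel versions. The estimate follows the CommitLimit formula from the kernel documentation and assumes vm.overcommit_kbytes is left at 0.

```python
import re

# Patterns modelled on the example log lines in this answer (assumption:
# your kernel may word these messages slightly differently).
OOM_PATTERNS = (
    re.compile(r"\S+ invoked oom-killer"),
    re.compile(r"Out of memory: Kill(?:ed)? process \d+"),
)


def find_oom_lines(log_text):
    """Return the lines of log_text that look like OOM-killer activity."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in OOM_PATTERNS)
    ]


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Estimate CommitLimit in kB when vm.overcommit_memory = 2.

    CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap
    """
    usable_ram_kb = mem_total_kb - hugetlb_kb
    return usable_ram_kb * overcommit_ratio // 100 + swap_total_kb
```

On Ubuntu the log is usually /var/log/syslog (an assumption, other distributions use different paths), so a typical call would pass the contents of that file to find_oom_lines. For the second function, a hypothetical machine with 1 GiB of RAM and no swap gives commit_limit_kb(1048576, 0, 100) == 1048576, and adding a 4 GiB swap file raises that to 5242880 kB, which lines up with the program finishing once swap was added.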