kubernetes containers out-of-memory qos

Kubernetes Pod vs. Container OOMKilled


As I understand it, these are the conditions under which Kubernetes OOM kills a pod or container (from komodor.com):

If a container uses more memory than its memory limit, it is terminated with an OOMKilled status. Similarly, if overall memory usage on all containers, or all pods on the node, exceeds the defined limit, one or more pods may be terminated.

This means that if a container in the pod exceeds its memory limit, the container will be killed but not the pod itself. Similarly, if there are multiple containers in a pod and the pod exceeds its memory limit, which is the sum of the memory limits of all the containers in that pod, the pod will be OOM killed. However, the latter only seems possible if one of the containers exceeds its memory allowance. In that case, wouldn't the container be killed first?

I'm trying to understand the actual conditions in which a pod is OOM killed instead of a container.

I've also noticed that when a pod has a single container that repeatedly exceeds its memory allowance, the pod and the container are killed alternately. Watching the pod's logs, I could see the container restart each time, and on every second occurrence the pod was killed and restarted, incrementing its restart count.

If it helps to understand the behavior, the QoS class of the pod is Burstable.


Solution

  • Pods aren't OOM killed at all. OOMKilled is a status ultimately caused by a kernel mechanism (the OOM killer) that kills processes (containers are processes). The kubelet then recognises the kill and sets the status on the container. If the main container in a pod is killed, then by default the kubelet restarts it, because the pod's default restartPolicy is Always. A pod cannot be OOM killed, because a pod is a data structure rather than a process. Similarly, it does not have a memory (or CPU) limit of its own; rather, it is limited by the sum of the limits of its containers.

    The article you reference uses imprecise language, and I think this is causing some of the confusion. There is a better, shorter article on Medium that covers this more accurately, and a longer and much more in-depth article here.
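
To see that the OOMKilled status is recorded per container rather than per pod, you can inspect the pod's status JSON (for example from `kubectl get pod <name> -o json`). The following is a minimal sketch, not an official tool: the helper name `oom_killed_containers` and the `SAMPLE_POD` data are hypothetical, and it assumes the usual `status.containerStatuses[].state` / `lastState` layout of the pod status.

```python
# Hypothetical sample, trimmed to the fields the helper reads.
SAMPLE_POD = {
    "metadata": {"name": "demo"},
    "status": {
        "qosClass": "Burstable",
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 3,
                "state": {"running": {}},
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            {
                "name": "sidecar",
                "restartCount": 0,
                "state": {"running": {}},
                "lastState": {},
            },
        ],
    },
}


def oom_killed_containers(pod):
    """Return (container name, restart count) pairs for containers whose
    current or previous state is a termination with reason OOMKilled."""
    results = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # Check the current state first, then the previous one.
        for key in ("state", "lastState"):
            terminated = (cs.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                results.append((cs["name"], cs.get("restartCount", 0)))
                break  # report each container at most once
    return results


if __name__ == "__main__":
    for name, restarts in oom_killed_containers(SAMPLE_POD):
        print(f"container {name!r} was OOM killed (restarts: {restarts})")
```

Note that the restart count here lives on the container status, which is consistent with the point above: what gets killed and restarted is the container, not the pod.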