Kubernetes Ephemeral Storage Limit and Container Logs


Assume I have set resources.limits.ephemeral-storage for containers in a Kubernetes cluster (using Docker), and the worker nodes use the following Docker daemon.json logging configuration:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "10"
  }
}

My understanding is that all log files, including the rotated ones, count towards this ephemeral storage limit. This means that when choosing a value for resources.limits.ephemeral-storage, I have to include the maximum allowed log size (here 10*100MB, about 1 GB) in the calculation.
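
The arithmetic behind that budget can be written down as a small sketch. The function names and the 500 MB scratch-space figure are hypothetical, and the sketch assumes that "100m" means decimal megabytes and that the rotated files never exceed max-size * max-file in total:

```go
package main

import "fmt"

// logBudgetBytes returns the worst-case on-disk size of one container's logs
// under size-based rotation: maxFiles files of at most maxSizeBytes each.
func logBudgetBytes(maxSizeBytes, maxFiles int64) int64 {
	return maxSizeBytes * maxFiles
}

// minEphemeralLimitBytes adds the application's own scratch-space need
// (a hypothetical figure supplied by the caller) to the log budget.
func minEphemeralLimitBytes(appScratchBytes, maxSizeBytes, maxFiles int64) int64 {
	return appScratchBytes + logBudgetBytes(maxSizeBytes, maxFiles)
}

func main() {
	const mb = 1000 * 1000 // assumption: "100m" is read as decimal megabytes

	fmt.Println(logBudgetBytes(100*mb, 10))                 // 1000000000
	fmt.Println(minEphemeralLimitBytes(500*mb, 100*mb, 10)) // 1500000000
	fmt.Println(minEphemeralLimitBytes(500*mb, 200*mb, 10)) // 2500000000
}
```

The last line shows the concern raised here: doubling max-size to 200m raises the required limit by a further 1 GB for every container.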

Is there a way to "exclude" log files from counting towards the container's ephemeral-storage limit?

Since log handling is done "outside" of Kubernetes, I want to avoid making the resource limits of Kubernetes workloads depend on the Docker log configuration. Otherwise, any change to the rotation settings (e.g. an increase to 10*200MB) could cause pods to be evicted if someone forgets to adjust the limit for each and every container.


Solution

  • Based on the calcEphemeralStorage function in the release 1.17.16 source code, if you want to exclude logs from the calculation, you can comment out or remove the following lines and rebuild the kubelet:

    if podLogStats != nil {
        result.UsedBytes = addUsage(result.UsedBytes, podLogStats.UsedBytes)
        result.InodesUsed = addUsage(result.InodesUsed, podLogStats.InodesUsed)
        result.Time = maxUpdateTime(&result.Time, &podLogStats.Time)
    }
    

    This part of the code is responsible for adding log usage to the pod's ephemeral storage usage. Removing it may also require adjusting the test files that expect log usage to be included. The statistics themselves are gathered in this function:

    func (p *criStatsProvider) makePodStorageStats(s *statsapi.PodStats, rootFsInfo *cadvisorapiv2.FsInfo) {
        podNs := s.PodRef.Namespace
        podName := s.PodRef.Name
        podUID := types.UID(s.PodRef.UID)
        vstats, found := p.resourceAnalyzer.GetPodVolumeStats(podUID)
        if !found {
            return
        }
        logStats, err := p.hostStatsProvider.getPodLogStats(podNs, podName, podUID, rootFsInfo)
        if err != nil {
            klog.ErrorS(err, "Unable to fetch pod log stats", "pod", klog.KRef(podNs, podName))
            // If people do in-place upgrade, there might be pods still using
            // the old log path. For those pods, no pod log stats is returned.
            // We should continue generating other stats in that case.
            // calcEphemeralStorage tolerants logStats == nil.
        }
        etcHostsStats, err := p.hostStatsProvider.getPodEtcHostsStats(podUID, rootFsInfo)
        if err != nil {
            klog.ErrorS(err, "Unable to fetch pod etc hosts stats", "pod", klog.KRef(podNs, podName))
        }
        ephemeralStats := make([]statsapi.VolumeStats, len(vstats.EphemeralVolumes))
        copy(ephemeralStats, vstats.EphemeralVolumes)
        s.VolumeStats = append(append([]statsapi.VolumeStats{}, vstats.EphemeralVolumes...), vstats.PersistentVolumes...)
        s.EphemeralStorage = calcEphemeralStorage(s.Containers, ephemeralStats, rootFsInfo, logStats, etcHostsStats, true)
    }
    

    The last line of this function is the call to calcEphemeralStorage.

    More recent versions contain the same log calculation section, so this approach should work for the latest release too.
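
To illustrate what removing the podLogStats block changes, here is a minimal, self-contained sketch. It is not the kubelet code: the fsStats type, the podEphemeralUsage function and the includeLogs flag are hypothetical simplifications, and the numbers are made up for the example.

```go
package main

import "fmt"

// fsStats is a simplified, hypothetical stand-in for the kubelet's
// filesystem stats type; it is not the real statsapi.FsStats.
type fsStats struct {
	UsedBytes  uint64
	InodesUsed uint64
}

// add accumulates one usage record into a running total.
func (t *fsStats) add(s fsStats) {
	t.UsedBytes += s.UsedBytes
	t.InodesUsed += s.InodesUsed
}

// podEphemeralUsage sums container and volume usage. Pod log usage is
// added only when includeLogs is true and podLogStats is non-nil, which
// mimics the effect of keeping or removing the podLogStats block.
func podEphemeralUsage(containers, volumes []fsStats, podLogStats *fsStats, includeLogs bool) fsStats {
	var result fsStats
	for _, c := range containers {
		result.add(c)
	}
	for _, v := range volumes {
		result.add(v)
	}
	if includeLogs && podLogStats != nil {
		result.add(*podLogStats)
	}
	return result
}

func main() {
	const mb = 1000 * 1000
	containers := []fsStats{{UsedBytes: 200 * mb, InodesUsed: 10}}
	volumes := []fsStats{{UsedBytes: 50 * mb, InodesUsed: 5}}
	logs := &fsStats{UsedBytes: 1000 * mb, InodesUsed: 10}

	fmt.Println(podEphemeralUsage(containers, volumes, logs, true).UsedBytes)  // 1250000000
	fmt.Println(podEphemeralUsage(containers, volumes, logs, false).UsedBytes) // 250000000
}
```

With logs included, this example pod would exceed a 1 GB limit on log volume alone; with logs excluded, only the writable layer and the ephemeral volumes count towards it.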

    See also: