kubernetes kubelet crashloopbackoff

Getting CrashLoopBackOff error for 1 of 4 pods due to "Error syncing pod"


I am getting a CrashLoopBackOff error for 1 of 4 pods. Please guide me on how to troubleshoot this issue.

$ kubectl get pod -n cog-prod01 -o wide

slotmachine-1688723297-5vlht          1/1       Running            0          21h       100.96.6.15     ip-172-21-61-42.compute.internal
slotmachine-1688723297-6plr9          1/1       Running            0          16h       100.96.13.16    ip-172-21-54-247.compute.internal
slotmachine-1688723297-k995t          1/1       Running            0          16h       100.96.11.186   ip-172-21-60-180.compute.internal
slotmachine-1688723297-sk8bn          0/1       CrashLoopBackOff   8          19m       100.96.2.72     ip-172-21-56-148.compute.internal

Kubelet logs on the node:

admin@ip-172-21-56-148:~$ journalctl -u kubelet -f

    Jan 07 02:44:36 ip-172-21-56-148 kubelet[1568]: W0107 02:44:36.351880    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: W0107 02:44:46.372270    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443776    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.443851    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:44:46 ip-172-21-56-148 kubelet[1568]: I0107 02:44:46.592800    1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerStarted", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
    Jan 07 02:44:56 ip-172-21-56-148 kubelet[1568]: W0107 02:44:56.409374    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.669027    1568 kubelet.go:1917] SyncLoop (PLEG): "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)", event: &pleg.PodLifecycleEvent{ID:"2bc8665e-30f5-11ea-a92d-024aeca0bafc", Type:"ContainerDied", Data:"5b2868d22c3e5453e57a58cba78cea4979a7da9a0864be2f29049d47d19fa41b"}
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971547    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b3aa.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971640    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: I0107 02:45:00.971770    1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
    Jan 07 02:45:00 ip-172-21-56-148 kubelet[1568]: E0107 02:45:00.971805    1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:06 ip-172-21-56-148 kubelet[1568]: W0107 02:45:06.447068    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.149685    1568 status_manager.go:418] Status for pod "2bc8665e-30f5-11ea-a92d-024aeca0bafc" is up-to-date; skipping
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.443951    1568 kuberuntime_manager.go:463] Container {Name:slotmachine Image:gt/slotmachine:develop.6590.b35a.2866 Command:[] Args:[] WorkingDir: Ports:[{Name:slotmachine HostPort:0 ContainerPort:9192 Protocol:TCP HostIP:}] EnvFrom:[{Prefix: ConfigMapRef:&ConfigMapEnvSource{LocalObjectReference:LocalObjectReference{Name:global,},Optional:nil,} SecretRef:nil}] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:5 scale:9} d:{Dec:<nil>} s:5G Format:DecimalSI}]} VolumeMounts:[{Name:slotmachine-logs ReadOnly:false MountPath:/var/log/slotmachine SubPath:} {Name:default-token-9bxjf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath:}] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444070    1568 kuberuntime_manager.go:747] checking backoff for container "slotmachine" in pod "slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: I0107 02:45:12.444198    1568 kuberuntime_manager.go:757] Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)
    Jan 07 02:45:12 ip-172-21-56-148 kubelet[1568]: E0107 02:45:12.444238    1568 pod_workers.go:182] Error syncing pod 2bc8665e-30f5-11ea-a92d-024aeca0bafc ("slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"), skipping: failed to "StartContainer" for "slotmachine" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=slotmachine pod=slotmachine-1688723297-sk8bn_cog-prod01(2bc8665e-30f5-11ea-a92d-024aeca0bafc)"
    Jan 07 02:45:13 ip-172-21-56-148 kubelet[1568]: I0107 02:45:13.938976    1568 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
    Jan 07 02:45:16 ip-172-21-56-148 kubelet[1568]: W0107 02:45:16.464693    1568 helpers.go:793] eviction manager: no observation found for eviction signal allocatableNodeFs.available

admin@ip-172-21-43-86:~$ kubectl describe po -n cog-prod01 slotmachine-1688723297-sk8bn

Events:
  FirstSeen     LastSeen        Count   From                                                            SubObjectPath                   Type            Reason                  Message
  ---------     --------        -----   ----                                                            -------------                   --------        ------                  -------
  27m           27m             1       default-scheduler                                                                               Normal          Scheduled               Successfully assigned slotmachine-1688723297-sk8bn to ip-172-21-56-148.compute.internal
  27m           27m             1       kubelet, ip-172-21-56-148.compute.internal                                       Normal          SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "slotmachine-logs"
  27m           27m             1       kubelet, ip-172-21-56-148.compute.internal                                       Normal          SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "default-token-9bxjf"
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Pulled                  Container image "gt/slotmachine:develop.6590.xxxx.2866" already present on machine
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Created                 Created container
  27m           4m              10      kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Normal          Started                 Started container
  27m           11s             113     kubelet, ip-172-21-56-148.compute.internal       spec.containers{slotmachine}    Warning         BackOff                 Back-off restarting failed container
  27m           11s             113     kubelet, ip-172-21-56-148.compute.internal                                       Warning         FailedSync              Error syncing pod

Note: I checked disk space, CPU, and memory on the node running that pod, and they are fine. According to the pod logs, it cannot connect to the config service, but the other 3 pods can connect to this service, so I cannot figure out what is wrong here.

admin@ip-172-21-43-86:~$ kubectl logs -n  cog-prod01 slotmachine-1688723297-sk8bn


03:01:02.104 [main] INFO  org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Fetching config from server at: http://configservice:8888
03:01:05.344 [main] WARN  org.springframework.cloud.config.client.ConfigServicePropertySourceLocator - Could not locate PropertySource: I/O error on GET request for "http://configservice:8888/slotmachine/cog,cog-prod01": No route to host (Host unreachable); nested exception is java.net.NoRouteToHostException: No route to host (Host unreachable)
03:01:05.381 [main] INFO  org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext - Refreshing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@77eca502: startup date [Tue Jan 07 03:01:05 UTC 2020]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@4fb0f2b9

Solution

  • Not enough capacity is available on the node or nodes, so the scheduler is not able to deploy your 4th pod. You can check this with kubectl describe nodes. For a detailed explanation, have a look at my answer to "GKE Insufficient CPU for small Node.js app pods".
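
As a rough sketch of the capacity check mentioned above, the snippet below wraps kubectl describe node so it prints only the "Allocated resources" section, and adds a small helper for comparing CPU quantities such as the 200m request visible in the kubelet log. The function names cpu_to_millicores and node_allocation are hypothetical, made up for this illustration. The node name is the one from the question, and the number of lines grep prints after the heading is a guess that may need adjusting for your kubectl version.

```shell
#!/usr/bin/env bash
# Illustrative helpers only; the function names are not part of kubectl.

# Convert a Kubernetes CPU quantity ("200m", "2", "0.5") to millicores,
# so a pod's request can be compared with what a node has left.
cpu_to_millicores() {
  local q="$1"
  case "$q" in
    *m) echo "${q%m}" ;;                                        # already in millicores
    *)  awk -v v="$q" 'BEGIN { printf "%d\n", v * 1000 + 0.5 }' ;;  # whole or fractional cores
  esac
}

# Print the "Allocated resources" section of one node's description,
# which lists the summed requests and limits of the pods on that node.
# The -A 8 context size is an assumption about the output layout.
node_allocation() {
  kubectl describe node "$1" | grep -A 8 'Allocated resources'
}

# Example usage (requires access to the cluster):
#   node_allocation ip-172-21-56-148.compute.internal
#   cpu_to_millicores 200m
```

If the requests already allocated on the node plus the pod's own requests (200m CPU and 5G memory in the log above) exceed what the node reports as allocatable, that would be consistent with a capacity problem.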