kubernetes  google-kubernetes-engine  kubernetes-jobs

Automatically restart Kubernetes Job after node shutdown


We are running a daily CronJob on GKE. The job is executed on spot nodes. The container respects SIGTERM and shuts down gracefully. However, the job is then marked as successful and is not restarted. How can I ensure that the job is restarted on a different node?

I've read https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown and https://kubernetes.io/docs/concepts/workloads/controllers/job/#handling-pod-and-container-failures, but I see nothing in there that helps me.


Solution

  • A Job pod whose container exits with code 0 counts as succeeded, so a CronJob run that shuts down cleanly on SIGTERM is not retried after a node shutdown. To have the run retried, set the restartPolicy to OnFailure and make sure the container exits with a non-zero code when it is interrupted before it has finished its work.

    Set the restartPolicy in the pod template's spec section as follows:

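    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: myjob
    spec:
      # Runs every minute; use e.g. "0 3 * * *" for a daily run.
      schedule: "* * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
                - name: myjob
                  image: nginx
                  imagePullPolicy: IfNotPresent
              # Retry only when the container exits with a non-zero code.
              restartPolicy: OnFailure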
    

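    With this restartPolicy, a container that exits with a non-zero code is restarted in place by the kubelet. If the pod is lost because its node is shut down, the Job controller creates a replacement pod, which the scheduler places on a healthy node, until the Job's backoffLimit is reached. A container that exits with code 0 is treated as finished and is not restarted.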

    Note: Make sure a node with enough free resources for the job's pod is available; otherwise the replacement pod stays in Pending.
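
Since the job in the question is marked successful after a graceful shutdown, the container most likely exits with code 0 on SIGTERM. Below is a minimal sketch of an entrypoint that cleans up but still reports failure when it is interrupted, assuming the workload is written in Python. The names `run`, `process`, `Interrupted` and `EXIT_INTERRUPTED` are hypothetical and not taken from the original job.

```python
import signal

# 143 = 128 + 15 (SIGTERM). Any non-zero code makes Kubernetes treat
# the container as failed; the exact value is an assumption of this sketch.
EXIT_INTERRUPTED = 143


class Interrupted(Exception):
    """Raised inside the work loop when SIGTERM arrives."""


def _on_sigterm(signum, frame):
    # Turn the signal into an exception so the loop unwinds normally.
    raise Interrupted()


def run(work_items, process):
    """Process every item; return 0 if all finished, non-zero if interrupted."""
    signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        for item in work_items:
            process(item)
    except Interrupted:
        # Graceful cleanup (flush buffers, release locks) would go here.
        return EXIT_INTERRUPTED
    return 0


# In the real entrypoint the return value becomes the exit code, e.g.:
#   sys.exit(run(load_items(), handle_item))   # hypothetical helpers
```

With an exit code like this, an interrupted run is reported as failed rather than succeeded, so the retry behaviour of restartPolicy: OnFailure can take effect. The work should be safe to repeat, because the retry starts the container again from the beginning.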