We are running a daily CronJob on GKE. The job is executed on spot nodes. The container respects SIGTERM
and shuts down gracefully. However, the job is then marked as successful and is not restarted. How can I ensure that this job is restarted on a different node?
I've read https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown and https://kubernetes.io/docs/concepts/workloads/controllers/job/#handling-pod-and-container-failures, but I see nothing in there that helps me.
By default, cron jobs in Kubernetes are not rescheduled after a node shutdown. However, you can configure the job to use a restartPolicy
of OnFailure
to ensure that it is rescheduled after a node shutdown.
You need to set the restartPolicy in the pod template spec, as follows:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: myjob
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: myjob
            image: nginx
            imagePullPolicy: IfNotPresent
          restartPolicy: OnFailure
With this restartPolicy, if a node is shut down or the pod running the cron job terminates for any other reason, Kubernetes will automatically reschedule the cron job's pod onto a healthy node.
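If you also want to bound how many times the replacement pod is retried, you can set backoffLimit on the Job spec. A minimal sketch, assuming a retry limit of 3 (the value is illustrative, not from your setup):

  jobTemplate:
    spec:
      backoffLimit: 3          # retry the pod up to 3 times before marking the Job as failed
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: myjob
            image: nginx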
Note: it is important to ensure that the resources the cron job requires are available on the node it is rescheduled to.
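For example, you could add resource requests to the container so the scheduler only places the replacement pod on a node with enough capacity. The request values below are illustrative assumptions:

          containers:
          - name: myjob
            image: nginx
            resources:
              requests:
                cpu: "250m"      # assumed CPU request; size to your workload
                memory: "256Mi"  # assumed memory request; size to your workload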