I have a cluster that includes a Cronjob scheduled to run every 5 minutes.
We recently experienced an issue that incurred downtime and required manual recovery of the cluster. Although now healthy again, this particular cronjob is failing to run with the following error:
Cannot determine if job needs to be started: Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew.
I understand that the Cronjob has 'missed' a number of scheduled jobs while the cluster was down, and this has past a threshold at which no further jobs will be scheduled.
How can I reset the number of missed start times and have these jobs scheduled again (without scheduling all the missed jobs to suddenly run?)
Per the kubernetes Cronjob docs, there does not seem to be a way to cleanly resolve this. Setting the .spec.startingDeadlineSeconds
value to a large number will re-schedule all missed occurrences that fall within the increased window.
My solution was just to kubectl delete cronjob x-y-z
and recreate it, which worked as desired.