I'm developing a microservices-based system where one of the microservices is a scheduler responsible for handling a large number of dynamic jobs. These jobs are created, modified, and deleted during runtime via the microservice's API.
My plan is to use Kubernetes CronJobs to manage these jobs. However, I'm concerned about the scalability and performance implications of potentially dealing with thousands of dynamically created CronJobs.
Is using Kubernetes CronJobs a recommended approach for efficiently managing a large number of dynamic jobs within a microservices architecture? If not, what alternative strategies or best practices should I consider for this use case? Any insights or recommendations from the community would be greatly appreciated.
I have used CronJobs for many years, but only at low volumes (at most 10 to 20 scheduled jobs per day), and I haven't had any problems.
If the number of jobs goes up to thousands, I assume that:
Apart from the performance aspects, when you schedule a job you may also need to manage its image (and maybe its env vars), which can quickly turn into a nightmare unless you build a better way of managing job configuration from your app.
So what I suggest is to create a PoC that validates/invalidates that approach. Schedule a number of jobs that progressively grows from 10 to 5000, with a dummy workload, and you'll see what happens, how big the K8s cluster should be, what issues may come up, etc.
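One way to generate the dummy CronJobs for such a PoC is sketched below in Python. It only builds the manifests; it does not talk to the cluster. The job names (`poc-job-N`), the `app: cronjob-poc` label, the `busybox` image, the `sleep 5` workload and the every-five-minutes schedule are all placeholders I made up for illustration, so adjust them to whatever resembles your real jobs.

```python
import json
from typing import Any, Dict


def build_cronjob(index: int, schedule: str = "*/5 * * * *") -> Dict[str, Any]:
    """Return one batch/v1 CronJob manifest with a dummy workload (all values are placeholders)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {
            "name": f"poc-job-{index}",
            # A shared label makes it easy to select and clean up all PoC jobs at once.
            "labels": {"app": "cronjob-poc"},
        },
        "spec": {
            "schedule": schedule,
            "concurrencyPolicy": "Forbid",
            # Keep little history so finished Jobs and Pods do not pile up during the test.
            "successfulJobsHistoryLimit": 1,
            "failedJobsHistoryLimit": 1,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "dummy",
                                    "image": "busybox:1.36",
                                    "command": ["sh", "-c", "sleep 5"],
                                }
                            ],
                        }
                    }
                }
            },
        },
    }


def build_batch(count: int) -> Dict[str, Any]:
    """Wrap `count` CronJobs in a single List object so they can be applied from one file."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [build_cronjob(i) for i in range(count)],
    }


def render(count: int) -> str:
    """Serialize the batch as JSON, which kubectl accepts alongside YAML."""
    return json.dumps(build_batch(count))
```

Writing `render(10)`, then `render(100)`, `render(1000)` and `render(5000)` to files and applying each with `kubectl apply -f` gives you the growing load. While each batch runs, watch the API server, the controller manager and scheduling delays, and clean up afterwards with `kubectl delete cronjob -l app=cronjob-poc`.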
The alternative is to embed the scheduling logic into the app itself, using a scheduling library (lots of libs offer this, depending on your tech stack). The advantage is that you get better observability and better control.
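To give an idea of what "scheduling inside the app" means, here is a toy in-process scheduler in Python using only the standard library. It is a sketch, not a production design: it only supports fixed intervals (no cron expressions), keeps everything in memory, and the class and method names (`InProcessScheduler`, `add_job`, `remove_job`, `run_due`) are invented for this example. In a real service you would more likely reach for an existing library (APScheduler in Python, for instance) and add persistence.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Tuple


class InProcessScheduler:
    """Toy interval scheduler whose jobs can be added, replaced or removed at runtime."""

    def __init__(self) -> None:
        # job_id -> (interval in seconds, callable, version)
        self._jobs: Dict[str, Tuple[float, Callable[[], None], int]] = {}
        # Min-heap of (next due time, version, job_id)
        self._heap: List[Tuple[float, int, str]] = []
        self._versions = itertools.count()

    def add_job(self, job_id: str, interval_s: float,
                func: Callable[[], None], now: float) -> None:
        """Register a job, or replace an existing one with the same id."""
        if interval_s <= 0:
            raise ValueError("interval_s must be positive")
        version = next(self._versions)
        self._jobs[job_id] = (interval_s, func, version)
        heapq.heappush(self._heap, (now + interval_s, version, job_id))

    def remove_job(self, job_id: str) -> None:
        """Forget a job; its stale heap entry is skipped when it comes due."""
        self._jobs.pop(job_id, None)

    def run_due(self, now: float) -> int:
        """Run every job whose due time has passed and return how many ran."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, version, job_id = heapq.heappop(self._heap)
            job = self._jobs.get(job_id)
            if job is None or job[2] != version:
                continue  # job was removed or replaced since this entry was queued
            interval_s, func, _ = job
            func()
            ran += 1
            # Reschedule relative to `now`, so a late tick does not trigger catch-up runs.
            heapq.heappush(self._heap, (now + interval_s, version, job_id))
        return ran
```

The API handlers of the scheduler microservice would call `add_job` and `remove_job`, while a background loop calls `run_due(time.monotonic())` every second or so. Because the jobs live in your own process, you can log, meter and trace them like any other code path, which is where the observability and control come from.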