I want to implement redundancy in my GitLab runners.
Before setting up a new server, I am testing this with my local machine.
The current setup on my repository:
I want GitLab to choose the other runner when the selected one is not working.
The problem is that GitLab selects the non-working runner and fails the pipeline without trying the other runner.
How can I make this work?
Both runners are added:
But as the local runner (not working) is chosen, the pipeline fails:
This is an interesting edge case: the runner process itself is still healthy, but something fails while it is running a job. The runner process won't know about the failure until it retrieves a job and tries to run it, so it will keep picking up jobs and keep failing them.
Since neither the Runner process nor GitLab can catch this edge case, the only option I can see is to pause the Runner when you see a job fail for this reason. You can pause it in the project (like in your screenshot) or, if you're an admin (or can ask one), for the entire instance, assuming you're self-hosting GitLab. This prevents any new jobs from running on that Runner while you troubleshoot the issue.
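If you would rather pause the Runner from a script than from the UI, the Runners REST API can do the same thing. The following is only a minimal sketch: the host `gitlab.example.com`, the runner ID `42` and the token are placeholders, and it assumes a GitLab version whose `PUT /runners/:id` endpoint accepts the `paused` attribute (older releases used `active` instead).

```python
import urllib.parse
import urllib.request


def build_pause_request(base_url, runner_id, token, paused=True):
    """Build (but do not send) a PUT /api/v4/runners/:id request.

    base_url, runner_id and token are placeholders you must supply.
    """
    url = f"{base_url.rstrip('/')}/api/v4/runners/{runner_id}"
    # Assumption: the instance accepts `paused`; older versions expect `active`.
    body = urllib.parse.urlencode({"paused": str(paused).lower()}).encode()
    return urllib.request.Request(
        url, data=body, headers={"PRIVATE-TOKEN": token}, method="PUT"
    )


def pause_runner(base_url, runner_id, token):
    """Send the request; urlopen raises HTTPError on an error response."""
    request = build_pause_request(base_url, runner_id, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `pause_runner("https://gitlab.example.com", 42, "<your token>")` would pause that Runner, provided the token has enough permissions on it. Passing `paused=False` to `build_pause_request` builds the request that resumes it.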
You can still run multiple Runner processes on different hosts (or even on the same host by specifying separate config.toml files), so you keep redundancy and speed up your pipelines.
Some quick searching shows that common causes of this failure are the runner's host running out of disk space, or a Docker problem that might be solved by updating to the latest version. Making sure GitLab and the runners are on the latest available version wouldn't hurt either.
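To rule out the disk-space cause quickly, you could run a small check on the runner's host. This is a rough sketch, not a monitoring solution: the 10% threshold and the path `/` are assumptions, and with the Docker executor you may want to point it at the volume that holds Docker's data (by default `/var/lib/docker` on Linux).

```python
import shutil


def is_low_on_disk(free_bytes, total_bytes, min_free_fraction=0.10):
    """Return True when less than min_free_fraction of the disk is free."""
    return free_bytes < total_bytes * min_free_fraction


def check_runner_disk(path="/", min_free_fraction=0.10):
    """Check the filesystem that contains `path` (an assumed location)."""
    usage = shutil.disk_usage(path)
    return is_low_on_disk(usage.free, usage.total, min_free_fraction)


if __name__ == "__main__":
    if check_runner_disk("/"):
        print("Low disk space: consider pausing this runner and cleaning up.")
    else:
        print("Disk space looks fine.")
```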
Another option is to open a new issue with GitLab to see if they can address it. The desired behavior would be that, in the event of a runner system failure, the runner becomes unhealthy and stops processing further jobs.