I have a GitHub Actions workflow that runs on multiple self-hosted runners, several times a day. If one of the runners can't be found (i.e. the runner isn't currently running, for whatever reason), I want that job to fail rather than wait indefinitely for the runner to become available.
I've tried the following code in my workflow, but timeout-minutes only seems to take effect after the runner has started:
timeout-minutes: 15
runs-on: ${{ matrix.runners }}
Is there some way to make the job fail if the runner isn't found within a certain time?
This is not possible at the moment (as of March 2024). See this discussion for details.
As a workaround, several users on the thread have created a solution for cancelling stuck jobs. This script can run in a separate scheduled workflow.
gh -R {owner}/{repo} run list -w cleanup.yml -s queued --json databaseId -q '.[].databaseId' | sort -nr | tail -n +2 | xargs -r -n 1 gh -R {owner}/{repo} run cancel
Replace {owner}/{repo} with your own repository.
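For reference, here is a minimal sketch of what the separate scheduled workflow might look like. The file name, the 15-minute cron interval, and the choice of a GitHub-hosted runner are assumptions for illustration, not something taken from the linked thread. `cleanup.yml` is kept as-is from the command above; it presumably names the workflow whose queued runs you want to cancel, so adjust it to match yours.

```yaml
# .github/workflows/cancel-queued.yml  (hypothetical file name)
name: Cancel stuck queued runs

on:
  schedule:
    - cron: "*/15 * * * *"   # assumed interval: every 15 minutes

permissions:
  actions: write             # needed so the token can cancel workflow runs

jobs:
  cancel:
    # Use a GitHub-hosted runner so this job does not itself
    # queue behind the missing self-hosted runner.
    runs-on: ubuntu-latest
    steps:
      - name: Cancel queued runs
        env:
          GH_TOKEN: ${{ github.token }}   # the gh CLI reads its token from GH_TOKEN
          REPO: ${{ github.repository }}  # expands to owner/repo of this repository
        run: |
          # Same pipeline as the one-liner above: list queued runs,
          # skip the newest one, and cancel the rest.
          gh -R "$REPO" run list -w cleanup.yml -s queued \
            --json databaseId -q '.[].databaseId' \
            | sort -nr | tail -n +2 \
            | xargs -r -n 1 gh -R "$REPO" run cancel
```

Note that scheduled workflows only run from the default branch, and GitHub may delay scheduled runs under load, so the effective timeout is approximate rather than exact.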