
GitLab CI pipeline job tags: run jobs in parallel if enough runners are available, sequentially if not?


Our system currently has 3 runners on 3 machines with the same tag of acr:

runner: #1 / host: #1 / tag: acr
runner: #2 / host: #2 / tag: acr
runner: #3 / host: #3 / tag: acr

The GitLab CI pipeline has 3 stages with 1 job per stage:

stages:
  - stage-1
  - stage-2
  - stage-3

job-1:
  stage: stage-1
  tags:
    - acr
  script:
    - python run-script-1.py
  ...

job-2:
  stage: stage-2
  tags:
    - acr
  script:
    - python run-script-2.py
  ...

job-3:
  stage: stage-3
  tags:
    - acr
  script:
    - python run-script-3.py
  ...

Each job usually takes 7 mins to execute, making the whole pipeline take ~20 mins to complete.

As the 3 jobs are independent and can run in parallel, we reassigned the runner tags so that each runner has a unique tag:

runner: #1 / host: #1 / tag: acr-1
runner: #2 / host: #2 / tag: acr-2
runner: #3 / host: #3 / tag: acr-3

The GitLab CI pipeline was also refactored: it now has 1 stage containing all 3 jobs, and each job is tied to a unique runner tag:

stages:
  - stage-all

job-1:
  stage: stage-all
  tags:
    - acr-1
  script:
    - python run-script-1.py
  ...

job-2:
  stage: stage-all
  tags:
    - acr-2
  script:
    - python run-script-2.py
  ...

job-3:
  stage: stage-all
  tags:
    - acr-3
  script:
    - python run-script-3.py
  ...

Now, if all 3 runners are available, the CI pipeline takes ~7 mins to complete. The problem is that, on a bad day, 1 or 2 runners could be down for a while. Because each job is pinned to one specific runner, a job whose runner is down cannot be picked up by any other runner, and this breaks the pipeline.

Is there a way to assign tags or arrange jobs so that the jobs run concurrently when enough runners are available, and sequentially when runners are in short supply?


Solution

  • For those who are interested:

    We solved the issue with a combination of the configs below:

    With these settings, if enough runners are available, the jobs are distributed across them and run concurrently, which shortens the pipeline. Otherwise, jobs that have not been picked up stay queued and run one after another as each runner finishes its current job.
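    As a rough illustration of the behaviour described, here is a minimal sketch of one pipeline layout that would be consistent with it. It rests on two assumptions that are not confirmed by the text above: that all three runners are registered with the shared tag acr again (as in the original setup), and that each runner handles one job at a time (for example, `concurrent = 1` in each runner's config.toml). The jobs stay in a single stage, so GitLab can schedule them at the same time, and the shared tag lets any available runner pick up any job.

```yaml
# Sketch only, under the assumptions stated above:
#   - runners #1, #2 and #3 all carry the shared tag `acr`
#   - each runner executes one job at a time
stages:
  - stage-all

# All three jobs sit in the same stage, so they are eligible to run
# concurrently. Any runner tagged `acr` may take any of them.
job-1:
  stage: stage-all
  tags:
    - acr
  script:
    - python run-script-1.py

job-2:
  stage: stage-all
  tags:
    - acr
  script:
    - python run-script-2.py

job-3:
  stage: stage-all
  tags:
    - acr
  script:
    - python run-script-3.py
```

    With this layout, three available runners would each take one job (~7 mins in total). If only one runner is up, the remaining jobs should simply wait in the queue and run one after another on that runner (~20 mins), rather than being stuck waiting for a specific runner that is down.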