
GitLab Runner job fails after upgrade to 17.7.0


I have a GitLab CI/CD pipeline that builds a Docker image using a Docker-in-Docker setup. One of my jobs now fails with:

Running with gitlab-runner 17.7.0 (3153ccc6)
  on ubuntu-worker-docker Gtyfr-Tz, system ID: s_02f37f5283e5
Preparing the "docker" executor
00:06
Using Docker executor with image docker:20.10.14 ...
Starting service docker:20.10.14-dind...
Pulling docker image docker:20.10.14-dind ...
Using docker image sha256:a072474332af3e4cf06e349685c4cea8f9e631f0c5cab5b582f3a3ab4cff9b6a for docker:20.10.14-dind with digest docker@sha256:210076c7772f47831afaf7ff200cf431c6cd191f0d0cb0805b1d9a996e99fb5e ...
Waiting for services to be up and running (timeout 30 seconds)...
Pulling docker image docker:20.10.14 ...
Using docker image sha256:7417809fdb730b60c1b903077030aacc708677cdf02f2416ce413f38e81ec7e0 for docker:20.10.14 with digest docker@sha256:41978d1974f05f80e1aef23ac03040491a7e28bd4551d4b469b43e558341864e ...
Preparing environment
00:00
Running on runner-gtyfr-tz-project-15-concurrent-0 via worker...
Getting source from Git repository
00:01
Fetching changes...
Reinitialized existing Git repository in /builds/test/example-project/.git/
error: cannot lock ref 'refs/remotes/origin/master': Unable to create '/builds/test/example-project/.git/refs/remotes/origin/master.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'. Please make sure all processes
are terminated then try again. If it still fails, a git process
may have crashed in this repository earlier:
remove the file manually to continue.
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1

How can I solve this?

I know that other answers list the Git commands to run, but where should I run them? Some GitLab discussions also suggest switching the pipeline's Git strategy from fetch to clone, but that is hardly viable.

How can I clean up this job?


Solution

  • You need to manually edit the repository on the affected runner. SSH into the machine where the runner is installed, and become root.

    Because the runner uses the Docker executor, its checked-out repositories are cached in a Docker volume. In /var/lib/docker/volumes, search for the affected Git repository.

    cd /var/lib/docker/volumes
    find . -type d -name 'example-project'
    # ./runner-gtyfr-tz-project-15-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8/_data/test/example-project
    

    This is where the repo is stored. Go into that directory and remove the lock file:

    rm .git/refs/remotes/origin/master.lock
    

    Now restart the job, and it should work again. Note that this may have to be repeated for other branches in the same repository, e.g., if you have open merge requests with their own CI pipelines.

    The reason? I don't know. Apparently some process crashed and left a lock file behind. Or something else, perhaps a GitLab upgrade, left these repositories in a locked state.
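
If several branches or projects are affected, the manual steps above can be sketched as a small shell helper. This is only a sketch under a few assumptions: the function name `remove_stale_git_locks` and its `list`/`delete` modes are made up for illustration, the volumes live under /var/lib/docker/volumes as in this answer, and no CI job is running on the runner while you delete anything (a lock file that belongs to a live Git process must not be removed).

```shell
#!/bin/sh
# Hypothetical helper (not part of GitLab Runner): find leftover Git lock
# files inside runner cache volumes and optionally delete them.
#
#   $1  directory to search (assumed default: /var/lib/docker/volumes)
#   $2  "list" (default) only prints matches, "delete" also removes them
remove_stale_git_locks() {
    volumes_root="${1:-/var/lib/docker/volumes}"
    mode="${2:-list}"

    if [ "$mode" = "delete" ]; then
        # Print each lock file, then remove it. Only *.lock files located
        # somewhere below a .git directory are matched.
        find "$volumes_root" -type f -path '*/.git/*' -name '*.lock' \
            -print -exec rm -f -- {} +
    else
        # Dry run: show what would be removed.
        find "$volumes_root" -type f -path '*/.git/*' -name '*.lock' -print
    fi
}

# Example usage, as root on the runner host:
#   remove_stale_git_locks /var/lib/docker/volumes list
#   remove_stale_git_locks /var/lib/docker/volumes delete
```

Running it in `list` mode first shows which lock files exist, so you can check that they all belong to the broken repositories before running it again with `delete`.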