We maintain a clone of our primary Git server for scanning and indexing content. Our "clone" is a collection of bare git repositories.
When one of our scan cycles begins, each bare repository is updated with a git fetch operation. Every branch that received commits since the last scan is then checked out with git checkout into its own working directory. This gives us multiple copies of the code (work trees), each on a different branch, available for processing while we maintain a single copy of the Git repository. The work trees are cleaned up after our processing completes.
We have some very active and large repositories where the checkout operation can take upwards of thirty minutes. If there are ten branches to be scanned, they are checked out sequentially:
git --git-dir=/path/to/repo.git --work-tree=/work/path/branch1 checkout -f branch1
followed by
git --git-dir=/path/to/repo.git --work-tree=/work/path/branch2 checkout -f branch2
and so on. (Note that we work with distinct commit IDs rather than branches, but the concept holds.) The checkouts have to run sequentially because git checkout creates an index.lock file in the bare repository. At thirty minutes per checkout, ten branches can take upwards of five hours.
Is there any way the checkout operations can be parallelized, bypassing the index.lock file?
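The git worktree add command will allow parallel operation, because each linked work tree gets its own index (and therefore its own index.lock) instead of sharing the one in the bare repository. After you create the bare clone with
git clone --bare <URL> repo.git
cd repo.git
the commands
git worktree add ../repo-branch1 branch1
and
git worktree add ../repo-branch2 branch2
can be executed in parallel.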
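The parallel approach can be sketched as a small shell script. This is only an illustration under stated assumptions, not a drop-in replacement for the scanning pipeline: it builds a throwaway local repository called source to stand in for the primary Git server, and the temporary directory and the repo-branchN paths are hypothetical. It starts one git worktree add per branch in the background and then waits for all of them.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; a real setup would reuse the existing bare repository.
scratch=$(mktemp -d)
cd "$scratch"

# Throwaway source repository with two branches (stand-in for the primary server).
git init -q source
cd source
git config user.email "scan@example.invalid"
git config user.name "Scanner"
echo one > file.txt
git add file.txt
git commit -q -m "initial commit"
git branch branch1
git branch branch2
cd ..

# Bare clone, as in the answer above.
git clone -q --bare source repo.git
cd repo.git

# Launch one `git worktree add` per branch in the background.
# Each linked work tree keeps its own index under repo.git/worktrees/<name>/.
pids=""
for branch in branch1 branch2; do
    git worktree add "../repo-$branch" "$branch" >/dev/null 2>&1 &
    pids="$pids $!"
done

# Wait for every background job; `set -e` aborts the script if one failed.
for pid in $pids; do
    wait "$pid"
done

# Show the bare repository and the work trees that were created.
git worktree list
```

Since the scans work with commit IDs rather than branches, git worktree add --detach <path> <commit> may be the better fit. Once processing completes, git worktree remove <path> deletes a work tree, and git worktree prune clears stale entries left behind if the directory was deleted by hand.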