I have multiple Git repositories on a company GitLab instance and wanted to clean them up using git-filter-repo, following the steps from its documentation:
https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#DISCUSSION
1. `git fetch --all` to get all branches and tags
2. `git-filter-repo --analyze --force` and review `path-all-sizes.txt`
3. Create `paths_to_keep.txt` and `path_to_delete.txt`, specifying which directories I want to keep or delete, including some globs.
4. `git-filter-repo --paths-from-file path_to_keep.txt`, and the same with `--invert-paths` for the paths to delete. Make sure to keep a copy of all the `commit-map` files.
5. `git-filter-repo --analyze` again, and make sure all large files are gone
6. `git push --force --all` to upload all branches

The GitLab documentation mentions additional steps: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html

7. `git push origin --force 'refs/tags/*'` to upload tags
8. `git push origin --force 'refs/replace/*'` to let tags point to the new hashes

First repo: I followed steps 1-6 and it worked as expected. The size decreased dramatically, and all branches and tags were there, pointing to the new hashes. Success!
Second repository: the size decreased and all branches were created, so everything was fine up to step 4. But when I execute step 5, all the large files are attached to the tags again, and the repository is large again. When browsing the tags in the GitLab UI, I can see the large files. After executing step 6, the files are no longer visible, but the repository size is still large.
Does anybody have an idea what could have gone wrong in the second case? I understand I could use steps 6-8 to remove the files, but why are they added at all in the second repository (but not in the first)?
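One way to look for tags that point at commits no branch reaches (the situation the answer below turned out to be about) is a small check like the following. It is a minimal sketch, assuming a POSIX shell and Git 2.7 or newer (for `git for-each-ref --contains`); the function name `list_orphan_tags` is hypothetical and not part of git-filter-repo or the GitLab documentation. Pushing such a tag uploads whatever history it reaches, large files included.

```shell
#!/bin/sh
# Hypothetical helper (not from git-filter-repo or GitLab): print every tag
# whose commit is not contained in any local branch or remote-tracking branch.
list_orphan_tags() {
  git for-each-ref --format='%(refname)' refs/tags |
  while read -r ref; do
    # Peel annotated tags to the commit they point at; skip tags that
    # point at a tree or blob, since there is no commit to check.
    commit=$(git rev-parse --verify --quiet "$ref^{commit}") || continue
    # Ask for at most one branch that contains this commit.
    reaching=$(git for-each-ref --count=1 --contains "$commit" \
      --format='%(refname)' refs/heads refs/remotes)
    # No branch reaches the commit: report the tag by its short name.
    if [ -z "$reaching" ]; then
      echo "${ref#refs/tags/}"
    fi
  done
}
```

Run `list_orphan_tags` inside the repository before pushing tags; an empty result means every tag's commit is reachable from some branch.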
It seems there were several tags pointing to unreachable objects. The solution is to modify step 1. Cloning grabs all branches and tags, including the ones pointing to unreachable objects. Instead, run `git init`, add the remote, and then run `git fetch --all` and `git fetch --tags`, as outlined in this SO question:
Why is a cloned repo 10x larger than a fetched repo?

After that, execute steps 1-5 of the original question.
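The modified step 1 can be sketched as a small shell function. It runs only the commands named above (`git init`, `git remote add`, `git fetch --all`, `git fetch --tags`); the function name `fetch_instead_of_clone`, its two arguments, and the remote name `origin` are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: create a local copy with init + fetch instead of clone.
# $1 = URL of the remote repository, $2 = directory to create.
fetch_instead_of_clone() {
  repo_url=$1
  dest=$2
  git init -q "$dest" || return 1
  (
    cd "$dest" || exit 1
    # Register the remote with Git's default fetch refspec.
    git remote add origin "$repo_url"
    # Fetch the remote's branches (stored under refs/remotes/origin/*).
    git fetch --all --quiet
    # Fetch all of the remote's tags into refs/tags/*.
    git fetch --tags --quiet origin
  )
}
```

For example, `fetch_instead_of_clone https://gitlab.example.com/group/project.git project` (a placeholder URL). With the default refspec that `git remote add` sets up, the fetched branches land under `refs/remotes/origin/`, not as local branches.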