Tags: gitlab, git-tag, git-filter-repo

git-filter-repo did not filter tags


I have multiple Git repositories on a company GitLab instance and wanted to clean them up using git-filter-repo, following the steps from the documentation:
https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#DISCUSSION

  1. Clone the old/big repo and git fetch --all branches and tags
    (Note: it is marked as archive in GitLab to make it read-only).
  2. Run git-filter-repo --analyze --force and review path-all-sizes.txt
  3. Create two txt files, paths_to_keep.txt and path_to_delete.txt, listing the directories I want to keep or delete, including some globs.
    Run git-filter-repo --paths-from-file paths_to_keep.txt, then run it again with --invert-paths --paths-from-file path_to_delete.txt for the paths to delete. Make sure to keep a copy of all the commit-map files.
    Re-run git-filter-repo --analyze and make sure all large files are gone.
  4. Add a new remote/origin pointing to a fresh, empty repo,
    then git push --force --all to upload all branches
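
As a rough sketch, steps 2 and 3 might look like this in a shell. It assumes git-filter-repo is installed and that the commands run from inside the clone made in step 1. The function names and the ../commit-maps backup directory are made up for illustration, and the --force on the second filter run is an assumption (the repo may no longer count as a fresh clone after the first run).

```shell
# Copy the commit-map left by the latest filter-repo run so that a later
# run cannot replace it. $1 = label for this run, $2 = backup directory
# (hypothetical default: ../commit-maps).
save_commit_map() (
    set -e
    label="$1"
    backup_dir="${2:-../commit-maps}"
    mkdir -p "$backup_dir"
    cp .git/filter-repo/commit-map "$backup_dir/commit-map-$label"
)

# The body runs in a subshell with `set -e`, so the first failing command
# stops the whole sequence.
filter_history() (
    set -e
    # Step 2: write the size reports, including
    # .git/filter-repo/analysis/path-all-sizes.txt
    git-filter-repo --analyze --force

    # Step 3: keep only the wanted paths, then save this run's commit-map
    git-filter-repo --paths-from-file paths_to_keep.txt
    save_commit_map keep

    # Remove the unwanted paths and save that run's commit-map as well
    git-filter-repo --invert-paths --paths-from-file path_to_delete.txt --force
    save_commit_map delete

    # Re-run the analysis to check that the large files are gone
    git-filter-repo --analyze --force
)
```

Calling filter_history inside the clone runs the whole sequence. The saved commit-map files are the ones needed later for the repository cleanup.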

The GitLab documentation mentions additional steps: https://docs.gitlab.com/ee/user/project/repository/reducing_the_repo_size_using_git.html

  5. Run git push origin --force 'refs/tags/*' to upload tags
  6. Run git push origin --force 'refs/replace/*' to let tags point to new hashes
  7. Wait 30 minutes
  8. Run repository cleanup, using the commit-map files
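
The two pushes from the GitLab documentation could be wrapped up roughly as follows. This is only a sketch: the function name is made up, and the final git ls-remote call is an extra check I added to see what the tags on the remote point to, not something the GitLab documentation asks for.

```shell
# $1 = remote name (defaults to origin)
push_tags_and_replace_refs() (
    set -e
    remote="${1:-origin}"
    # Upload all tags, overwriting any that already exist on the remote
    git push "$remote" --force 'refs/tags/*'
    # Upload the replace refs that git-filter-repo created to map old
    # commit hashes to the rewritten ones
    git push "$remote" --force 'refs/replace/*'
    # List the tags now on the remote together with the hashes they point to
    git ls-remote --tags "$remote"
)
```

Comparing the hashes printed at the end against the new hashes in the saved commit-map files is one way to spot tags that still point at old history.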

For the first repository, I followed steps 1-6 and it worked as expected: the size decreased dramatically and all branches and tags were there, pointing to the new hashes. Success!

For the second repository, the size decreased and all branches were created; everything was fine up to step 4. But when I executed step 5, all the large files came back attached to the tags, and the repo was large again. Browsing the tags in the GitLab UI, I could see the large files. After executing step 6, the files were no longer visible, but the repo size was still large.

Does anybody have an idea what could have gone wrong in the second case? I understand I could use steps 6-8 to remove the files, but why were they added back for the second repo and not for the first?


Solution

  • It seems there were several tags pointing to unreachable objects. The solution is to modify step 1: cloning grabs all branches and tags, including tags that point to unreachable objects. Instead, run git init, add the remote, and then run git fetch --all and git fetch --tags, as outlined in this SO question:
    Why is a cloned repo 10x larger than a fetched repo?

    After that, carry out steps 1-5 of the original question, with step 1 replaced as described above.
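
A minimal sketch of the modified step 1 is below, together with a small check for suspicious tags. Both function names are made up. The check assumes that "tags pointing to unreachable objects" means tags whose commit is not contained in any local or remote-tracking branch, which is my reading and may not cover every case.

```shell
# Modified step 1: fetch into an empty repo instead of cloning.
# $1 = URL of the old repo, $2 = directory to create
fetch_into_fresh_repo() (
    set -e
    url="$1"
    dir="$2"
    git init "$dir"
    cd "$dir"
    git remote add origin "$url"
    git fetch --all
    git fetch origin --tags
)

# Print every tag whose commit is not contained in any branch
# (local or remote-tracking). Tags that do not resolve to a commit are skipped.
list_tags_outside_branches() (
    git for-each-ref --format='%(refname)' refs/tags |
    while read -r ref; do
        commit=$(git rev-parse -q --verify "$ref^{commit}") || continue
        if [ -z "$(git for-each-ref --contains "$commit" refs/heads refs/remotes)" ]; then
            echo "${ref#refs/tags/}"
        fi
    done
)
```

Running list_tags_outside_branches in both the cloned and the freshly fetched repo shows whether the two actually differ in the tags they carry before any filtering is done.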