I have a large (~2 GB), old (15+ years) git repo that has been converted from CVS to SVN to git over those years. We are moving our hosting from on-prem to the cloud, so I want to take this opportunity to clean up the repo: I want to purge old files from the history in order to reduce the overall clone size and time.
There are hundreds of branches that I don't care about anymore. I am really only interested in preserving a few (~10) branches.
I've tried using BFG Repo-Cleaner with --strip-blobs-bigger-than 1M --protect-blobs-from <my refs>. It seems to match my use case very well: I don't want to remove any files that are currently present in the HEAD of my selected branches, regardless of their size. However, it doesn't deal with the changed commit hashes very nicely, other than producing a mapping file.
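One thing worth noting about that mapping file: it can be converted into replace refs by hand, so old commit hashes remain usable. The sketch below is not a BFG feature, just a workaround; it uses a throwaway repo and a hand-made map file standing in for BFG's report (check your BFG report directory for the exact path and file name of the map).

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "old message"
old=$(git rev-parse HEAD)
git commit -q --amend --allow-empty -m "new message"   # stand-in for BFG's rewrite
new=$(git rev-parse HEAD)

# stand-in for BFG's mapping file ("<old-sha> <new-sha>" per line)
printf '%s %s\n' "$old" "$new" > object-id-map.old-new.txt

# turn each mapping line into a replace ref
while read o n; do
  git replace -f "$o" "$n"
done < object-id-map.old-new.txt

git log -1 --format=%s "$old"   # prints "new message": the old hash still resolves
```

After this, anyone who fetches the replace refs can keep using pre-rewrite hashes in git log, git show, and so on.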
I've also tried git filter-repo with --strip-blobs-bigger-than 1M. This writes replace refs, so I can still reference commits by their old hashes, which is really important. However, it breaks my current branches by deleting files I don't want removed.
It seems like git filter-repo is the tool I should be using; however, I don't want to manually list all of the files I want to delete (or, conversely, all of the files I want to keep). Is there a better way to do this?
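For context, the scripted approach I was hoping to avoid would be to compute the unwanted blob IDs myself and hand them to git filter-repo's --strip-blobs-with-ids option: list every blob over 1 MiB anywhere in history, subtract the blobs present at the tips of the protected branches, and strip the rest. A rough sketch, demonstrated on a throwaway repo (the branch names and size threshold are placeholders):

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q -b main repo && cd repo
git config user.email you@example.com
git config user.name you

head -c 2000000 /dev/zero > keep.bin              # big, but on a protected tip
git add keep.bin && git commit -qm "big file we keep"
git checkout -qb old-junk
head -c 2000000 /dev/zero | tr '\0' x > junk.bin  # big, only on a junk branch
git add junk.bin && git commit -qm "big file to strip"
git checkout -q main

protected="main"                                  # branches whose tips are safe

# all blob IDs over 1 MiB anywhere in history
git rev-list --objects --all | cut -d' ' -f1 |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' |
  awk '$1 == "blob" && $3 > 1048576 {print $2}' | sort -u > big-blobs.txt

# blob IDs present at the tip of each protected branch
for b in $protected; do git ls-tree -r "$b" | awk '{print $3}'; done |
  sort -u > protected-blobs.txt

# large blobs that are NOT protected: these are safe to strip
comm -23 big-blobs.txt protected-blobs.txt > strip-ids.txt
wc -l < strip-ids.txt                             # 1: only junk.bin's blob
```

The resulting list would then be fed to git filter-repo --strip-blobs-with-ids strip-ids.txt, which rewrites history dropping exactly those blobs.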
I ended up using bfg-ish, a re-implementation of BFG Repo Cleaner built on top of git-filter-repo.
It provided the API I wanted from BFG Repo Cleaner with the functionality of git-filter-repo.
All I had to run was
$ git clone <my repo url> myrepo
$ cd myrepo
$ git lfs fetch --all
$ bfg-ish -p branch1 branch2 branch3 branch4 --strip-blobs-bigger-than 1M .
where branchN are the names of branches I actually wanted to preserve.
This repo also used git-lfs, so it was important to pull all of the LFS objects before messing with all of the references. Then I pushed the resulting repo to my new location.
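The final push can be sketched like this; a local bare repository stands in for the new hosting URL, and the remote name is arbitrary. For an LFS repo you would also upload the LFS objects with git lfs push neworigin --all.

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q --bare new-host.git        # stand-in for the new hosting location
git init -q -b main cleaned && cd cleaned
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "cleaned history"
git tag v1.0

git remote add neworigin ../new-host.git
git push -q neworigin --all            # every local branch
git push -q neworigin --tags           # every tag
```

Pushing --all and --tags together ensures the new host receives every preserved branch and tag from the rewritten clone.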