I've been advised to use the BFG Repo-Cleaner because the local repo I want to push contains files too large to push to GitHub. I accidentally committed these files (larger than about 50MB) a while back, and I don't mind if they get deleted.
The online instructions are here: https://rtyley.github.io/bfg-repo-cleaner/
They suggest that I clone a fresh copy of my repo using the --mirror flag (seemingly a copy of the online version, not of my local one), then run the java -jar bfg.jar ... command, and after that cd back into that local mirror copy of the online repo and push the changes back.
I don't quite understand how this applies to local copies. For a local copy that is too big to push, should I e.g. do:
git clone --mirror /Users/me/myrepo
java -jar bfg.jar --strip-blobs-bigger-than 100M /Users/me/myrepomirror.git
I also don't understand how the next steps:
cd /Users/me/myrepomirror.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push
would address anything to do with my non-mirrored local repo:
/Users/me/myrepo
I am not sure whether they imply that I should then run this afterwards:
java -jar bfg.jar --strip-blobs-bigger-than 50M my-repo.git
And again I do not know how this addresses the actual repo (not a mirror or an online version) that I want to prune so that I can push it.
Perhaps I am being a bit dull? The documentation doesn't seem very explicit/extensive for something so potentially useful. Any help here would be great. Thanks!
I've never used BFG before, but it sounds useful if you're in this situation of having large files that you need to remove. I'll try to explain the overall process as I understand it.
Before we begin, note that BFG rewrites the repository's history, and pushing the rewritten history to the remote will require everyone on your team to re-clone the repository and transfer their local-only branches over.
According to git's documentation, git clone --mirror will:
Set up a mirror of the source repository. This implies --bare. Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository.
This means that the clone will create an exact copy of the remote repository on your machine. As the BFG docs say, you should create a backup of this clone in case you need it later.
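As a rough illustration of what --mirror produces, the sketch below builds a throwaway repository in a temporary directory, mirror-clones it, and makes a backup copy of the mirror. The directory names and the demo identity are made up for the example and stand in for your real repository.

```shell
# Throwaway demo in a temp directory; all paths and names are hypothetical.
work=$(mktemp -d)
git init -q "$work/myrepo"
git -C "$work/myrepo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Mirror-clone it. An explicit target is given here; without one,
# git would create myrepo.git in the current directory.
git clone -q --mirror "$work/myrepo" "$work/myrepo.git"

# Keep an untouched backup of the mirror before any history rewriting.
cp -R "$work/myrepo.git" "$work/myrepo.git.bak"

# A mirror clone is bare, and its origin remote is flagged as a mirror.
is_bare=$(git -C "$work/myrepo.git" rev-parse --is-bare-repository)
mirror_flag=$(git -C "$work/myrepo.git" config --get remote.origin.mirror)
echo "bare=$is_bare mirror=$mirror_flag"
```

Because the clone is bare, it has no working tree; it only holds the repository data that BFG operates on.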
java -jar bfg.jar --strip-blobs-bigger-than 100M some-big-repo.git
will target the clone you made with git clone --mirror and remove files larger than 100M from every commit except the most recent one (as mentioned in the BFG docs). BFG won't physically delete the old data itself. Once it finishes, you can confirm that everything looks good, and it then leaves you to clean up the rest.
cd /Users/me/myrepomirror.git
Will navigate to the bare repository. You may have to change the path accordingly.
git reflog expire --expire=now --all && git gc --prune=now --aggressive
Let's break this command up into its two logical parts:
git reflog expire --expire=now --all
--expire=now tells git to expire all reflog entries older than the current time, which is all of them.
--all means across all references. Without --all, only the reflogs of the references you name would be processed.
git gc --prune=now --aggressive
--prune=now tells git gc to remove unreachable loose objects regardless of their age, rather than only those older than the default grace period.
--aggressive will cause git gc to spend more time optimizing the repository. The git gc docs have some additional info on it.
Once all of that is done, git push will overwrite the remote version of all of the branches with the newly cleaned ones.
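To see why both commands are needed, here is a small self-contained sketch that does not involve BFG at all. It creates a scratch repository, commits a file on a side branch, deletes that branch, and then checks whether the file's blob still exists before and after the cleanup. All names and paths are invented for the demo.

```shell
# Throwaway demo; every path and name here is made up for illustration.
work=$(mktemp -d)
repo="$work/demo"
git init -q "$repo"
git -C "$repo" config user.name "demo"
git -C "$repo" config user.email "demo@example.com"

echo "keep me" > "$repo/keep.txt"
git -C "$repo" add keep.txt
git -C "$repo" commit -q -m "keep"

# Commit a file on a side branch, then delete the branch so that only
# the reflog still refers to that commit.
git -C "$repo" checkout -q -b tmp
head -c 100000 /dev/urandom > "$repo/big.bin"
git -C "$repo" add big.bin
git -C "$repo" commit -q -m "add big file"
blob=$(git -C "$repo" rev-parse HEAD:big.bin)
git -C "$repo" checkout -q -
git -C "$repo" branch -q -D tmp

# Helper: report whether an object id exists in the repository.
present() { git -C "$repo" cat-file -e "$1" 2>/dev/null && echo yes || echo no; }

before=$(present "$blob")
git -C "$repo" reflog expire --expire=now --all
git -C "$repo" gc -q --prune=now --aggressive
after=$(present "$blob")
echo "blob present before: $before, after: $after"
```

The blob should be reported as present before the cleanup and gone afterwards: expiring the reflog removes the last reference to the old commit, and git gc --prune=now then deletes the unreachable objects.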
You would now have to re-clone the repository in a different directory with git clone to obtain a non-bare version.
Essentially, this process creates a copy of the remote repository, removes the offending files (rewriting the commit history in the process), pushes the rewritten history over what was on the remote previously, and clones a new copy of that repository for us to continue working in.
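Put together, the steps could be sketched as two shell functions, split so that you can inspect BFG's report between them. This is only an outline under assumptions: the function names and their arguments are hypothetical and not part of BFG or git, and I haven't run it against a real remote.

```shell
# Sketch only: defining these functions runs nothing.
# Function names and arguments are hypothetical, not part of BFG or git.

# Step 1: mirror-clone, back up, and rewrite. Inspect BFG's report before step 2.
bfg_rewrite() {
  repo_url=$1            # remote repository URL
  bfg_jar=$2             # path to the downloaded bfg.jar
  mirror_dir=$3          # where to put the bare mirror, e.g. myrepo.git
  max_size=${4:-100M}    # threshold passed to --strip-blobs-bigger-than

  git clone --mirror "$repo_url" "$mirror_dir" &&
    cp -R "$mirror_dir" "$mirror_dir.bak" &&
    java -jar "$bfg_jar" --strip-blobs-bigger-than "$max_size" "$mirror_dir"
}

# Step 2: drop the old objects from the mirror and overwrite the remote.
bfg_finish() {
  mirror_dir=$1
  (
    cd "$mirror_dir" &&
      git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive &&
      git push
  )
}
```

You would call bfg_rewrite with your own URL, jar path and mirror directory, check the result, call bfg_finish on the same mirror directory, and finally run a normal git clone into a new directory to get a working copy.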
I'd suggest some preventative measures to avoid having to constantly remove these files. BFG shouldn't be run frequently, since it rewrites the repository's history.
Unfortunately, .gitignore doesn't support ignoring files larger than a given size. However, there may be some options available to you, regardless.
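One such option is a pre-commit hook that refuses to commit oversized files. The helper below is a minimal sketch of that idea, not a tested production hook: the function name and the 50 MB default are my own assumptions, and it reads file names line by line, so paths containing newlines or other unusual characters are not handled.

```shell
# Hypothetical helper for a pre-commit hook; the name and default limit are assumptions.
# Prints each staged file whose blob exceeds the limit and returns non-zero if any do.
check_staged_sizes() {
  limit=${1:-52428800}   # bytes; the default is 50 * 1024 * 1024
  too_big=$(
    # Staged files that were added, copied or modified.
    git diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r path; do
      size=$(git cat-file -s ":$path")   # size of the staged blob
      if [ "$size" -gt "$limit" ]; then
        echo "$path ($size bytes)"
      fi
    done
  )
  if [ -n "$too_big" ]; then
    echo "Refusing to commit files larger than $limit bytes:" >&2
    echo "$too_big" >&2
    return 1
  fi
  return 0
}
```

To use it, you could put the function in an executable .git/hooks/pre-commit script followed by a line such as check_staged_sizes || exit 1, so that the commit is aborted whenever a staged file is over the limit.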