A colleague of mine attempted to permanently remove a file (Diff.java) from the history of our GitHub repo.
He had good reasons for wanting to do this, but something seems to have gone wrong: we have lost quite a few files, which have been replaced by equivalent files with the suffix .REMOVED.git-id. For example, ivy-2.2.0.jar -> ivy-2.2.0.jar.REMOVED.git-id.
I have managed to repair the main development branch, as I happened to have a copy locally. However, there are many historical branches for development lines, and tags for releases, that now seem to be broken in the way described above.
I understand that he ran a process similar to:
$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg-1.12.3.jar --strip-biggest-blobs 500 some-big-repo
$ cd some-big-repo
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push
$ cd ..
$ java -jar bfg-1.12.3.jar --delete-files Diff.java some-big-repo
$ cd some-big-repo
$ git push
I am guessing that the process was destructive, and there is no way to recover unless we happen to have a clean mirror somewhere before this happened. Can anyone confirm or offer some advice?
This was the step that deleted all those old jars:
$ java -jar bfg-1.12.3.jar --strip-biggest-blobs 500 some-big-repo
...as the author of the BFG, I'm distressed to realise --strip-biggest-blobs 500 wasn't as clear as I thought. The command removes the largest 500 files (i.e. big files, or binary large objects: 'blobs') from the repository's history. I would be very interested to know what the user thought that step would do!
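To get a feel for which files a step like this would target, you can list the largest blobs in a repository's history before rewriting anything. The following is only a rough sketch and is not the BFG's own selection logic. The helper name largest_blobs is hypothetical, and it assumes input lines of the form "<type> <id> <size> <path>", which is what git cat-file --batch-check produces when given the format shown in the comment.

```shell
# largest_blobs N
# Reads "<type> <id> <size> <path>" lines on stdin and prints the N largest
# blobs, biggest first. Hypothetical helper, for previewing only.
largest_blobs() {
    # Keep blob rows, sort numerically on the size column (descending),
    # then keep the first N rows.
    awk '$1 == "blob"' | sort -k3,3nr | head -n "$1"
}

# Possible usage inside a clone (assumes git is installed):
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
#     largest_blobs 500
```

If files such as ivy-2.2.0.jar show up in that list, they are likely candidates for removal by a biggest-blobs strip.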
This is the command that correctly got rid of Diff.java:
$ java -jar bfg-1.12.3.jar --delete-files Diff.java some-big-repo
The instructions for the BFG say "you should make a backup" of your repository before running the BFG, but it sounds like that didn't happen here.
You may still have a chance to recover your old branches and tags, given two things:
1. GitHub does not run git gc on their repos immediately - the objects may well still be around, and may even be referenced by old pull requests, if you use them. I would take an immediate mirror clone of your GitHub repo.
2. The BFG writes an object-id-map.old-new.txt file under the some-big-repo.bfg-report directory every time it runs, containing the old ids, and the new ids, for every commit it altered. There will be more than one of these files, because the BFG was run more than once. Using these files, and examining your current refs, you should be able to back-track through the two BFG runs to find out what the original commit ids of your refs were.
Your recovery process, given those things, is something like this:
1. Start from the --mirror clone of your repository most likely to still contain your old objects.
2. Check that the original commit is still there. If the original commit id of master was 686b0cd80ac328e060b80dda3c9dadb1e400134a, do git cat-file -p 686b0cd80ac328e060b80dda3c9dadb1e400134a. You will see a summary of the commit if the object is still around. If it's not, add remotes for your other candidate repos, and try pulling in the data from there.
3. Reset the master branch to the value of the original commit with git update-ref:
$ git update-ref refs/heads/master 686b0cd80ac328e060b80dda3c9dadb1e400134a
Repeat for all the other branches and tags that you care about - hopefully you can script this, good luck!
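As a starting point for scripting the back-tracking, here is a minimal sketch of the id lookup. It assumes each line of an object-id-map.old-new.txt file has the form "<old-id> <new-id>", and that you pass the map files newest run first. The function name original_id is hypothetical, so treat this as an illustration rather than a tested recovery tool.

```shell
# original_id CURRENT_ID MAP_FILE...
# Walks back through the given map files (newest BFG run first) and prints
# the id the commit had before any of those runs rewrote it.
# Assumes each map line is "<old-id> <new-id>".
original_id() {
    id=$1
    shift
    for map in "$@"; do
        # If this run produced the current id, step back to the id it replaced.
        old=$(awk -v new="$id" '$2 == new { print $1; exit }' "$map")
        if [ -n "$old" ]; then
            id=$old
        fi
        # Otherwise this run left the commit untouched, so the id carries over.
    done
    printf '%s\n' "$id"
}

# Possible usage inside the mirror clone (paths are placeholders):
#   orig=$(original_id "$(git rev-parse refs/heads/master)" second-run-map.txt first-run-map.txt)
#   git cat-file -e "$orig" && git update-ref refs/heads/master "$orig"
```

The git cat-file -e check in the usage comment only updates the ref when the original object still exists in that clone, which mirrors the manual check described above.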