We are migrating from Azure DevOps Git to GitHub. The repo is huge and old, unfortunately contains binaries, and has tons of branches and tags. We decided on a cut-off date and want to drop all history before that date (which will also remove the binaries and large files, since they were deleted later). We want to retain only specific branches from the selected date onward and, hopefully, keep the tags.
I got completely lost with filter-branch and haven't been able to find a good, fast way of doing this. The simplest thing I found was doing an orphan checkout of the commit we want as the new root, rebasing onto it, and then pruning and running the garbage collector. But the new root commit is dated to now, all commit IDs change, we lose all the tags, and I couldn't do it for all the branches I want to retain.
What is the best way of achieving this?
The trick is to use grafts to fake new root commits, then burn them into the history using git filter-branch or git filter-repo.
Let's say you determined that commit 1234abcd is the new root commit and that it is the only one needed. Then
git replace --graft 1234abcd
installs a replacement commit that pretends that 1234abcd has no parents. Now run
git filter-branch --tag-name-filter cat master branch1 branch2 tag1 tag2 ...
(or an equivalent git filter-repo command). This rewrites commit 1234abcd so that it really has no parent (which, of course, gives it a different commit ID) and rewrites the history up to the specified refs.
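A minimal sketch of the two steps as a reusable shell function. It assumes you run it in a scratch clone, because filter-branch rewrites refs in place. The function name cut_history and its argument order are hypothetical. The git commands are the ones described above. FILTER_BRANCH_SQUELCH_WARNING=1 is added only to skip filter-branch's advisory banner and pause.

```shell
#!/bin/sh
# Sketch: make <root-commit> a true root commit on the given refs.
# Usage (hypothetical helper): cut_history <root-commit> <ref>...
# Run this in a scratch clone: filter-branch rewrites the refs in place and
# keeps their old values under refs/original/.
cut_history() {
    root=$1
    shift
    # Step 1: install a replacement commit that pretends $root has no parents.
    git replace --graft "$root" || return 1
    # Step 2: rewrite history up to the listed refs, baking the graft in.
    # "cat" keeps each tag's name, so the tags end up on the rewritten commits.
    # FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the advisory banner and delay.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --tag-name-filter cat -- "$@"
}
```

For example: cut_history 1234abcd master branch1 tag1. To pick a candidate root for a cut-off date, you could use git rev-list -1 --first-parent --before=2020-01-01 master (the date is a placeholder). Walking back from the tip of master's first-parent chain, it returns the first commit whose committer date is before that date. Whether that commit or its successor should become the new root is your call.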
You should be able to repeat the command with different branch and tag names if you forgot some or want to do the job incrementally. Because the first run leaves a backup in refs/original/, later runs need -f to overwrite it. Make sure to specify only refs whose history does not bypass the new root commit. This can happen accidentally if history from before the new root commit was merged into history after it.
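Before each additional run, you can check that the refs you are about to rewrite do not bypass the new root. This sketch assumes the graft from git replace --graft is still installed. Through the replacement, every ref whose history starts at 1234abcd reaches exactly one parentless commit, 1234abcd itself. Any other parentless commit in the git rev-list --max-parents=0 output therefore means some path avoids the graft. The helper name check_single_root is hypothetical. Run it only on refs that have not been rewritten yet, since rewritten refs no longer contain the original 1234abcd.

```shell
#!/bin/sh
# Sketch: check_single_root <root-commit> <ref>...
# With the graft for <root-commit> in place, a ref whose whole history lies
# after the new root has exactly one parentless commit: <root-commit> itself.
# Anything else (an extra or different root) means the ref bypasses the graft,
# e.g. through a merge from pre-root history. Returns non-zero if any ref fails.
check_single_root() {
    root=$(git rev-parse --verify "$1^{commit}") || return 1
    shift
    status=0
    for ref in "$@"; do
        # Parentless commits reachable from $ref, as seen through replace refs.
        roots=$(git rev-list --max-parents=0 "$ref")
        if [ "$roots" != "$root" ]; then
            echo "bypasses the new root: $ref" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_single_root 1234abcd branch3 tag3 && git filter-branch -f --tag-name-filter cat branch3 tag3. Here -f overwrites the backup that the previous run left in refs/original/.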