
git reflog expire and git fsck --unreachable


Disclaimer: this question is purely informational and does not represent an actual problem I'm experiencing. I'm just trying to figure out stuff for the sake of it (because I love figuring stuff out, and I know you do too).

So I was playing with git, trying to expire an amended commit. My reflog looks like this:

4eea1cd HEAD@{0}: commit (amend): amend commit
ff576c1 HEAD@{1}: commit: test: bar
5a1e68a HEAD@{2}: commit: test: foo
da8534a HEAD@{3}: commit (initial): initial commit

Which means I made two commits (da8534a and 5a1e68a), then a third commit ff576c1 that I amended with 4eea1cd.

As expected, my git log looks something like this:

* 4eea1cd (HEAD, master) amend commit
* 5a1e68a test: foo
* da8534a initial commit

From what I (thought I) knew about commit expiry, some day (by default, most likely in 30 days) git gc should collect ff576c1. I don't want to wait 30 days to see that happen, so I start running a few commands. First:

git fsck --unreachable --no-reflogs

Which, just as expected again, gives me:

unreachable blob 5716ca5987cbf97d6bb54920bea6adde242d87e6
unreachable tree 1e60e555e3500075d00085e4c1720030e077b6c8
unreachable commit ff576c1b4b6df57ba1c20afabd718c93dacf2fc6

Confident that I'm about to expire that poor lonely ff576c1 commit, I then run git reflog expire:

git reflog expire --dry-run --expire-unreachable=now --all

This time, it gives me:

would prune commit: test: bar
would prune commit (amend): amend commit

At first I thought my HEAD was not referencing master, but as you can see in the git log output I gave earlier, it actually does. Also, cat .git/HEAD confirms that (yielding ref: refs/heads/master). Anyway, even that thought was silly, since 4eea1cd is the head of my master branch.

So here I am, confused that these two commands don't report the same commits, and wondering how the hell 4eea1cd could possibly be unreachable when it's the actual tip of my master branch.

Any idea on what's going on?

EDIT: I just noticed that if I add the --rewrite option to git reflog expire, like this:

git reflog expire --dry-run --expire-unreachable=now --all --rewrite

Then I only get the entry for the commit I amended (ff576c1):

would prune commit: test: bar

I still don't understand, because according to git help reflog:

   --rewrite
       While expiring or deleting, adjust each reflog entry to ensure that
       the old sha1 field points to the new sha1 field of the previous
       entry.

Which doesn't make sense to me in my case. Well, at least I don't get it, since it obviously does change something.


Solution

  • This behavior comes from an interaction between the reflog design philosophy and the requirements of garbage collection.

    For a commit to be safely deleted by the garbage collector, all references to that commit must be deleted, including references in reflog entries. Although git reflog show displays only one of them, each reflog entry actually contains two SHA1 identifiers: the value of the ref before the change and the value of the ref after the change. To ensure safe garbage collection, git reflog expire simply deletes any entry where either of the two SHA1s identifies an unreachable commit.
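    To make the two SHA1s visible, here is a minimal Python sketch that splits one line of a reflog file such as .git/logs/HEAD, assuming the usual on-disk layout: old SHA1, new SHA1, committer identity, timestamp and timezone, then a tab and the reason that git reflog show prints. The parse_reflog_line helper and the sample line are hypothetical: the sample uses the abbreviated IDs from the question, whereas a real file stores full 40-character hashes, and its name, email and timestamp are made up.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old: str        # value of the ref before the change
    new: str        # value of the ref after the change
    identity: str   # committer "Name <email>"
    timestamp: int  # seconds since the epoch
    tz: str         # timezone offset, e.g. "+0200"
    message: str    # the reason shown by git reflog show


def parse_reflog_line(line: str) -> ReflogEntry:
    """Split one line of a reflog file (hypothetical helper)."""
    header, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = header.split(" ", 2)
    # The identity may contain spaces, so peel timestamp and tz off the right.
    identity, timestamp, tz = rest.rsplit(" ", 2)
    return ReflogEntry(old, new, identity, int(timestamp), tz, message)


# Hypothetical sample line: abbreviated IDs, made-up identity and timestamp.
sample = ("ff576c1 4eea1cd A U Thor <author@example.com> 1400000000 +0200"
          "\tcommit (amend): amend commit\n")
entry = parse_reflog_line(sample)
print(entry.old, "->", entry.new, "|", entry.message)
# prints: ff576c1 -> 4eea1cd | commit (amend): amend commit
```

    git reflog show would only print 4eea1cd for this entry; the ff576c1 in the first field is the part that matters for expiry.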

    In your case, the pre-change value of the most recent reflog entry is ff576c1, which is unreachable. Even though the post-change value of that entry, 4eea1cd, is still reachable, reflog expire deletes the entry.
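    The rule can be replayed on the example as a toy Python model. This is not git's code: the parent map, ref table and HEAD reflog are typed in by hand from the question (abbreviated IDs, with 0000000 standing in for the all-zero ID of the initial entry), timestamps are ignored because --expire-unreachable=now makes every entry old enough, and only commits (not trees or blobs) are modeled. Under those assumptions it reports the same two entries as your dry run.

```python
# Commit graph from the question: commit -> parents (hand-typed toy model).
PARENTS = {
    "da8534a": [],
    "5a1e68a": ["da8534a"],
    "ff576c1": ["5a1e68a"],  # the commit that was amended away
    "4eea1cd": ["5a1e68a"],  # the amended commit, current tip of master
}
REFS = {"refs/heads/master": "4eea1cd"}
NULL = "0000000"  # stands in for the all-zero "no previous value" ID

# HEAD reflog as (old, new, message), oldest first.
HEAD_LOG = [
    (NULL, "da8534a", "commit (initial): initial commit"),
    ("da8534a", "5a1e68a", "commit: test: foo"),
    ("5a1e68a", "ff576c1", "commit: test: bar"),
    ("ff576c1", "4eea1cd", "commit (amend): amend commit"),
]


def reachable_from_refs(refs, parents):
    """Commits reachable from the refs alone, ignoring reflogs."""
    seen, stack = set(), list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def default_prune(log, reachable):
    """Prune an entry if EITHER of its IDs names an unreachable commit."""
    def unreachable(oid):
        return oid != NULL and oid not in reachable
    return [msg for old, new, msg in log if unreachable(old) or unreachable(new)]


reachable = reachable_from_refs(REFS, PARENTS)
print(sorted(set(PARENTS) - reachable))  # ['ff576c1']
for msg in default_prune(HEAD_LOG, reachable):
    print("would prune", msg)
```

    The "amend commit" entry is pruned purely because of its pre-change value, ff576c1; its post-change value passes the check.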

    This design is simple to implement and results in an incomplete but accurate log.

    The --rewrite option

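    Unfortunately, deleting an entry that refers to a still-reachable commit has a couple of problems:

      • it leaves a gap in the ref's history: in your case, the entry recording the move to 4eea1cd, the current tip of master, disappears;
      • it throws away more than garbage collection requires, since only the SHA1 naming the unreachable commit gets in the way, yet the still-reachable SHA1 is discarded along with it.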

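    The --rewrite option addresses these problems by changing the behavior in the following way:

      • as before, an entry is deleted if its post-change SHA1 refers to an unreachable commit;
      • otherwise the entry is kept, and its pre-change SHA1 is replaced with the post-change SHA1 of the previous kept entry, so an unreachable pre-change value no longer causes a deletion.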

    Unfortunately, modifying the entry results in a log that no longer accurately reflects the history of the ref. For example, the change reason may no longer make sense after the rewrite. This is why --rewrite is not the default.
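    The --rewrite behavior can be replayed on the same example with another toy Python model. Again, this is not git's actual implementation, and the commit data is typed in by hand from the question. It assumes the semantics described in the git help text quoted in the question: before the prune check, each entry's pre-change SHA1 is replaced by the post-change SHA1 of the previous entry that was kept. Under that assumption only the "test: bar" entry is pruned, matching your --rewrite dry run.

```python
NULL = "0000000"  # stands in for the all-zero ID of the initial entry

# Commits reachable from refs/heads/master (4eea1cd), per the question's git log.
REACHABLE = {"da8534a", "5a1e68a", "4eea1cd"}

# HEAD reflog as (old, new, message), oldest first.
HEAD_LOG = [
    (NULL, "da8534a", "commit (initial): initial commit"),
    ("da8534a", "5a1e68a", "commit: test: foo"),
    ("5a1e68a", "ff576c1", "commit: test: bar"),
    ("ff576c1", "4eea1cd", "commit (amend): amend commit"),
]


def rewrite_expire(log, reachable):
    """Toy --rewrite expiry: returns (pruned messages, kept entries)."""
    def unreachable(oid):
        return oid != NULL and oid not in reachable

    pruned, kept = [], []
    last_kept = NULL
    for _old, new, msg in log:
        # The stored pre-change ID is ignored; the previous kept entry's
        # post-change ID takes its place before the prune check.
        old = last_kept
        if unreachable(old) or unreachable(new):
            pruned.append(msg)
        else:
            kept.append((old, new, msg))
            last_kept = new
    return pruned, kept


pruned, kept = rewrite_expire(HEAD_LOG, REACHABLE)
for msg in pruned:
    print("would prune", msg)  # would prune commit: test: bar
for old, new, msg in kept:
    print(old, new, msg)
```

    The surviving "commit (amend)" entry now records a move from 5a1e68a to 4eea1cd, which is exactly the kind of no-longer-accurate reason mentioned above: 5a1e68a is not the commit that was amended.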