git git-show

Get the full file path before and after a file rename in git log


Suppose I know that a file was renamed in a specific commit (a rename can happen by changing the filename or by moving the file into another directory). git show --summary <sha> lists every such rename in the commit. However, git prints only the difference between the old and new file paths. Here are two examples:

rename xbmc/interfaces/{ => builtins}/Builtins.cpp (100%)
rename xbmc/cores/AudioEngine/Engines/ActiveAE/{ActiveAEResample.cpp => ActiveAEResampleFFMPEG.cpp} (100%)

How can I efficiently parse the full file path before and after the rename, given that a rename can take many forms? Or is there another git command that shows this information in a simpler way?


Solution

  • TL;DR

    Given that you know the commit hash <hash>, you probably want:

    git diff-tree --find-renames -r --name-status --diff-filter=R --no-commit-id <hash>
    

    or the same with -z added. You may also want to name the (first) parent explicitly, as in <hash>^ <hash>, particularly when <hash> is a merge commit; in that case --no-commit-id is unnecessary.

    Long

    There are several ways to do this depending on various details about what you want for output. The key is to start with a predictable plumbing command. In Git, a plumbing command is one that is essentially designed to be used by some other program, so that it has a machine-readable, predictable, reliable output format. What you're getting now is the output of git diff --summary, and git diff is a porcelain command, designed to have human-readable output:

    $ git diff --summary 99177b34db^ 99177b34db
     rename contrib/hooks/multimail/{README => README.rst} (95%)
    

    which git show --summary runs at the end of its other operations.
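
If you are stuck with the summary text anyway, the brace notation can be expanded by hand. The following is only a minimal sketch: the function name expand_summary_rename is made up for illustration, and it assumes the paths contain no braces or " => " of their own and that git has not quoted them. The plumbing output described next is the more reliable route.

```python
import re
from typing import Optional, Tuple

# "rename <spec> (NN%)", with optional leading whitespace as printed by git.
_SUMMARY = re.compile(r"^\s*rename (?P<spec>.+) \((?P<score>\d+)%\)$")
# "<prefix>{<old> => <new>}<suffix>"; any of the four parts may be empty.
_BRACES = re.compile(
    r"^(?P<prefix>.*?)\{(?P<old>.*?) => (?P<new>.*?)\}(?P<suffix>.*)$"
)


def expand_summary_rename(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (old_path, new_path, similarity) for one summary line.

    Hypothetical helper: returns None when the line does not look like
    a rename entry.
    """
    m = _SUMMARY.match(line.rstrip("\n"))
    if m is None:
        return None
    spec, score = m["spec"], int(m["score"])
    b = _BRACES.match(spec)
    if b:
        old = b["prefix"] + b["old"] + b["suffix"]
        new = b["prefix"] + b["new"] + b["suffix"]
    elif " => " in spec:
        # No common prefix or suffix: the form is "old => new".
        old, new = spec.split(" => ", 1)
    else:
        return None
    # An empty middle part ("{ => builtins}") leaves a doubled slash behind.
    return old.replace("//", "/"), new.replace("//", "/"), score
```

Applied to the first example in the question, this yields xbmc/interfaces/Builtins.cpp as the old path and xbmc/interfaces/builtins/Builtins.cpp as the new one.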

    For mechanically-parseable output, we can switch to git diff-tree. If we want the names and statuses of each modified file, we can ask for that:

    $ git diff-tree --name-status -r 99177b34db^ 99177b34db
    M       contrib/hooks/multimail/CHANGES
    M       contrib/hooks/multimail/CONTRIBUTING.rst
    D       contrib/hooks/multimail/README
    M       contrib/hooks/multimail/README.Git
    A       contrib/hooks/multimail/README.rst
    M       contrib/hooks/multimail/doc/gitolite.rst
    M       contrib/hooks/multimail/git_multimail.py
    M       contrib/hooks/multimail/migrate-mailhook-config
    M       contrib/hooks/multimail/post-receive.example
    

    There is an immediate problem: no rename appears. That's because Git does not record renames. Commit 99177b34db and its (first and only) parent 99177b34db^ are just two snapshots, each with its own set of files, so the plain comparison reports a deletion (D) and an addition (A). The rename we saw earlier is a guess that git diff --summary makes. To instruct Git to make the same guess when using git diff-tree, we must add --find-renames, which lets us choose the similarity threshold that counts as a rename but defaults to the same 50% that we get for the summary:

    $ git diff-tree --find-renames --name-status -r 99177b34db^ 99177b34db
    M       contrib/hooks/multimail/CHANGES
    M       contrib/hooks/multimail/CONTRIBUTING.rst
    M       contrib/hooks/multimail/README.Git
    R095    contrib/hooks/multimail/README  contrib/hooks/multimail/README.rst
    M       contrib/hooks/multimail/doc/gitolite.rst
    M       contrib/hooks/multimail/git_multimail.py
    M       contrib/hooks/multimail/migrate-mailhook-config
    M       contrib/hooks/multimail/post-receive.example
    

    That R095 line contains what we want: the detected rename, the similarity value (in this case 95%), and both file names, in this case separated by tabs.
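
To make the layout of that line concrete, here is a small sketch that splits one --name-status line into its parts. The names Rename and parse_name_status_line are made up for this example, and the sketch assumes a line without -z whose paths git did not need to quote.

```python
from typing import NamedTuple, Optional


class Rename(NamedTuple):
    similarity: int  # 95 for "R095"
    old_path: str
    new_path: str


def parse_name_status_line(line: str) -> Optional[Rename]:
    """Parse one `git diff-tree --name-status` line (hypothetical helper).

    Returns None for anything that is not a rename record.
    """
    fields = line.rstrip("\n").split("\t")
    status = fields[0]
    # A rename record has exactly three tab-separated fields:
    # the "R<score>" status, the old path, and the new path.
    if not status.startswith("R") or len(fields) != 3:
        return None
    score = int(status[1:] or 0)
    return Rename(score, fields[1], fields[2])
```

Feeding it the R095 line above should give similarity 95 with contrib/hooks/multimail/README as the old path and contrib/hooks/multimail/README.rst as the new one; the M lines come back as None.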

    We can use --diff-filter to shrink the output to include only the renames:

    $ git diff-tree --find-renames --name-status -r --diff-filter=R 99177b34db^ 99177b34db
    R095    contrib/hooks/multimail/README  contrib/hooks/multimail/README.rst
    

    Note that we can run git diff-tree with just one commit hash. This works well when the commit is an ordinary (non-merge) commit:

    $ git diff-tree --find-renames --name-status -r --diff-filter=R 99177b34db
    99177b34db1d473e8f90544cf0bf83f47308e9ad
    R095    contrib/hooks/multimail/README  contrib/hooks/multimail/README.rst
    

    However, now we get the full hash ID in the output. Adding --no-commit-id tells it not to include the hash ID.

    It also works differently if the commit we specify is a merge commit. I'm not going to illustrate that here, as I don't have a handy merge to look at this way, but pay close attention to the documentation's description of the diff format for merges and the separate note about the combined format that tells us that sometimes we don't see some files at all.

    Dropping --name-status gets us this other format, which is longer and sometimes more useful:

    $ git diff-tree --find-renames -r --diff-filter=R 99177b34db^ 99177b34db
    :100644 100644 5105373aea044f2d8fde0c4fd927c8c492d02585 7c0fc4a6ef00362dcff476497a6045a420562d05 R095   contrib/hooks/multimail/README  contrib/hooks/multimail/README.rst
    

    Here we get the blob hashes of the two files, with the two modes (100644) in front of them, all prefixed by a single colon :. The details would change if we got the output for a merge commit.
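
If the blob hashes or modes are useful to you, the raw record can be taken apart in much the same way. Again this is only a sketch for an ordinary two-commit (non-merge) diff without -z; RawRename and parse_raw_rename are illustrative names, not anything Git provides.

```python
from typing import NamedTuple, Optional


class RawRename(NamedTuple):
    old_mode: str
    new_mode: str
    old_blob: str
    new_blob: str
    similarity: int
    old_path: str
    new_path: str


def parse_raw_rename(line: str) -> Optional[RawRename]:
    """Parse one raw-format rename record (hypothetical helper).

    Expected shape:
    ":<old mode> <new mode> <old blob> <new blob> R<score>\t<old>\t<new>"
    """
    meta, *paths = line.rstrip("\n").split("\t")
    if not meta.startswith(":") or len(paths) != 2:
        return None
    parts = meta[1:].split(" ")
    if len(parts) != 5 or not parts[4].startswith("R"):
        return None
    old_mode, new_mode, old_blob, new_blob, status = parts
    score = int(status[1:] or 0)
    return RawRename(old_mode, new_mode, old_blob, new_blob,
                     score, paths[0], paths[1])
```

Merge commits use a different record layout, so this sketch deliberately returns None for anything that does not have exactly this shape.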

    In all of these cases, you can add the -z option. This changes the output to be even more machine-readable (but very human-un-readable): the various parts of each output record have ASCII NUL (0x00) bytes to separate them. This option is also described in the documentation, along with some details of what modifications are done to pathnames when you don't use the -z.
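
Putting the pieces together, a script might run the TL;DR command with -z and split the result on NUL bytes. This is a sketch under a few assumptions: the helper names parse_name_status_z and renames_in_commit are invented here, git is on the PATH, and the commit is an ordinary non-merge commit. With -z, a rename record arrives as three NUL-terminated fields (status, old path, new path) and any other record as two.

```python
import os
import subprocess
from typing import List, Tuple


def parse_name_status_z(data: bytes) -> List[Tuple[int, str, str]]:
    """Extract (similarity, old_path, new_path) from `--name-status -z` output."""
    fields = data.split(b"\0")
    if fields and fields[-1] == b"":
        fields.pop()  # the output ends with a NUL, leaving one empty field
    renames = []
    i = 0
    while i < len(fields):
        status = fields[i].decode("ascii")
        if status[:1] in ("R", "C"):
            # Renames and copies carry two paths.
            if status.startswith("R") and i + 2 < len(fields):
                renames.append((int(status[1:] or 0),
                                os.fsdecode(fields[i + 1]),
                                os.fsdecode(fields[i + 2])))
            i += 3
        else:
            # Every other status carries a single path.
            i += 2
    return renames


def renames_in_commit(commit: str, repo: str = ".") -> List[Tuple[int, str, str]]:
    """Run the TL;DR command for one commit and parse its output."""
    out = subprocess.run(
        ["git", "-C", repo, "diff-tree", "--find-renames", "-r",
         "--name-status", "--diff-filter=R", "--no-commit-id", "-z", commit],
        check=True, capture_output=True,
    ).stdout
    return parse_name_status_z(out)
```

Because the fields are NUL-separated, paths containing tabs, spaces, or other unusual characters come through unmodified, which is the main reason to prefer -z in scripts.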