git, git-diff, git-patch

Implications of removing the index header line from patch


Context

I have a git repo ("parent repo") that contains a .patch file I have generated using git diff --binary sometag in another repository ("inner repo"). The parent repo is public and open for pull requests. According to the docs this file contains a header line for each change that looks like this:

index <hash>..<hash> <mode>

This means that every time two collaborators touch the same file (even on completely different lines), the resulting hash will be different, so the index header will change and cause a merge conflict that could have been avoided.

I tried to remove these index headers and git apply still seems to work just fine.

Question

Are there any (hidden) implications that come with the removal of the index header lines? Can it cause issues with binaries?

I assume the git apply tool will just look at the hashes for optimization reasons, i.e. "don't bother with a diff when you already have the resulting file".

I know the mode header is often embedded within the index header; it's OK for this information to be missing for my purposes.


Solution

  • Are there any (hidden) implications that come with the removal of the index header lines?

    They're not really hidden, they just don't reach out and whack you in the face (unlike, say, the way Git's index interacts with .gitignore).

    I assume the git apply tool will just look at the hashes for optimization reasons.

    It's actually rather the reverse. The apply command is given a diff, and tries to apply that diff to the existing file(s) in your working tree and/or index. Those diffs have context as well as changes and it's possible that the context and/or changes don't match up with the existing file(s) to which the diff is to be applied. If we're supposed to add:

    (the BEST)
    

    between line 12 and what was line 13, which should read

    one ice cream flavor
    is chocolate
    

    but now they read:

    the odors of skunks
    and ferrets
    

    well, perhaps that's not the BEST anymore, is it?

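    Anyway, if things look fishy, Git is capable of using the index line to poke around in your own local repository, to try to figure out what the file looked like before the diff got added (this is what git apply --3way asks for). If it can do so, that is, if it can find the original file, it can then apply the patch to that original. It now has three files: the original (the preimage named by the index line), the original with the patch applied, and the version you have now.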

    With these three versions of a single file, Git can now do a full-blown three-way merge, just as git merge would do, on that one file. Git can figure out whether the patch didn't match up because the lines moved (so the patch can now be applied to the correct lines, which might be lines 52 and 53 instead of 12 and 13, for instance) or because you made conflicting changes, in which case there's now a merge conflict. (Perhaps you correctly identified these thiol compounds as the worst-smelling things, although cadaverine and putrescine will give them a run for the money.)
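To see this fallback in action, here is a rough sketch you can run in a scratch directory. It assumes a reasonably recent Git and a POSIX shell. The file name notes.txt, the patch file names, and the variables plain, stripped and full are made up for illustration. The local edit lands inside the patch's context, so a plain git apply should reject the patch, while git apply --3way should still succeed as long as the index line is there to name the preimage.

```shell
#!/bin/sh
# Sketch: git apply --3way with and without the index line.
# All file and variable names here are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Base version: 20 numbered lines.
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > notes.txt
git add notes.txt
git commit -q -m "base"

# "Their" change: edit line 10, save it as a patch, then undo it.
sed 's/^line 10$/line 10 (the BEST)/' notes.txt > notes.tmp
mv notes.tmp notes.txt
git diff > "$work/with-index.patch"
git checkout -q -- notes.txt

# Same patch with the index header lines stripped.
grep -v '^index ' "$work/with-index.patch" > "$work/no-index.patch"

# "Our" change: edit line 7, which is part of the patch's context.
sed 's/^line 7$/line 7 (edited locally)/' notes.txt > notes.tmp
mv notes.tmp notes.txt
git commit -q -a -m "local edit"

# Plain apply: the context no longer matches.
plain=ok
git apply --check "$work/with-index.patch" 2>/dev/null || plain=fail

# Three-way apply without the index line: no way to find the preimage.
stripped=ok
git apply --3way "$work/no-index.patch" 2>/dev/null || stripped=fail

# Three-way apply with the index line: the preimage blob can be looked up.
full=ok
git apply --3way "$work/with-index.patch" 2>/dev/null || full=fail

echo "plain=$plain stripped=$stripped full=$full"
```

If the last step succeeds, notes.txt should contain both the local edit to line 7 and the patched line 10. The stripped patch has nothing for Git to look up, so the three-way attempt has no base to work from.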

    ... using git diff --binary sometag ...

    A binary patch, from Git, requires that you have the correct preimage (the item given by the index line). Without the index line, assuming I am reading the source correctly, Git will refuse to apply the patch, making the patch itself useless. If you're going to delete the index line, you might as well delete the binary patch.
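You can check the binary case yourself with something like the sketch below. It is only an experiment under a few assumptions: a POSIX shell, a recent Git, and made-up names (blob.bin, binary.patch, binary-no-index.patch, and the variables stripped and full). It builds a tiny binary file, produces a patch with git diff --binary, strips the index line, and tries to apply both variants.

```shell
#!/bin/sh
# Sketch: a binary patch with and without its index line.
# All file and variable names here are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# A small file with a NUL byte, so Git treats it as binary.
printf 'abc\000def\n' > blob.bin
git add blob.bin
git commit -q -m "base"

# Change it, capture a binary patch, then restore the original.
printf 'abc\000xyz\n' > blob.bin
git diff --binary > "$work/binary.patch"
git checkout -q -- blob.bin

# Same patch with the index header line stripped.
grep -v '^index ' "$work/binary.patch" > "$work/binary-no-index.patch"

# Without the index line, git apply is expected to refuse the patch.
stripped=ok
git apply "$work/binary-no-index.patch" 2>"$work/stripped.err" || stripped=fail

# With the index line intact, the patch should apply to the matching preimage.
full=ok
git apply "$work/binary.patch" || full=fail

echo "stripped=$stripped full=$full"
```

Whatever Git printed when it refused the stripped patch ends up in stripped.err under the scratch directory, so you can read its complaint there rather than take my reading of the source on faith.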