
What is the difference between git cherry-pick and git format-patch | git am?


I sometimes need to cherry-pick a tag with a certain fix into my branch, and used to do so via

git cherry-pick tags/myfix

This works, but cherry-picking takes an increasingly long time doing "inexact rename detection".

My hunch was that this could be faster with

git format-patch -k -1 --stdout tags/myfix | git am -3 -k

In fact, this turned out to apply the fix instantly, leaving my branch in exactly the same state as cherry-picking.
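To reproduce the comparison without the real repository, here is a minimal sketch (POSIX sh, assumes `git` is on `PATH`) that builds a throwaway repo, applies the same tagged fix once with `git cherry-pick` and once with `git format-patch | git am -3 -k`, and compares the resulting trees. The branch names, file names and the `myfix` tag are made up for the demo.

```shell
#!/bin/sh
# Sketch: apply one tagged fix two ways and compare the resulting trees.
# Branch names, file names and the tag "myfix" are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 > file.txt
git add file.txt
git commit -q -m "base"

# The fix lives on a side branch and is tagged, like tags/myfix.
git checkout -q -b fixes
printf 'line %s\n' 1 FIXED 3 4 5 > file.txt
git commit -q -am "the fix"
git tag myfix

# Meanwhile master moves on with an unrelated change.
git checkout -q master
echo "unrelated" > other.txt
git add other.txt
git commit -q -m "unrelated work"

# Method 1: cherry-pick.
git checkout -q -b via-cherry-pick master
git cherry-pick myfix > /dev/null

# Method 2: format-patch piped into am.
git checkout -q -b via-am master
git format-patch -k -1 --stdout myfix | git am -3 -k > /dev/null

# Commit IDs differ (new committer dates), but the trees should match.
t1=$(git rev-parse 'via-cherry-pick^{tree}')
t2=$(git rev-parse 'via-am^{tree}')
if [ "$t1" = "$t2" ]; then echo "same tree: $t1"; else echo "trees differ"; fi
```

Comparing trees rather than commit IDs is deliberate: the two methods create different commit objects, so "same state" here means same file contents.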

Now my question is, what exactly does cherry-picking do differently? I thought cherry-picking was basically implemented as exactly this, but I must have been mistaken.


Solution

    cherry-pick is implemented as a merge, with the merge base being the parent of the commit you're bringing in. In cases where there are no merge conflicts, this should have exactly the same effect as generating and applying the patch as you have (but see torek's answer for a bit of a caveat, where am could, in theory, do the wrong thing).
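The "merge with the parent as base" idea can be checked by hand for a single file. This sketch (throwaway repo, hypothetical names, POSIX sh) runs `git merge-file` with HEAD as "ours", the picked commit's parent as the base and the picked commit as "theirs", then compares that with what `git cherry-pick` produces. It only models the per-file content merge; the real cherry-pick merges whole trees, which is also where rename detection comes in.

```shell
#!/bin/sh
# Sketch: cherry-picking C behaves like a three-way merge whose base is C^.
# Repo layout and names are hypothetical; only one file is involved.
set -eu

repo=$(mktemp -d)
scratch=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "base"

# C: the commit to pick, on a side branch
git checkout -q -b feature
printf 'line %s\n' 1 '2 (feature)' 3 4 5 6 7 8 9 10 > file.txt
git commit -q -am "C: change line 2"
C=$(git rev-parse HEAD)

# A: an unrelated edit on master
git checkout -q master
printf 'line %s\n' 1 2 3 4 5 6 7 8 '9 (master)' 10 > file.txt
git commit -q -am "A: change line 9"

# Manual three-way merge: ours = HEAD, base = C^, theirs = C.
git show HEAD:file.txt  > "$scratch/ours"
git show "$C^:file.txt" > "$scratch/base"
git show "$C:file.txt"  > "$scratch/theirs"
git merge-file -p "$scratch/ours" "$scratch/base" "$scratch/theirs" > "$scratch/manual"

# The real thing.
git cherry-pick "$C" > /dev/null

if cmp -s "$scratch/manual" file.txt; then echo "match"; else echo "differ"; fi
```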

    But by doing a merge, cherry-pick can handle cases where changes would conflict more gracefully. (In fact, the -3 option you gave to am tells it to do the same thing if need be, provided the patch records the blob IDs of the files it was generated from and those blobs are available locally. I'll come back to that point at the end...)

    When you apply a patch, by default the apply fails if any hunk it changes (including that hunk's context lines) is not the same in the commit you apply it to as it was in the parent commit the patch was generated from. The cherry-pick/merge approach instead looks at what those differences are and merges them: non-overlapping changes are combined, and overlapping ones become a merge conflict, so you have the chance to resolve the conflict and carry on.
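A small sketch of that difference (throwaway repo, hypothetical names, POSIX sh). Master edits line 5, which sits inside the 3 lines of context around the picked change to line 2, so a plain `git am --no-3way` should reject the patch, while `git cherry-pick` should merge the two non-overlapping edits cleanly.

```shell
#!/bin/sh
# Sketch: a patch whose context has drifted is rejected by plain `git am`,
# while `git cherry-pick` merges the same change cleanly.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
printf 'line %s\n' 1 '2 (feature)' 3 4 5 6 7 8 9 10 > file.txt
git commit -q -am "C: change line 2"

# master edits line 5, which is inside the patch's context lines
git checkout -q master
printf 'line %s\n' 1 2 3 4 '5 (master)' 6 7 8 9 10 > file.txt
git commit -q -am "A: change line 5"

# Plain patch application, with the three-way fallback explicitly disabled.
if git format-patch -1 --stdout feature | git am --no-3way > /dev/null 2>&1; then
    am_result=applied
else
    am_result=rejected
    git am --abort
fi
echo "git am --no-3way: $am_result"

# Merge-based cherry-pick of the same commit.
if git cherry-pick feature > /dev/null; then
    pick_result=clean
else
    pick_result=conflict
fi
echo "git cherry-pick: $pick_result"
```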

    As part of conflict detection, cherry-pick does rename detection. So for example, say you have

    o -- x -- x -- A <--(master)
          \
           B -- C -- D <--(feature)
    

    and you cherry-pick commit C onto master. Suppose at o you created file.txt, and in A you have modifications to file.txt. But commit B moves file.txt to my-old-file.txt, and commit C modifies my-old-file.txt.

    The change to my-old-file.txt in C could conflict with the change to file.txt in A; but to see that possibility, git has to do rename detection so it can figure out that file.txt and my-old-file.txt are "the same thing".

    You may know that you don't have that situation, but git doesn't know until it tries to detect renames. I'm not sure why that would be time-consuming in this instance; in my experience it usually isn't. But inexact rename detection compares the contents of deleted paths against added paths, so in a repo with lots of paths added and deleted (between B and either C or A in our example) it could be.
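Here is a scaled-down version of that scenario as a throwaway-repo sketch (POSIX sh, git on PATH; the x commits from the diagram are left out and the file contents are made up). Picking C only comes out right because rename detection pairs my-old-file.txt in B, the merge base, with file.txt in A.

```shell
#!/bin/sh
# Sketch of the diagram: B renames file.txt, C edits the renamed file,
# A edits file.txt on master. Cherry-picking C relies on rename detection.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

# o: create file.txt
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "o: add file.txt"

# feature: B renames, C modifies the renamed file
git checkout -q -b feature
git mv file.txt my-old-file.txt
git commit -q -m "B: rename to my-old-file.txt"
printf 'line %s\n' 1 '2 (C)' 3 4 5 6 7 8 9 10 > my-old-file.txt
git commit -q -am "C: edit my-old-file.txt"

# master: A modifies file.txt
git checkout -q master
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 '10 (A)' > file.txt
git commit -q -am "A: edit file.txt"

# Merge base is B (C's parent). Diffing B against A, git has to notice
# that the deleted my-old-file.txt and the added file.txt are similar.
git cherry-pick feature > /dev/null
cat file.txt   # expect both "line 2 (C)" and "line 10 (A)"
```

Because A also edited the file, the pairing here is an inexact rename, the same kind of detection the question reports as slow.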

    When you generate and apply a patch instead, am tries to apply the patch on the assumption that there is no conflict. Only if this runs into a problem (and then, only because you gave the -3 option) will it fall back to doing a merge, with conflict detection. As long as its first attempt applies cleanly, it gets to skip all of that, including any potential rename detection.
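A sketch of the two paths (throwaway repo, hypothetical names, POSIX sh). `git apply --check` is used here as a stand-in for am's first, plain attempt: it should succeed against an untouched target and fail once the patch context has drifted, and in the drifted case `git am -3` should still land the change through its three-way fallback.

```shell
#!/bin/sh
# Sketch: the patch applies as-is on one target and needs the -3 fallback
# on another. Branch and file names are hypothetical.
set -eu

repo=$(mktemp -d)
patches=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
printf 'line %s\n' 1 '2 (feature)' 3 4 5 6 7 8 9 10 > file.txt
git commit -q -am "C: change line 2"
git format-patch -1 --stdout feature > "$patches/c.patch"

# clean-target: untouched base, so the patch applies as-is
git checkout -q -b clean-target master
if git apply --check "$patches/c.patch" 2>/dev/null; then fast_clean=yes; else fast_clean=no; fi

# drifted-target: line 5 changed, so the patch context no longer matches
git checkout -q -b drifted-target master
printf 'line %s\n' 1 2 3 4 '5 (master)' 6 7 8 9 10 > file.txt
git commit -q -am "A: change line 5"
if git apply --check "$patches/c.patch" 2>/dev/null; then fast_drift=yes; else fast_drift=no; fi

# -3 lets am rebuild the base from the blob IDs recorded in the patch
# and merge, instead of giving up.
git am -3 "$patches/c.patch" > /dev/null
echo "plain apply: clean=$fast_clean drifted=$fast_drift"
```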


    Update: as noted in the comments on the question, you can also turn rename detection off if it isn't helping and is running slowly. If you do this when there are, in fact, renames that "matter" to the merge, you may get conflicts that rename detection would have resolved. Although I don't think it should, I can't rule out that it might also just calculate an incorrect merge result and quietly apply it, which is why I rarely use this option.

    For the default merge strategy, the -X no-renames option will turn off rename detection. You can pass this option to cherry-pick.
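To see what is at stake, here is the scaled-down rename scenario from the diagram cherry-picked with `-X no-renames` (throwaway repo, hypothetical names, POSIX sh). With rename detection off, git sees my-old-file.txt as deleted on master and modified by C, so the pick should stop with a modify/delete conflict instead of merging into file.txt.

```shell
#!/bin/sh
# Sketch: the rename scenario, cherry-picked with rename detection disabled.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "o: add file.txt"

git checkout -q -b feature
git mv file.txt my-old-file.txt
git commit -q -m "B: rename to my-old-file.txt"
printf 'line %s\n' 1 '2 (C)' 3 4 5 6 7 8 9 10 > my-old-file.txt
git commit -q -am "C: edit my-old-file.txt"

git checkout -q master
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 '10 (A)' > file.txt
git commit -q -am "A: edit file.txt"

if git cherry-pick -X no-renames feature > /dev/null 2>&1; then
    result=clean
    unmerged=
else
    result=conflict
    # unmerged index entries left behind by the modify/delete conflict
    unmerged=$(git ls-files -u -- my-old-file.txt)
    git cherry-pick --abort
fi
echo "cherry-pick -X no-renames: $result"
```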

    Per torek's comment, it seems rename detection should be a non-issue with am. That said, I can confirm that am -3 does properly handle a case where the merge only works with rename detection. I'm going to return to trying to understand the ins and outs of this sometime when it's not Friday afternoon.
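A sketch of that check on the scaled-down rename scenario from the diagram (throwaway repo, hypothetical names, POSIX sh). The patch from C cannot apply as-is because my-old-file.txt does not exist on master, so am -3 has to fall back to its three-way merge. Per the observation above, the expectation is that the edit still lands in file.txt.

```shell
#!/bin/sh
# Sketch: `git am -3` on a patch that touches a file renamed on the target.
set -eu

repo=$(mktemp -d)
patches=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name  "Demo User"
git config user.email "demo@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt
git add file.txt
git commit -q -m "o: add file.txt"

git checkout -q -b feature
git mv file.txt my-old-file.txt
git commit -q -m "B: rename to my-old-file.txt"
printf 'line %s\n' 1 '2 (C)' 3 4 5 6 7 8 9 10 > my-old-file.txt
git commit -q -am "C: edit my-old-file.txt"
git format-patch -1 --stdout feature > "$patches/c.patch"

git checkout -q master
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 '10 (A)' > file.txt
git commit -q -am "A: edit file.txt"

# The first, plain attempt has nothing to patch: my-old-file.txt is absent.
if git apply --check "$patches/c.patch" 2>/dev/null; then plain=applies; else plain=fails; fi
echo "plain apply: $plain"

# The -3 fallback merges against a base rebuilt from the patch's blob IDs.
git am -3 "$patches/c.patch"
```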