
Low Level Difference between git rebase, git rebase -i and git merge


During a rebase, where I synced my local feature branch to the upstream branch to finalize a pull request, I tried using all three methods (git rebase, git rebase -i and git merge), and each of them offered a completely different experience when it came to resolving conflicts.

Git merge showed me all my conflicts at once. I resolved all of them and then added the changes. As expected, merging messed up my history and I had to revert again.

Git Rebase led me through the conflicts in two steps. In each I added my changes and continued the rebase thereafter. In between I lost one of my patches and had to start over again.

Interactive Rebasing worked like a charm. It led me through the conflicts commit by commit, and after each resolution, it started fast forwarding again from the base of the feature branch to the next conflict. I could ensure that commit co-authors were included correctly and at the end did not even need to add a 'merge' or 'rebase' commit, sitting at the head of the branch after finishing.

I have a conceptual understanding of when to use each of them, but why exactly did the rebase and interactive rebase behave so wildly differently, even without interactively editing the revision? Why are git merge and git rebase even used, when they seem to do things badly and make it easier to mess up something in the history?


Solution

  • ... why exactly did the rebase and interactive rebase behave so wildly differently

    As a general rule, they shouldn't. They sometimes do, and explaining precisely why is tricky. A quick bottom line take-away is that the non-interactive git rebase uses—well, sometimes uses—git format-patch and pipes its output to git am, and this can, though usually doesn't, do the same thing as the interactive rebase, which uses git cherry-pick instead.

    Historically, the format-patch approach was the only form of git rebase, and since it does behave a bit differently (and could work better), the Git authors chose not to switch everyone to an "always cherry pick" approach.

    The long and involved answer

    Why are git merge and git rebase even used, when they seem to do things badly and make it easier to mess up something in the history?

    First, git merge and git rebase have different goals, so they're not all that comparable. You're already aware that Git is all about commits, with branch names merely a way to find a commit—one specific commit, from which Git finds all the previous commits—but let's do a bit of terminology here to help us talk about it:

    ...--o--*--o--L   <-- master (HEAD)
             \
              o--o--R   <-- develop
    

    Note that we can re-draw this as:

              o--L   <-- master (HEAD)
             /
    ...--o--*
             \
              o--o--R   <-- develop
    

    to emphasize that, from commit * on backwards, all these commits are on both branches simultaneously. The name master, which is also the current branch HEAD, identifies commit L (for "left" or "local"). The name develop identifies commit R ("right" or "remote"). It's those two commits that identify their parent commits, and if we—or Git—carefully follow each parent backwards, the two streams of commits eventually rejoin—permanently, in this case—at commit *.

    Notes on git merge, which we need in order to talk about rebase

    Running git merge asks Git to find the merge base, i.e., commit *, and then compare that merge base to each of the two branch tip commits L (local or --ours) and R (remote or --theirs). Whatever is different on the left/local side, we must have changed. Whatever is different on the right/remote side, they must have changed. The merge machinery, performing the act of merging ("merge" as a verb), combines these two sets of changes.

    The git merge command (assuming it does a real merge like this, i.e., that you're not doing fast-forward or squash) uses the merge machinery in this way to compute the set of files that should be committed, then makes a new merge commit. This kind of commit—which uses the word "merge" as an adjective, or is shortened to just "a merge", using "merge" as a noun—has two parents: L is the first parent, and R is the second. The files are determined by the merge-as-a-verb action; the commit itself is a merge. If we draw this as:

    ...--o--o--o--L---M   <-- master (HEAD)
             \       /
              o--o--R   <-- develop
    

    we can then add more commits later, at which point we can run git merge again, choosing a new L and R:

    ...--o--o--o--o---M--L   <-- master (HEAD)
             \       /
              o--o--o--o--R   <-- develop
    

    The merge base this time is not the commit that used to be *, but rather the commit that used to be R! So the presence of merge commit M alters the next merge base for the next git merge command.
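
    The shift in merge base can be observed with git merge-base. The following is only a minimal sketch in a throwaway repository: the commit helper, the file names and the commit subjects are all made up for illustration, and it assumes git is on the PATH.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit star                                   # plays the role of commit *
star=$(git rev-parse HEAD)
git checkout -q -b develop && commit R
old_R=$(git rev-parse HEAD)
git checkout -q master && commit L

base_before=$(git merge-base master develop)  # commit *

git merge -q --no-edit develop                # makes merge commit M
git checkout -q develop && commit R2          # develop moves on
git checkout -q master && commit L2           # and so does master

base_after=$(git merge-base master develop)   # the old R, no longer *
echo "before: $base_before"
echo "after:  $base_after"
```

    Before the merge the two names share only commit *; afterwards the old R is reachable from both, so it becomes the base for the next merge.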

    Basics of any rebase

    What git rebase does is very different: it identifies some set of commits to copy, and then copies them.

    The set of commits to copy is the commits that are reachable from the current branch (i.e., HEAD), that are not reachable from the <upstream> argument you supply:

    $ git checkout develop
    $ git rebase <upstream-hash>   # or, easier, git rebase master
    

    At this point, internally, Git generates a list of commit hashes. If the commit graph still looks like this:

    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop (HEAD)
    

    and the argument to git rebase identifies commit * or any commit after that on master—including, of course, G, the tip of master, which is usually what we would choose here—then the set of commit hashes to be copied are those for C--D--E.
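
    The "reachable from HEAD but not from <upstream>" set is what the two-dot range syntax selects, so it can be previewed before rebasing. A minimal sketch, again in a throwaway repository with a made-up commit helper and file names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit star
git checkout -q -b develop && commit C && commit D && commit E
git checkout -q master && commit F && commit G

# Upstream = G (tip of master), listed oldest first.
to_copy=$(git log --reverse --format=%s master..develop)
# Upstream = commit * itself: the same three commits are selected.
from_star=$(git log --reverse --format=%s master~2..develop)
echo "$to_copy"
```

    Note that this shows only which commits are selected; where the copies land is a separate question (it defaults to the upstream commit).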

    Some commits in this set may be tossed out, on purpose. This includes:

      • merge commits, which a plain rebase does not copy; and
      • commits whose changes already appear in the upstream commits.

    The latter means that Git computes the git patch-id for commits F and G. If those match the git patch-id of commits C, D, or E, those commits are tossed from the "to copy" list.
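
    The patch-id comparison can be tried by hand. In this sketch (throwaway repository, made-up helper and names) commit C is cherry-picked onto master, so master holds a commit with a different hash but the same patch-id. The --cherry-pick --right-only options are the same ones that appear in the format-patch command quoted later in this answer.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit star
git checkout -q -b develop && commit C && commit D
git checkout -q master && commit G
git cherry-pick develop~1 >/dev/null          # same change as C, new hash

id_C=$(git show develop~1 | git patch-id | cut -d' ' -f1)
id_copy=$(git show master | git patch-id | cut -d' ' -f1)

# Plain range: C and D. With patch-id filtering: only D is left.
all=$(git log --reverse --format=%s master..develop)
remaining=$(git log --cherry-pick --right-only --no-merges --format=%s master...develop)
echo "all: $all"
echo "remaining: $remaining"
```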

    (If --fork-point mode is used, Git may toss additional commits from the list. Describing this well is difficult. See Git rebase - commit select in fork-point mode.)

    Git now begins the copying process. This is where non-interactive and interactive rebase can differ. Both start by "detaching HEAD", setting it to the target of the copying. This defaults to the <upstream> commit, in our case, commit G.

    The normal non-interactive method

    Normally, a non-interactive git rebase runs git format-patch on the selected commits, then feeds the output to git am:

    git format-patch -k --stdout --full-index --cherry-pick --right-only \
            --src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
            $git_format_patch_opt \
            "$revisions" ${restrict_revision+^$restrict_revision} \
            >"$GIT_DIR/rebased-patches"
    ...
    git am $git_am_opt --rebasing --resolvemsg="$resolvemsg" \
            $allow_rerere_autoupdate \
            ${gpg_sign_opt:+"$gpg_sign_opt"} <"$GIT_DIR/rebased-patches"
    

    This git am repeatedly invokes git apply -3. Each git apply tries to apply the diff directly: find the context, verify that the context is unchanged, and then add and delete the lines shown in the git diff output embedded in the git format-patch stream.

    If the verification step fails, git apply -3 (the -3 is important) uses a fallback method: the index lines in the format-patch output identify the merge base version of each file, so git apply can extract that merge base version, apply the patch directly to it (this should always work), and use the result as "version R". The merge base version is, of course, the merge base version, and the current or HEAD version of the file acts as "version L". We now have everything we need to do a regular git merge of that one particular file. We only merge one file at this point, and this is just "merge as a verb". (See also the description below of git cherry-pick.)
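
    The same pipeline can be run by hand to copy commits, which is roughly what the am-style rebase does internally. This is only a sketch under simplifying assumptions: a throwaway repository, a single made-up file.txt, a hypothetical edit helper, and changes far enough apart that every patch applies directly without needing the -3 fallback.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: apply a sed expression to file.txt and commit.
edit() { sed "$1" file.txt > file.tmp && mv file.tmp file.txt && git commit -q -am "$2"; }

seq 1 20 > file.txt && git add file.txt && git commit -q -m base
git checkout -q -b develop
edit 's/^10$/ten/' C
edit 's/^20$/twenty/' D
git checkout -q master
edit 's/^1$/one/' G

# Detach HEAD at the target, then replay develop's commits as patches.
git checkout -q --detach master
git format-patch -k --stdout --full-index master..develop | git am -3 -k -q

copied=$(git log --reverse --format=%s master..HEAD)
echo "$copied"
```

    The copies carry the original subjects but have new hashes, since their parent is now G rather than the original base.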

    This three-way merge can succeed or fail as always. Whichever happens, Git can move on to the rest of the files in this particular patch. If all patches apply—either directly, or as a result of the three-way merge fallback—Git will make a commit from the result, using the message text saved in the git format-patch stream. This copies the original commit to a new, but at least slightly different, commit, whose parent is the commit that was HEAD:

                    C'   <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
    

    This process repeats for commits D and E, giving:

                    C'-D'-E'   <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
    

    When it's complete, git rebase "peels the label" develop off the old commit chain and sticks it on the new one. Ideally, the old commits are abandoned, find-able only through the reflogs and, temporarily, the special name ORIG_HEAD:

                    C'-D'-E'   <-- develop (HEAD)
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   [abandoned]
    

    though if there are other ways to find the old commits (existing tag or branch names that lead to them), the old commits aren't abandoned after all, and you will see both old and new.
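
    The abandoned chain stays reachable for a while, which is what makes a botched rebase recoverable. A minimal sketch (throwaway repository, made-up helper and names) that checks both ORIG_HEAD and the branch reflog after a rebase:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit star
git checkout -q -b develop && commit C && commit D && commit E
git checkout -q master && commit F && commit G

git checkout -q develop
old_tip=$(git rev-parse develop)              # original commit E
git rebase -q master
new_tip=$(git rev-parse develop)              # the copy, E'

orig_head=$(git rev-parse ORIG_HEAD)          # still names the old E
previous=$(git rev-parse 'develop@{1}')       # so does the branch reflog
echo "old: $old_tip  new: $new_tip"
```

    Running git reset --hard ORIG_HEAD right after the rebase would put develop back on the original commits.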

    Interactive rebase

    The obvious difference between old-style git-rebase--am.sh and interactive git-rebase--interactive.sh is that the latter writes a big instructions file including help text, and lets you edit it. But even if you just write it out as is, the actual code to implement each pick command runs git cherry-pick. (This code has been revised in the most recent versions of Git and is now implemented in C, rather than shell script, but the shell script is much clearer, and the two are supposed to behave the same, so I have linked to the script here.)

    When git cherry-pick runs, it always does a three-way merge (at least in any even semi-modern Git: there may have been an old one that used git format-patch | git am -3, at some point; I have a fuzzy memory of different behavior in early days). What's unusual about this three-way merge is that the merge base is the parent of the commit being cherry-picked. This means that if we are about to copy commit D, as in this state:

                    C'   <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
    

    the merge base for this particular merge-as-a-verb operation is not commit *. It's not even a commit that's on master at all: it's commit C.

    The merge base when we were copying C to C' was *, since * is C's parent. That one makes sense. This one doesn't, at least at first. How can C be the merge base? But it is: Git runs git diff --find-renames C C' in order to see "what we changed", and combines that with git diff --find-renames C D ("what they changed").

    If any of those changes overlap, we'll get a merge conflict. If not, Git will keep "what we changed" and simply add to it "what they changed". Note that these two comparisons, these two git diff --find-renames operations, run commit-wide, not just on one specific file. This allows the cherry-pick to find files that were renamed in one of the two branches. Git then does the merge-as-a-verb on every file. When it is done, if there is no conflict, Git makes an ordinary (non-merge) commit from the resulting files.

    Assuming all goes well, and D gets copied to D', Git goes on to cherry-pick E. This time D is the merge base. The action works just as before: we find renames, merge-as-a-verb all the files, and make an ordinary, non-merge commit that is E'.
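
    The "parent as merge base" idea can be checked on a single file. In this sketch git merge-file stands in for the merge machinery: it is given the HEAD version (C'), the version from C as the base, and the version from D, and its output is compared with what git cherry-pick actually produces. The real cherry-pick works commit-wide with rename detection, so this is only an illustration; the repository, file.txt and the edit helper are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
work=$(mktemp -d)                             # scratch space outside the repo
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: apply a sed expression to file.txt and commit.
edit() { sed "$1" file.txt > file.tmp && mv file.tmp file.txt && git commit -q -am "$2"; }

seq 1 20 > file.txt && git add file.txt && git commit -q -m base
git checkout -q -b develop
edit 's/^10$/ten/' C
edit 's/^20$/twenty/' D
git checkout -q master
edit 's/^1$/one/' G

git checkout -q --detach master
git cherry-pick develop~1 >/dev/null          # C copied to C'

# Three-way merge by hand: ours = C', base = C, theirs = D.
git show HEAD:file.txt      > "$work/ours.txt"
git show develop~1:file.txt > "$work/base.txt"
git show develop:file.txt   > "$work/theirs.txt"
manual=$(git merge-file -p "$work/ours.txt" "$work/base.txt" "$work/theirs.txt")

git cherry-pick develop >/dev/null            # D copied to D'
picked=$(git show HEAD:file.txt)
```

    Here "what we changed" (C to C') is the edit to line 1 that came from G, and "what they changed" (C to D) is the edit to line 20, so the two combine without conflict.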

    Finally, as with non-interactive rebase, Git peels the branch name off the old tip commit and places it on the new tip.

    More peculiarities of non-interactive vs interactive

    There are a number of side consequences of non-interactive rebase using git format-patch. The most significant is that git format-patch literally cannot produce an "empty" patch (a commit that makes no changes to the source), so if you use git rebase -k (--keep-empty) to "keep" such commits, the non-interactive rebase uses git cherry-pick instead.

    The second is that because git format-patch is told --no-renames (see the actual command above), it represents a file rename as "delete old file, add new file". This prevents Git from spotting some conflicts. (As long as the to-be-deleted file is in the patch, it can at least detect a delete/modify conflict, but it can't detect a delete/rename conflict, and in patches "beyond" the rename, it will have nothing at all to notice.) And, of course, if we can construct a case in which a patch applies because of apparently-valid context, even though a three-way merge might find that the matching context is from a moved copy of the code, we can successfully apply a patch where a three-way merge would either detect a conflict, or apply it elsewhere.
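
    The effect of --no-renames on the patch text is easy to see. This is not the conflict example mentioned above, only a small sketch (throwaway repository, made-up file names) that counts the rename and deletion headers format-patch emits for the same commit with and without rename detection:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

seq 1 20 > old.txt && git add old.txt && git commit -q -m base
git mv old.txt new.txt && git commit -q -m rename

# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
with_renames=$(git format-patch -1 --stdout -M HEAD | grep -c '^rename from ' || true)
no_renames=$(git format-patch -1 --stdout --no-renames HEAD | grep -c '^rename from ' || true)
deletions=$(git format-patch -1 --stdout --no-renames HEAD | grep -c '^deleted file mode ' || true)
echo "rename headers with -M: $with_renames, with --no-renames: $no_renames"
echo "deletions with --no-renames: $deletions"
```

    With --no-renames the patch carries no hint that new.txt used to be old.txt, so whatever applies it sees only a deletion and an unrelated new file.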

    (I intend to construct an example at some point but have never had time to do it.)

    If you use the -m option, specifying that rebase should use the merge machinery, or a -s <strategy> option or -X <extended-option> (both of which imply using the merge machinery), this also forces Git to use cherry-pick. However, that's actually a third kind of rebase!

    The rebase type-selection happens in git-rebase.sh, well into the script:

    if test -n "$interactive_rebase"
    then
            type=interactive
            state_dir="$merge_dir"
    elif test -n "$do_merge"
    then
            type=merge
            state_dir="$merge_dir"
    else
            type=am
            state_dir="$apply_dir"
    fi
    

    Note that the location of hidden state files, keeping track of whether you're in the middle of an ongoing git rebase that has stopped to let you edit (interactive rebase) or due to a conflict (any rebase), varies depending on the type of rebase.
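
    Which state directory is in use can be checked while a rebase is stopped. The sketch below forces a conflict with git rebase -m and then looks for the two directory names; as far as I know these are .git/rebase-merge for the merge and interactive types and .git/rebase-apply for the am type. The repository, file.txt and the edit helper are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com

# Hypothetical helper: apply a sed expression to file.txt and commit.
edit() { sed "$1" file.txt > file.tmp && mv file.tmp file.txt && git commit -q -am "$2"; }

seq 1 20 > file.txt && git add file.txt && git commit -q -m base
git checkout -q -b develop
edit 's/^10$/ten/' C
git checkout -q master
edit 's/^10$/TEN/' G                          # same line: guaranteed conflict

git checkout -q develop
git rebase -m master >/dev/null 2>&1 || true  # stops on the conflict

if [ -d .git/rebase-merge ]; then state=rebase-merge
elif [ -d .git/rebase-apply ]; then state=rebase-apply
else state=none
fi
echo "state directory: $state"

git rebase --abort                            # removes the state, restores develop
branch=$(git symbolic-ref --short HEAD)
```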

    Git notes

    The last point of difference is that the am-based rebase does not run git notes copy. The other two do. This means that notes you made on the original commits are dropped when using plain non-interactive git rebase, but kept when using interactive rebase or git rebase -m.

    (This seems like a bug to me, but perhaps it is deliberate. Preserving the notes would be a little tricky since we need a mapping from old commit hash to new commit hash. This would need support inside git am.)
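
    Note copying for the merge-based rebase can be tried as follows. This sketch rests on one assumption worth stating loudly: to my knowledge Git rewrites notes only when notes.rewriteRef is configured, so it is set explicitly here (along with notes.rewrite.rebase). The repository, the commit helper and the note text are made up.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q . && git symbolic-ref HEAD refs/heads/master
git config user.name Demo && git config user.email demo@example.com
# Assumption: note rewriting must be enabled through these settings.
git config notes.rewriteRef refs/notes/commits
git config notes.rewrite.rebase true

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit star
git checkout -q -b develop && commit C
git notes add -m "reviewed" HEAD              # note on the original C
old_C=$(git rev-parse HEAD)
git checkout -q master && commit G

git checkout -q develop
git rebase -q -m master                       # merge-type rebase
new_C=$(git rev-parse HEAD)
note=$(git notes show HEAD)                   # note followed the copy
echo "note on $new_C: $note"
```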