While syncing my local feature branch with the upstream branch to finalize a pull request, I tried all three methods (git rebase, git rebase -i and git merge), and each of them offered a completely different experience when it came to resolving conflicts.
Git merge showed me all my conflicts at once. I resolved them and added the changes once done solving all of them. As expected, merging messed up my history and I had to revert again.
Git Rebase led me through the conflicts in two steps. In each I added my changes and continued the rebase thereafter. In between I lost one of my patches and had to start over again.
Interactive Rebasing worked like a charm. It led me through the conflicts commit by commit, and after each resolution, it started fast forwarding again from the base of the feature branch to the next conflict. I could ensure that commit co-authors were included correctly and at the end did not even need to add a 'merge' or 'rebase' commit, sitting at the head of the branch after finishing.
I have a conceptual understanding of when to use each of them, but why exactly did the rebase and interactive rebase behave so wildly different, even without interactively editing the revision? Why are git merge and git rebase even used, when they seem to do things badly and make it easier to mess up something in the history?
> ... why exactly did the rebase and interactive rebase behave so wildly different
As a general rule, they shouldn't. They sometimes do, and explaining precisely why is tricky. The quick take-away is that the non-interactive git rebase uses (well, sometimes uses) git format-patch and pipes its output to git am, and this can, though usually doesn't, behave differently from the interactive rebase, which uses git cherry-pick instead.
Historically, the format-patch form was the only form of git rebase, and since it does behave a bit differently (and could work better), the Git authors chose not to switch everyone to an "always cherry pick" approach.
> Why are git merge and git rebase even used, when they seem to do things badly and make it easier to mess up something in the history?
First, git merge and git rebase have different goals, so they're not all that comparable. You're already aware that Git is all about commits, with branch names merely a way to find a commit (one specific commit, from which Git finds all the previous commits), but let's do a bit of terminology here to help us talk about it:
    ...--o--*--o--L   <-- master (HEAD)
             \
              o--o--R   <-- develop
Note that we can re-draw this as:
              o--L   <-- master (HEAD)
             /
    ...--o--*
             \
              o--o--R   <-- develop
to emphasize that, from commit * on backwards, all these commits are on both branches simultaneously. The name master, which is also the current branch HEAD, identifies commit L (for "left" or "local"). The name develop identifies commit R ("right" or "remote"). It's those two commits that identify their parent commits, and if we (or Git) carefully follow each parent backwards, the two streams of commits eventually rejoin, permanently in this case, at commit *.
git merge, which we need in order to talk about rebase
Running git merge asks Git to find the merge base, i.e., commit *, and then compare that merge base to each of the two branch tip commits: L (local or --ours) and R (remote or --theirs). Whatever is different on the left/local side, we must have changed. Whatever is different on the right/remote side, they must have changed. The merge machinery, performing the act of merging ("merge" as a verb), combines these two sets of changes.
The git merge command (assuming it does a real merge like this, i.e., that you're not doing a fast-forward or a squash) uses the merge machinery in this way to compute the set of files that should be committed, then makes a new merge commit. This kind of commit (which uses the word "merge" as an adjective, or is shortened to just "a merge", using "merge" as a noun) has two parents: L is the first parent, and R is the second. The files are determined by the merge-as-a-verb action; the commit itself is a merge. If we draw this as:
    ...--o--o--o--L---M   <-- master (HEAD)
             \       /
              o--o--R   <-- develop
we can then add more commits later, at which point we can run git merge again, choosing a new L and R:
    ...--o--o--o--o---M--L   <-- master (HEAD)
             \       /
              o--o--o--o--R   <-- develop
The merge base this time is not the commit that used to be *, but rather the commit that used to be R! So the presence of merge commit M alters the merge base for the next git merge command.
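To see the merge base move, here is a small sketch that builds a graph of this shape in a throwaway repository. The repository location, the commit() helper, and the commit messages are all made up for illustration; only git merge-base and the other standard Git commands are real.

```shell
#!/bin/sh
# Sketch: watch the merge base change once a merge commit exists.
# Everything here (temp repo, helper, messages) is illustrative.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is "master"

# hypothetical helper: an empty commit is enough to grow the graph
commit() { git commit -q --allow-empty -m "$1"; }

commit star                                # the shared commit *
git branch develop
commit L1                                  # master moves on
git checkout -q develop
commit R1                                  # develop moves on
old_R=$(git rev-parse develop)
star=$(git rev-parse master~1)

git checkout -q master
base_before=$(git merge-base master develop)    # commit *

git merge -q --no-ff -m M develop               # merge commit M
git checkout -q develop && commit R2            # new work on develop
git checkout -q master && commit L2             # new work on master
base_after=$(git merge-base master develop)     # the old R, no longer *

echo "merge base before M: $base_before"
echo "merge base after M:  $base_after"
```

The first hash printed is that of the star commit, the second is that of the commit develop pointed to when M was made.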
What git rebase does is very different: it identifies some set of commits to copy, and then copies them.
The set of commits to copy consists of the commits that are reachable from the current branch (i.e., HEAD) but not reachable from the <upstream> argument you supply:
    $ git checkout develop
    $ git rebase <upstream-hash>    # or, easier, git rebase master
At this point, internally, Git generates a list of commit hashes. If the commit graph still looks like this:
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop (HEAD)
and the argument to git rebase identifies commit * or any commit after that on master (including, of course, G, the tip of master, which is usually what we would choose here), then the commits to be copied are C--D--E.
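You can ask Git for this "reachable from HEAD but not from upstream" set yourself with the upstream..HEAD range syntax. A sketch in a throwaway repository follows; the commit() and subjects() helpers and the commit messages are invented for the demo, and empty commits stand in for real work.

```shell
#!/bin/sh
# Sketch: list the commits a rebase would consider copying.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit() { git commit -q --allow-empty -m "$1"; }   # hypothetical helper

commit star                       # commit *
star=$(git rev-parse HEAD)
git branch develop
commit F; commit G                # master-only commits
git checkout -q develop
commit C; commit D; commit E      # develop-only commits

# hypothetical helper: subjects of <upstream>..HEAD, oldest first, on one line
subjects() { echo $(git log --reverse --format=%s "$1"..HEAD); }

from_tip=$(subjects master)       # upstream = G, the tip of master
from_star=$(subjects "$star")     # upstream = *, the fork point

echo "relative to master: $from_tip"
echo "relative to *:      $from_star"
```

Both lines list C D E, since F and G are not reachable from develop in the first place.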
Some commits in this set may be tossed out, on purpose. This includes:
- merge commits (for example, a merge of master back into develop);
- commits whose git patch-id matches that of an upstream commit.
The latter means that Git computes the git patch-id for commits F and G. If either matches the git patch-id of C, D, or E, the matching commit among C, D, and E is tossed from the "to copy" list.
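Here is a sketch of the patch-id check, using a cherry-picked commit as the "already upstream" case. The file names, commit messages, and the patch_id() helper are invented; --cherry-pick --right-only are the same options that appear in the format-patch command further down.

```shell
#!/bin/sh
# Sketch: a commit whose patch-id matches an upstream commit drops out.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo a > a.txt; echo b > b.txt
git add .; git commit -q -m base
git branch develop

echo f > f.txt; git add f.txt; git commit -q -m F    # master-only work

git checkout -q develop
echo changed > a.txt; git commit -q -am C
C=$(git rev-parse HEAD)
echo changed > b.txt; git commit -q -am D

git checkout -q master
git cherry-pick "$C" >/dev/null     # upstream now carries a copy of C

# hypothetical helper: first field of "git patch-id" output is the patch ID
patch_id() { git show "$1" | git patch-id | cut -d' ' -f1; }
id_original=$(patch_id "$C")
id_upstream=$(patch_id master)

# develop-side commits with no patch-equivalent commit on master
remaining=$(echo $(git log --format=%s --cherry-pick --right-only master...develop))

echo "same patch-id: $id_original / $id_upstream"
echo "still to copy: $remaining"
```

The two commits have different hashes but the same patch ID, so only D is left on the list.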
(If --fork-point mode is used, Git may toss additional commits from the list. Describing this well is difficult. See "Git rebase - commit select in fork-point mode".)
Git now begins the copying process. This is where non-interactive and interactive rebase can differ. Both start by "detaching HEAD", setting it to the target of the copying. This defaults to the <upstream> commit: in our case, commit G.
Normally, a non-interactive git rebase runs git format-patch on the selected commits, then feeds the output to git am:
    git format-patch -k --stdout --full-index --cherry-pick --right-only \
        --src-prefix=a/ --dst-prefix=b/ --no-renames --no-cover-letter \
        $git_format_patch_opt \
        "$revisions" ${restrict_revision+^$restrict_revision} \
        >"$GIT_DIR/rebased-patches"
    ...
    git am $git_am_opt --rebasing --resolvemsg="$resolvemsg" \
        $allow_rerere_autoupdate \
        ${gpg_sign_opt:+"$gpg_sign_opt"} <"$GIT_DIR/rebased-patches"
This git am repeatedly invokes git apply -3. Each git apply tries to apply the diff directly: find the context, verify that the context is unchanged, and then add and delete the lines shown in the git diff output embedded in the git format-patch stream.
If the verification step fails, git apply -3 (the -3 is important) uses a fallback method: the index lines in the format-patch output identify the merge base version of each file, so git apply can extract that merge base version, apply the patch directly to it (this should always work), and use the result as "version R". The merge base version is, of course, the merge base, and the current or HEAD version of the file acts as "version L". We now have everything we need to do a regular git merge of that one particular file. We only merge one file at this point, and this is just "merge as a verb". (See also the description below of git cherry-pick.)
This three-way merge can succeed or fail as always. Whichever happens, Git can move on to the rest of the files in this particular patch. If every file in the patch applies, either directly or through the three-way merge fallback, Git makes a commit from the result, using the message text saved in the git format-patch stream. This copies the original commit to a new, but at least slightly different, commit, whose parent is the commit that was HEAD:
                    C'  <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
This process repeats for commits D and E, giving:
                    C'-D'-E'  <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
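The copying step can be imitated by hand with a much-simplified version of the pipeline quoted above. This is only a sketch: it drops most of the options the real script passes, and the repository, file names, and commit messages are invented.

```shell
#!/bin/sh
# Sketch: copy C, D, E onto master by piping format-patch into am.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add .; git commit -q -m star
git branch develop
for n in F G; do echo "$n" > "$n.txt"; git add .; git commit -q -m "$n"; done
git checkout -q develop
for n in C D E; do echo "$n" > "$n.txt"; git add .; git commit -q -m "$n"; done

git checkout -q --detach master     # "detach HEAD" at the copy target, commit G

# one mailbox-style stream of patches in, one new commit per patch out
git format-patch -k --stdout --full-index master..develop | git am -q -k -3

copied=$(echo $(git log --reverse --format=%s master..HEAD))
echo "copies on top of master: $copied"
```

At this point HEAD is detached three commits above master, and develop still points at the original E; nothing has moved the branch name yet.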
When it's complete, git rebase "peels the label" develop off the old commit chain and sticks it on the new one. Ideally, the old commits are abandoned, findable only through the reflogs and, temporarily, the special name ORIG_HEAD:
                    C'-D'-E'  <-- develop (HEAD)
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   [abandoned]
though if there are other ways to find the old commits (existing tag or branch names that lead to them), the old commits aren't abandoned after all, and you will see both old and new.
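A sketch of where the "abandoned" commits can still be found after an uninterrupted rebase. As before, the throwaway repository, file names, and messages are invented; ORIG_HEAD and the develop@{1} reflog syntax are standard Git.

```shell
#!/bin/sh
# Sketch: after a clean rebase, the old tip survives in ORIG_HEAD and the reflog.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

echo base > base.txt; git add .; git commit -q -m star
git branch develop
for n in F G; do echo "$n" > "$n.txt"; git add .; git commit -q -m "$n"; done
git checkout -q develop
for n in C D E; do echo "$n" > "$n.txt"; git add .; git commit -q -m "$n"; done

old_tip=$(git rev-parse develop)         # the original commit E
git rebase -q master                     # copy C, D, E and move the label

new_tip=$(git rev-parse develop)         # E'
orig_head=$(git rev-parse ORIG_HEAD)     # left behind by the rebase
reflog_prev=$(git rev-parse 'develop@{1}')   # previous value of develop

echo "old tip:     $old_tip"
echo "ORIG_HEAD:   $orig_head"
echo "develop@{1}: $reflog_prev"
echo "new tip:     $new_tip"
```

The first three hashes agree, and the new tip is a different commit that now has master in its history.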
The obvious difference between old-style git-rebase--am.sh and interactive git-rebase--interactive.sh is that the latter writes a big instructions file including help text, and lets you edit it. But even if you just write it out as is, the actual code to implement each pick command runs git cherry-pick. (This code has been revised in the most recent versions of Git and is now implemented in C, rather than shell script, but the shell script is much clearer, and the two are supposed to behave the same, so I have linked to the script here.)
When git cherry-pick runs, it always does a three-way merge (at least in any even semi-modern Git: there may have been an old one that used git format-patch | git am -3 at some point; I have a fuzzy memory of different behavior in early days). What's unusual about this three-way merge is that the merge base is the parent of the commit being cherry-picked. This means that if we are about to copy commit D, as in this state:
                    C'  <-- HEAD
                   /
    ...--o--*--F--G   <-- master
             \
              C--D--E   <-- develop
the merge base for this particular merge-as-a-verb operation is not commit *. It's not even a commit that's on master at all: it's commit C.
The merge base when we were copying C to C' was *, since * is C's parent. That one makes sense. This one doesn't, at least at first. How can C be the merge base? But it is: Git runs git diff --find-renames C C' in order to see "what we changed", and combines that with git diff --find-renames C D ("what they changed").
If any of those changes overlap, we'll get a merge conflict. If not, Git will keep "what we changed" and simply add to it "what they changed". Note that these two comparisons, these two git diff --find-renames operations, run commit-wide, not just on one specific file. This allows the cherry-pick to find files that were renamed in one of the two branches. Git then does the merge-as-a-verb on every file. When it is done, if there is no conflict, Git makes an ordinary (non-merge) commit from the resulting files.
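For a single file with no renames involved, the "parent as merge base" idea can be reproduced by hand with git merge-file and compared against what git cherry-pick produces. This is a sketch of the idea only, not of how cherry-pick is implemented: the real operation works commit-wide. The file contents, the edit() helper, and the scratch directory are invented.

```shell
#!/bin/sh
# Sketch: merging one file with D's parent (C) as the base matches cherry-pick D.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # ignore personal settings

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

seq 1 20 > f.txt; git add f.txt; git commit -q -m star
git branch develop

# hypothetical helper: replace the line that reads $1 with $2, commit as $3
edit() { sed "s/^$1\$/$2/" f.txt > f.tmp && mv f.tmp f.txt && git commit -q -am "$3"; }

edit 10 ten G                          # master changes line 10
git checkout -q develop
edit 2 two C;       C=$(git rev-parse HEAD)
edit 18 eighteen D; D=$(git rev-parse HEAD)

git checkout -q --detach master
git cherry-pick "$C" >/dev/null        # HEAD is now C'

tmp=$(mktemp -d)
git show "$C:f.txt" > "$tmp/base"      # merge base: C, the parent of D
git show HEAD:f.txt > "$tmp/ours"      # "what we have": C'
git show "$D:f.txt" > "$tmp/theirs"    # "what they have": D
merged=$(git merge-file -p "$tmp/ours" "$tmp/base" "$tmp/theirs")

git cherry-pick "$D" >/dev/null        # let Git do the same job itself
picked=$(git show HEAD:f.txt)

[ "$merged" = "$picked" ] && echo "hand merge and cherry-pick agree"
```

Diffing base against ours shows only the line-10 change, and base against theirs shows only the line-18 change, which is why the two combine without conflict.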
Assuming all goes well, and D gets copied to D', Git goes on to cherry-pick E. This time D is the merge base. The action works just as before: we find renames, merge-as-a-verb all the files, and make an ordinary, non-merge commit that is E'.
Finally, as with non-interactive rebase, Git peels the branch name off the old tip commit and places it on the new tip.
There are a number of side consequences of non-interactive rebase using git format-patch. The most significant is that git format-patch literally cannot produce an "empty" patch (a commit that makes no changes to the source), so if you use -k to "keep" such commits, the non-interactive rebase uses git cherry-pick.
The second is that because git format-patch is told --no-renames (see the actual command above), it represents a file rename as "delete old file, add new file". This prevents Git from spotting some conflicts. (As long as the to-be-deleted file is in the patch, it can at least detect a delete/modify conflict, but it can't detect a delete/rename conflict, and in patches "beyond" the rename, it will have nothing at all to notice.) And, of course, if we can construct a case in which a patch applies because of apparently-valid context, even though a three-way merge might find that the matching context is from a moved copy of the code, we can successfully apply a patch where a three-way merge would either detect a conflict, or apply it elsewhere.
(I intend to construct an example at some point but have never had time to do it.)
If you use the -m option, specifying that rebase should use the merge machinery, or a -s <strategy> option or -X <extended-option> (both of which imply using the merge machinery), this also forces Git to use cherry-pick. However, that's actually a third kind of rebase!
The rebase type-selection happens in git-rebase.sh, well into the script:
    if test -n "$interactive_rebase"
    then
        type=interactive
        state_dir="$merge_dir"
    elif test -n "$do_merge"
    then
        type=merge
        state_dir="$merge_dir"
    else
        type=am
        state_dir="$apply_dir"
    fi
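To make the precedence explicit, the same test chain can be wrapped in a small function and tried with different flag combinations. The function name and its two positional parameters are my own invention for illustration; only the if/elif/else structure comes from the script.

```shell
#!/bin/sh
# Hypothetical wrapper around the selection logic quoted above.
# $1 stands in for $interactive_rebase, $2 for $do_merge;
# an empty string means "not set".
select_rebase_type() {
    interactive_rebase=$1
    do_merge=$2
    if test -n "$interactive_rebase"
    then
        echo interactive
    elif test -n "$do_merge"
    then
        echo merge
    else
        echo am
    fi
}

select_rebase_type "" ""     # neither variable set
select_rebase_type "" t      # only do_merge set
select_rebase_type t t       # both set: the interactive test comes first
```

This prints am, merge, and interactive in that order: the interactive check is tested first, so it wins whenever both variables are set.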
Note that the location of the hidden state files, which keep track of whether you're in the middle of an ongoing git rebase that has stopped to let you edit (interactive rebase) or due to a conflict (any rebase), varies depending on the type of rebase.
The last point of difference is that the am-based rebase does not run git notes copy. The other two do. This means that notes you made on the original commits are dropped when using the plain, non-interactive git rebase, but kept when using interactive rebase or git rebase -m.
(This seems like a bug to me, but perhaps it is deliberate. Preserving the notes would be a little tricky since we need a mapping from old commit hash to new commit hash. This would need support inside git am.)