Tags: git, git-merge, git-commit, git-revert, team-explorer

Why do I suddenly have a merge commit in my pushes?


Well, I seem to have gone and mucked something up.

Until recently, I could merge and then push to origin without a separate merge commit showing up. Now one does, and the merge commit is all I can see in my pipeline:

[Image: Pipeline commits after]

Before this started, only the manual commit was pushed to origin (or at least showed as such):

[Image: Pipeline commits before]

Here's Team Explorer (VS 2019 v16.6.5), after the behavior changed:

[Image: Team Explorer]

...and here's my local branch history:

[Image: Branch history]

See the change?

This all started right after I reverted commit a13adadf, fixed it and republished it. Now I've got some sort of weird branching effect going on, and I don't know how to get things back to where they were before. (I tried researching the problem, but the signal-to-noise ratio is very low when searching on anything related to merge commit.)

How can I get my repo to 'ignore' (i.e. stop displaying) the merge commits?

(Note: I'm the only dev working on this repo.)


Solution

  • It seems likely that you were doing fast-forward operations before. The git merge command does a fast-forward instead of a real merge when both of these conditions hold:

    1. A fast-forward is possible.
    2. You do not pass the --no-ff option, which disables fast-forwarding.

    "This all started right after I reverted commit a13adadf, fixed it and republished it."

    This must have created a branch. The word "branch" is ambiguous in Git, and that ambiguity can lead you astray here, but the graph snippet in your question indicates that this is in fact what happened.

    "How can I get my repo to 'ignore' (i.e. stop displaying) the merge commits?"

    If you just want to avoid displaying them, there may be some option to your viewer to do this.
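
If you read history on the command line, git log has a --no-merges option that filters merge commits out of the listing. The following is a minimal sketch in a throwaway repository; the commit messages and the placeholder identity are made up for illustration, and it says nothing about whether Team Explorer or your pipeline view offers an equivalent setting.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b develop
git commit -q --allow-empty -m "L"
git checkout -q master
git commit -q --allow-empty -m "K"
git merge -q -m "M" develop                # histories diverged, so this is a real merge

all=$(git log --format=%s)                 # includes the merge commit M
plain=$(git log --no-merges --format=%s)   # same history with merge commits filtered out
echo "$all"
echo "---"
echo "$plain"
```

The merge commit still exists in the repository; only the listing changes.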

    If you want to go back to not making them—the situation you were in before—you need to eliminate the branch you made.

    Long: What's going on here (and why the word "branch" is problematic)

    The first thing to keep in mind is that Git is all about commits. People new to Git, or even those who have been using it for quite a while, often think that Git is about files, or branches. But it isn't, really: it's about commits.

    Each commit is numbered, but the numbers are not simple counting numbers. Instead, each commit gets a random-looking—but not actually random at all—hash ID. These things are big and ugly, and Git will abbreviate them at times (as for instance your a13adadf), but each one of these is a numeric ID for some Git object—in this case, for a Git commit.

    Git has a big database of all of its objects, which it can look up by ID. If you give Git a commit number, it finds that commit's contents, by the ID.

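    The contents of a commit come in two parts:

    1. A full snapshot of every file Git knew about when the commit was made.
    2. Metadata: who made the commit, when, and why, plus the hash ID of the commit's parent, the commit that comes immediately before it.

    Storing the parent's hash ID is what lets Git work backwards, and backwards is how Git works. If we have a long string of commits, all in a row, like this: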

    ... <-F <-G <-H
    

    where H stands for the actual hash ID of the last commit in the chain, Git will start with commit H, reading it out of its object database. Inside commit H, Git will find all the saved files, and also the hash ID of earlier commit G. If Git needs it, Git will use this hash ID to read commit G out of the object database. That gives Git the earlier snapshot, and also the hash ID of even-earlier commit F.

    If Git needs to, Git will use hash ID F (as stored in G) to read F, and of course F contains another parent hash ID as well. So in this manner, Git can start with the last commit and work backwards.
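
You can look at this backwards link yourself with git cat-file. The sketch below builds a two-commit throwaway repository (the file name, commit messages, and identity are placeholders) and checks that the parent hash ID stored inside the last commit is the hash ID of the commit before it.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo one > file.txt; git add file.txt; git commit -q -m "G"
echo two > file.txt; git commit -q -am "H"

git cat-file -t HEAD     # the object type, here: commit
git cat-file -p HEAD     # raw contents: tree (the snapshot), parent, author, committer, message

# The parent line stored inside H holds the hash ID of G.
parent_in_h=$(git cat-file -p HEAD | sed -n 's/^parent //p')
hash_of_g=$(git rev-parse HEAD~1)
echo "parent recorded in H: $parent_in_h"
echo "hash ID of G:         $hash_of_g"
```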

    This leaves Git with one problem: how will it quickly find the hash ID of the last commit in the chain? This is where branch names come in.

    A branch name just holds the hash ID of the last commit

    Given the above—and getting a bit lazy on purpose and drawing the connection from commit to commit as a line, instead of an arrow going from child to parent—we can now draw the master branch in like this:

    ...--F--G--H   <-- master
    

    The name master simply contains the actual hash ID of existing commit H.

    Let's add another name, develop, that also contains hash ID H, like this:

    ...--F--G--H   <-- develop, master
    

    Now we have a small problem: which name are we going to use? Here, Git uses the special name HEAD to remember which branch name to use, so let's update the drawing a bit:

    ...--F--G--H   <-- develop, master (HEAD)
    

    This represents the result after git checkout master: the current branch name is now master, and master selects commit H, so that's the commit we're using (and the branch name that we're using too).

    If we run git checkout develop now, Git will switch to that branch. That name still identifies commit H, so there's nothing else to change, but now we have:

    ...--F--G--H   <-- develop (HEAD), master
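
The two-names-one-commit situation is easy to reproduce. This is a small sketch in a throwaway repository (placeholder identity, a single empty commit standing in for H): both names resolve to the same hash ID, and git checkout changes only which name HEAD is attached to.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "H"

git branch develop                 # a second name for the same commit
git rev-parse master develop       # prints the same hash ID twice
git symbolic-ref --short HEAD      # prints: master

git checkout -q develop            # the commit stays the same; only HEAD moves
head_name=$(git symbolic-ref --short HEAD)
echo "$head_name"                  # prints: develop
```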
    

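    If we now make a new commit, Git will:

    1. Save a snapshot of all the files it knows about.
    2. Add the metadata, including your name, the current date and time, and the parent hash ID, which is that of the current commit H.
    3. Write all of this out as a new commit, which gets a new hash ID that we will call I.

    There's one more thing Git will do, but let's draw this part first. The result is: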

    ...--F--G--H
                \
                 I
    

    What about the two names? That's the one more thing: Git will write I's hash ID into the current name. If that's develop, we get this:

    ...--F--G--H   <-- master
                \
                 I   <-- develop (HEAD)
    

    Note that master stayed in place, but the name develop has moved to point to the newest commit.
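
Here is a sketch of that behavior, assuming a throwaway repository with empty placeholder commits: after committing while develop is the current name, master still holds H's hash ID, and the new commit's parent is H.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "H"

git checkout -q -b develop                 # develop (HEAD) and master both name H
before=$(git rev-parse master)
git commit -q --allow-empty -m "I"         # only the current name, develop, moves

parent_of_i=$(git rev-parse develop~1)
echo "master:      $(git rev-parse --short master)"
echo "develop:     $(git rev-parse --short develop)"
echo "parent of I: $(git rev-parse --short develop~1)"
```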

    When two names identify the same commit, either name selects that commit

    Note that initially, when master and develop both selected commit H, it didn't matter, in one sense, which one you used with git checkout. Either way you got commit H as the current commit. But when you make the new commit, now it matters, because Git is only going to update one branch name. No one knows what the new commit's hash ID will be (because it depends in part on the exact second at which you make the commit), but once it's made, develop will hold that hash ID, if develop is the current name.

    Note that if you now git checkout master and make another new commit, the name master will be the one updated this time:

    ...--F--G--H--J   <-- master (HEAD)
                \
                 I   <-- develop
    

    Let's assume for the moment that you have not done this, though.

    Fast-forward

    With the earlier picture in mind, let's run git checkout master now, and go back to working with commit H:

    ...--F--G--H   <-- master (HEAD)
                \
                 I   <-- develop
    

    In this state, let's run git merge develop now.

    Git will do the things it does for git merge—see below—and find that the merge base is commit H, which is also the current commit. The other commit, I, is ahead of commit H. These are the conditions under which Git can do a fast-forward operation.

    A fast-forward is not an actual merge. What happens is that Git says to itself: If I did a real merge, I'd get a commit whose snapshot matches commit I. So instead, I'll take a short cut, and just check out commit I while dragging the name master along with me. The result looks like this:

    ...--F--G--H
                \
                 I   <-- develop, master (HEAD)
    

    and there is now no reason to keep the kink in the drawing—we could make this all one straight row.
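
To watch a fast-forward happen, you can count commits before and after the merge. The sketch below uses a throwaway repository with empty placeholder commits; if git merge only moves the name master, the number of commits stays the same and both names end up on commit I.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"
git commit -q --allow-empty -m "H"
git checkout -q -b develop
git commit -q --allow-empty -m "I"
git checkout -q master                     # back on H, with I ahead of it

count_before=$(git rev-list --all --count)
git merge -q develop                       # fast-forward: no new commit is made
count_after=$(git rev-list --all --count)

echo "commits before: $count_before, after: $count_after"
git rev-parse master develop               # the same hash ID twice
```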

    Real merges

    Sometimes, the above kind of fast-forward-instead-of-merge trick just doesn't work. Suppose you start with:

    ...--G--H   <-- develop, master (HEAD)
    

    and make two new commits I-J:

              I--J   <-- master (HEAD)
             /
    ...--G--H   <-- develop
    

    Now you git checkout develop and make two more commits K-L:

              I--J   <-- master
             /
    ...--G--H
             \
              K--L   <-- develop (HEAD)
    

    At this point, no matter which name you give to git checkout, if you run git merge on the other name, there's no way to go forward from J to L, or vice versa. From J, you have to back up to I, then go down to shared commit H, before you can go forward to K and then L.

    This kind of merge, then, cannot be a fast-forward operation. Git will instead do a real merge.

    To perform a merge, Git uses:

    1. The current (HEAD) commit. Let's make that J by running git checkout master first.
    2. The other commit you name. Let's use git merge develop to select commit L.
    3. One more commit, which Git finds on its own.

    This last (or really, first) commit is the merge base. The merge base is defined in terms of a graph operation known as Lowest Common Ancestor, but the short version is that Git works backwards from both commits to find the best shared ancestor. In this case, that's commit H: the point where the two branches diverge. Commits G and earlier are also shared, but they are farther back, so they are not as good as commit H.
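
You can ask Git which commit it would pick with git merge-base. This sketch rebuilds the diverged graph in a throwaway repository, using empty placeholder commits named after the letters in the drawing, and checks that the answer is H.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
h=$(git rev-parse HEAD)
git branch develop                         # develop and master both name H

git commit -q --allow-empty -m "I"         # I and J on master
git commit -q --allow-empty -m "J"
git checkout -q develop
git commit -q --allow-empty -m "K"         # K and L on develop
git commit -q --allow-empty -m "L"

base=$(git merge-base master develop)
git log -1 --format=%s "$base"             # prints: H
```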

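    So Git will now:

    1. Compare the snapshot in commit H with the one in commit J, to see what changed on master.
    2. Compare the snapshot in commit H with the one in commit L, to see what changed on develop.
    3. Combine the two sets of changes and apply the combined result to the snapshot from H.

    This is the process of merging, or to merge as a verb. Git will do all of this on its own, if it can. If it succeeds, Git will make a new commit, which we will call M: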

              I--J
             /    \
    ...--G--H      M   <-- master (HEAD)
             \    /
              K--L   <-- develop
    

    Note that new commit M points back to both commits J and L. This is in fact what makes this new commit a merge commit. Because a fast-forward is literally not possible, Git must make this commit, in order to achieve the merge.
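
The sketch below shows a real merge end to end, under a simplifying assumption: each side changes a different file (a.txt and b.txt are made-up names), so Git can combine the changes without conflicts. The resulting commit records two parents and contains both sides' changes.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo a > a.txt; echo b > b.txt
git add a.txt b.txt; git commit -q -m "H"
git branch develop

echo "changed on master" > a.txt;  git commit -q -am "J"
git checkout -q develop
echo "changed on develop" > b.txt; git commit -q -am "L"

git checkout -q master
git merge -q -m "M" develop                # no fast-forward is possible, so Git makes M

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "M has $parent_count parents"         # prints: M has 2 parents
cat a.txt b.txt                            # both sides' changes are present
```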

    You were initially doing fast-forwards

    You started out with this kind of situation:

    ...--G--H   <-- master, develop (HEAD)
    

    which then produced:

    ...--G--H   <-- master
             \
              I   <-- develop (HEAD)
    

    You used git checkout master; git merge develop or similar to get:

    ...--G--H--I   <-- master (HEAD), develop
    

    after which you can repeat the process, making first develop, then both develop and master, name new commit J:

    ...--G--H--I--J   <-- master (HEAD), develop
    

    But at this point you did something different: you did a git revert while on master.

    The git revert command makes a new commit. The new commit's snapshot is like the previous snapshot with one commit backed-out, as it were, so now you have:

                    K   <-- master (HEAD)
                   /
    ...--G--H--I--J   <-- develop
    

    The snapshot in K probably matches that in I (so it re-uses all those files), but the commit number is all-new.

    From here, you did git checkout develop and wrote a better commit than J, which we can call L:

                    K   <-- master
                   /
    ...--G--H--I--J--L   <-- develop (HEAD)
    

    Then you went back to master and ran git merge develop. This time, Git had to make a new merge commit. So it did just that:

                    K--M   <-- master (HEAD)
                   /  /
    ...--G--H--I--J--L   <-- develop
    

    Now, when you go back to develop and make new commits, you get the same pattern:

                    K--M   <-- master
                   /  /
    ...--G--H--I--J--L--N   <-- develop (HEAD)
    

    When you switch back to master and git merge develop, Git must once again make a new merge commit. Fast-forwarding is not possible, and instead you get:

                    K--M--O   <-- master (HEAD)
                   /  /  /
    ...--G--H--I--J--L--N   <-- develop
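
The whole sequence can be replayed in a throwaway repository. This is a sketch of what probably happened, not a record of your actual commands: the file names are invented, and the follow-up commits on develop deliberately touch a different file than the reverted one so that the merges complete without conflicts.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo v1 > app.txt; echo notes > other.txt
git add app.txt other.txt; git commit -q -m "I"
git checkout -q -b develop
echo broken > app.txt; git commit -q -am "J"
git checkout -q master
git merge -q develop                       # fast-forward: both names on J

git revert --no-edit HEAD >/dev/null       # K: a new commit on master only
git checkout -q develop
echo "better notes" > other.txt; git commit -q -am "L"
git checkout -q master

# A fast-forward is no longer possible, because K is not in develop's history.
if git merge --ff-only develop >/dev/null 2>&1; then ff_possible=yes; else ff_possible=no; fi
git merge -q -m "M" develop                # so Git makes merge commit M
m_parents=$(git cat-file -p HEAD | grep -c '^parent ')

git checkout -q develop
echo "more notes" >> other.txt; git commit -q -am "N"
git checkout -q master
git merge -q -m "O" develop                # and again: merge commit O
o_parents=$(git cat-file -p HEAD | grep -c '^parent ')

echo "fast-forward possible after revert: $ff_possible"
git log --graph --oneline --all
```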
    

    What you can do about this

    Suppose you now run git checkout develop && git merge --ff-only master. The first step selects develop as the current branch. The second asks to merge with master. This extra flag, --ff-only, tells Git: but only do that if you can do it as a fast-forward.

    (We already believe that Git can do this as a fast-forward, so this --ff-only flag is just a safety check. I think it's a good idea, though.)

    Since a fast-forward is possible, you'll get this:

                    K--M--O   <-- master, develop (HEAD)
                   /  /  /
    ...--G--H--I--J--L--N
    

    Note how the name develop has moved forward, to point to commit O, without adding a new merge commit. This means that the next commit you make on develop will have O as its parent, like this:

                            P   <-- develop (HEAD)
                           /
                    K--M--O   <-- master
                   /  /  /
    ...--G--H--I--J--L--N
    

    If you now git checkout master; git merge develop you'll get a fast-forward, with both names identifying new commit P, and you'll be back in that situation in which committing on develop allows a fast-forward.
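
Here is a sketch of that repair in a throwaway repository. The commit labeled K is an ordinary commit standing in for your revert, and the file names are invented; the point is only that once develop has been fast-forwarded to master, the next merge into master adds no new merge commit.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

echo v1 > app.txt; echo notes > other.txt
git add app.txt other.txt; git commit -q -m "J"
git branch develop
echo reverted > app.txt; git commit -q -am "K"      # stands in for the revert on master
git checkout -q develop
echo "better notes" > other.txt; git commit -q -am "L"
git checkout -q master
git merge -q -m "O" develop                # diverged, so this is a real merge

# The repair: let develop catch up to master.
git checkout -q develop
git merge -q --ff-only master              # develop now names the merge commit too

echo "newer notes" > other.txt; git commit -q -am "P"
git checkout -q master
merges_before=$(git rev-list --merges --count master)
git merge -q --ff-only develop             # succeeds as a fast-forward
merges_after=$(git rev-list --merges --count master)
echo "merge commits before: $merges_before, after: $merges_after"
```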

    Note that by doing this, you're essentially claiming that you don't need the name develop after all

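    If your work-pattern is:

    1. Make one or more new commits on develop.
    2. Switch back to master and fast-forward it to develop.

    then all you need to do is make your new commits while on master.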

    There's nothing inherently wrong with doing the new commits on another name, and if this is only sometimes your work pattern, that's probably a good habit: using lots of branch names will help you later, and being in the habit of making a new name before starting on work is good. You might want to consider using a name more meaningful than just develop, though.

    In any case, note that what Git cares about here are the commits. The branch names are just ways you can have Git help you find specific commits: the commit found by each name is the point at which you're doing work with that name. The actual branching, if there is any, is a function of the commits you make.

    To put it another way: To make commits form into branches, you need branch names, but having branch names alone does not make commits form into branches. That is:

    ...--F--G--H   <-- master
                \
                 I--J   <-- develop
    

    gives you two "last" commits, but a single linear chain ending at commit J. In one sense, there are two branches, one of which ends at H and one of which ends at J, but in another, there is only one branch, that ends at J. We can add more names, pointing to existing commits:

    ...--F   <-- old
          \
           G--H   <-- master
               \
                I--J   <-- develop
    

    and now there are three names (and three "last" commits) but the actual set of commits in the repository has not changed. We just drew F on a line by itself so as to make the name old point to it.
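
A quick way to convince yourself of this is to count commits before and after adding a name. The sketch uses a throwaway repository with empty placeholder commits named after the letters in the drawing.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master regardless of local defaults
git config user.name "Example"             # placeholder identity
git config user.email "example@example.com"

git commit -q --allow-empty -m "F"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git checkout -q -b develop
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

commits_before=$(git rev-list --all --count)
git branch old master~2                    # a new name pointing at existing commit F
commits_after=$(git rev-list --all --count)

echo "commits before: $commits_before, after: $commits_after"
git log --oneline --decorate --all         # same five commits, now with three names
```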