
How to share code between 2 feature branches


Let's say I've written a method in feature1 branch and after some time I realize that I need this piece of code in another feature2 branch as well.

So I copy and paste the code from feature1 into feature2, and work continues on both branches in parallel. I cannot merge feature1 into feature2, because then the reviewers of feature2 would also have to check the changes from feature1. Then I ask reviewers to review both features.

Assume feature1 is merged into master and then I want to merge feature2 into master as well. But because of the copy/paste I get a merge conflict, so I have to ask for reviews again. This is not a problem per se. But is there a way to avoid this conflict?


Solution

  • Your question starts with some incorrect assumptions:

    Let's take a look at what it means to merge in Git, and some of the common things that go wrong.

    Git is about commits

    Those new to Git often think Git is about files or branches, but it's not. Git is about commits. A commit holds files—each commit has a full snapshot of every file, in fact—and we organize and find our commits using branches, so files and branches have a part. But the heart and soul of Git is the commit.

    Git stores these commits in a big database full of Git "objects". There are four kinds of objects internally: blob, tree, commit, and annotated tag to be exact. But for the most part, humans only deal with the commit objects. These store our commits, and since the commit is the "unit of work" in a Git repository, as it were—because Git stores commits—that's the level at which we deal with Git.
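
To see the four object kinds in practice, one can ask git cat-file -t for the type behind a name. This is only a minimal sketch in a throwaway repository; the file name, tag name, and identity are placeholders, not anything from the question.

```shell
# Throwaway repository; file name, tag name, and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example User"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
git tag -a -m "annotated tag" v1             # -a creates a tag object

commit_type=$(git cat-file -t HEAD)          # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')   # the top-level tree of its snapshot
blob_type=$(git cat-file -t HEAD:file.txt)   # the contents of one file
tag_type=$(git cat-file -t v1)               # the annotated tag
echo "$commit_type $tree_type $blob_type $tag_type"   # commit tree blob tag
```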

    Unfortunately for us humans, Git's commits are numbered, with big ugly random-looking numbers that have no rhyme or reason[1]; they look like 1bcf4f6271ad8c952739164d160e97efd579424f, for instance. Humans can't deal with these, so we just don't. Git provides for us by adding, separately from the big database of commits and other objects, a smaller database of names, including branch and tag names. A name like refs/heads/main or refs/heads/master is a branch name and will turn into the big ugly hash ID for us. So we can give Git a branch name, and Git will fish out the right hash ID, and use that to fish out the right commit.

    That's how and why we can use names like feature1 and feature2. These names mean less than nothing to Git. Git doesn't really need them, and does not care how we spell them or what we do with them—we can rename them whenever we like for instance—and just provides them for us to use so that we don't have to memorize hash IDs. Git turns the names into hash IDs and finds the commits by their hash IDs and gets to work. So Git isn't using your branch name at this time: it's only using the commit itself, which Git found by its hash ID.
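
The name-to-hash lookup can be watched with git rev-parse. In this sketch the repository, the file name, and the new branch name trunk are all made up for illustration; the point is only that renaming the branch leaves the commit untouched.

```shell
# Throwaway repository; all names here are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "one"

by_short_name=$(git rev-parse main)             # short branch name
by_full_name=$(git rev-parse refs/heads/main)   # full name, same hash ID
git branch -m main trunk                        # rename the branch
by_new_name=$(git rev-parse trunk)              # still the same commit
echo "$by_short_name"
```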

    This is how and why Git is all about the commits. We use branch names, but Git mostly doesn't. I say mostly because we're about to hit the point where Git does use branch names, to keep them up to date for us.


    [1] Technically, they're just outputs from some cryptographic hash. Traditionally, Git uses SHA-1, but Git now supports SHA-256. There's ongoing work on making this more useful: for now, if you're a Git user as opposed to a Git developer, you'll just be using SHA-1.


    What's in a commit

    Remember that each commit in Git has one of those big ugly hash IDs. These are unique to that particular commit: no other Git commit, anywhere, is ever allowed to use that hash ID again.[2] So if we take any two Git repositories out on leashes to the Git-repository-park (like taking the dog to the dog park), they can go sniff each other and decide which commits one has that the other doesn't, just by looking at hash IDs. Then one Git repository can get the other's commits, knowing that the first repository is missing those commits based on the hash ID alone.

    We won't worry about this exchange-of-commits stuff here—that's the distributed part of Git being a distributed version control system—but it's important to keep this "unique hash ID" thing in mind as we look at what's inside any one given commit:

    Each commit holds two things: a full snapshot of every file, and some metadata. The metadata in any one commit aren't necessarily large, though your commit log message goes here, so a really long message takes up that much space. Besides your name and email address, Git adds, to each commit, a list of parent commit hash IDs. This list is usually exactly one entry long.

    What this means is that given the latest commit, we can have Git work backwards to find the second-latest commit, all on its own. Let's draw this. Suppose the latest commit has some big ugly hash ID that we won't try to guess but will just call H, for "hash". We'll draw it like this:

                <-H
    

    Commit H has a little arrow sticking out of it, in our drawing. In reality, commit H has a hash ID in its metadata, and that hash ID is the hash ID of the commit that comes before H. Let's call that commit G and draw it in:

            <-G <-H
    

    Of course, commit G is a commit, so it has a hash ID "arrow" like H's. Commit G's arrow points to the commit that comes before G, which we'll call F:

    ... <-F <-G <-H
    

    Commit F has an arrow that points to its parent, and so on. So all we have to do, to have Git find all the commits, is somehow have Git find the latest commit H.
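
The backward-pointing arrows can be inspected directly. The sketch below builds three commits named after the drawing (F, G, H) in a throwaway repository, then reads the parent line out of the raw commit object; the file name and identity are placeholders.

```shell
# Throwaway repository; commit subjects mirror the drawing, the rest is made up.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

for n in F G H; do
  echo "$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

# The raw commit object records its parent's hash ID on a "parent" line.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
parent_hash=${parent_line#parent }

# HEAD^ means "the first parent of HEAD", so it names the same commit.
first_parent=$(git rev-parse 'HEAD^')

# Following the parent links from the tip visits H, then G, then F.
subjects=$(git log --format=%s | tr '\n' ' ')
echo "$subjects"
```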

    Well, we just said earlier that a branch name like main or feature1 stores a hash ID. So this name points to H, just like H points to G, and so on:

    ...--F--G--H   <-- main
    

    One of the tricks that Git has to use, to keep the hash IDs working, is that all parts of any commit are frozen for all time. That includes the hash IDs that point backwards to previous commits. So H will always point to G, which will always point to F, and so on. As such, I get to be a little lazy about drawing the arrows that connect commits to each other.

    This is not the case for branch names. The arrows in a branch name move.


    [2] This constraint gets relaxed a bit in two Git repositories that never meet. As long as they don't meet, the two separate Git repositories are allowed to accidentally re-use a hash ID. This doesn't really happen in practice anyway, especially because it's humans who control which repositories eventually meet. Git doesn't know what those crazy humans will do in the future, so Git just tries to ensure that every commit gets a totally unique hash ID.


    Making a new commit

    To make a new commit, we check out some branch with git checkout, or use git switch to "switch to" the branch, thus "checking it out", to the same effect. Git remembers which branch name we used, by attaching the special name HEAD to one of the branch names in the repository. At this particular point we only have one name, main, so there's not that much need for it, but we have this:

    ...--F--G--H   <-- main (HEAD)
    

    Let's create a new branch name now though. Let's create the name feature1. This name must point to some existing commit. We can pick any commit in the repository, but typically we'll pick the latest main-branch commit (or maybe the latest develop-branch commit or something, but for now we only have main anyway). So the new name feature1 will also point to commit H:

    ...--F--G--H   <-- feature1, main (HEAD)
    

    Note how all the commits are on both branches. Both names select commit H right now. That's about to change, though.

    We now use git switch feature1 or git checkout feature1 to select the name feature1 with which to select commit H. This changes our picture:

    ...--F--G--H   <-- feature1 (HEAD), main
    

    We have not changed commits, so we are working with the same files, but we have changed which branch name we are using to find commit H.

    Now we do our usual thing of modifying and git add-ing and git commit-ing. When Git is done making the new commit, the new commit holds a new snapshot of all of the files (frozen, compressed, de-duplicated, and read-only), and the new commit—which we will call commit I—has commit H as its parent:

                 I
                /
    ...--F--G--H
    

    But—here's Git's little magic trick—Git has stored I's hash ID in the current branch, the name to which HEAD is attached. So if we include the branch names in the picture, we now have this:

                 I   <-- feature1 (HEAD)
                /
    ...--F--G--H   <-- main
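
A small experiment shows the branch name moving. This is a sketch under the same assumptions as the drawing (one starting commit H, one new commit I); the file name and identity are placeholders.

```shell
# Throwaway repository; file name and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "commit H"
h=$(git rev-parse main)

git branch feature1                # new name, pointing at the same commit H
git checkout -q feature1           # attach HEAD to feature1 (git switch also works)

echo change >> file.txt
git commit -q -a -m "commit I"     # Git writes I's hash ID into feature1

i=$(git rev-parse feature1)
current=$(git symbolic-ref --short HEAD)   # the name HEAD is attached to
echo "main=$(git rev-parse main) feature1=$i HEAD->$current"
```

Only the name that HEAD is attached to moved; main still selects commit H.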
    

    New commit I is only on feature1 right now. Commits up through H continue to be on both branches. If we make another new commit J, we get:

                 I--J   <-- feature1 (HEAD)
                /
    ...--F--G--H   <-- main
    

    If we now git switch main or git checkout main, we get:

                 I--J   <-- feature1
                /
    ...--F--G--H   <-- main (HEAD)
    

    Git will remove, from our work area, the files from commit J, and put in place the files from commit H instead. (We haven't covered the working tree and Git's index here, and for space reasons, we won't.)

    Let's now make a second branch name, feature2, that also points to commit H, and then switch to feature2:

                 I--J   <-- feature1
                /
    ...--F--G--H   <-- feature2 (HEAD), main
    

    As we make new commits on feature2, they cause feature2 to grow, just as happened with feature1:

                 I--J   <-- feature1
                /
    ...--F--G--H   <-- main
                \
                 K--L   <-- feature2 (HEAD)
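
One way to check which commits are on which branch is the two-dot range syntax, where A..B selects commits reachable from B but not from A. The sketch below rebuilds the drawing in a throwaway repository; the commit helper function and the file names are hypothetical conveniences, not Git features.

```shell
# Throwaway repository; "commit" is a hypothetical helper defined here.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit H
git checkout -q -b feature1
commit I
commit J
git checkout -q main
git checkout -q -b feature2        # starts at H, like feature1 did
commit K
commit L

only_feature1=$(git log --format=%s feature2..feature1 | tr '\n' ' ')   # "J I "
only_feature2=$(git log --format=%s feature1..feature2 | tr '\n' ' ')   # "L K "
on_main=$(git log --format=%s main | tr '\n' ' ')                       # "H "
echo "feature1 only: $only_feature1| feature2 only: $only_feature2| main: $on_main"
```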
    

    So that's a branch

    This is really what branches, in Git, are about. We call the latest commit, as found by some branch name, the tip commit of the branch. (That's an official Git term.) We call that commit plus some string of earlier commits "the branch", and we also call the name "the branch". So when someone says "branch feature1", they might mean the name feature1, the tip commit that the name selects, that tip commit plus some or all of the commits before it, or perhaps some other thing. The word branch, in Git, is rather badly overused, and it's often a good idea to be more specific (you can say "branch name" or "tip commit" or "set of commits", for instance).

    Merging

    When we have diverging branches like the above—feature1 and feature2 diverge from commit H and end at commits J and L respectively—we often later want to combine work. That is, given:

              I--J   <-- feature1
             /
    ...--G--H
             \
              K--L   <-- feature2
    

    we'd like to get a single commit M whose snapshot combines the work done on feature1 with the work done on feature2.

    We often achieve this in Git using git merge.

    To run git merge, we check out one of the two branches, then run git merge and give it the name of the other branch.

    So we run git switch feature1 && git merge feature2, or maybe git switch feature2 && git merge feature1.

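    When we do this, Git will find the best common starting commit (the merge base), work out what each side changed since that commit, and combine the two sets of changes.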

    Our goal, remember, is to combine work. Commits, however, don't contain work. They contain snapshots: complete archives of the entire source.

    So, by finding a "best" common starting point—which in this case is obviously commit H—Git can simply compare the files in commit H with those in commit J, to see what changed on feature1.

    The output of this comparison is a file-by-file list of the files that changed between the two commits, with a line-by-line set of changes for each one. Files that didn't change at all (that stayed the same from H to J) aren't mentioned. That's what you'll see if you run git diff on commits H and J, and that's what git merge will see.

    Having figured out which files changed, and what changed in them, from H to J, Git now runs the same kind of comparison, from commit H to commit L. As before, this finds out which files were changed and what changed within those files, line-by-line.
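
The two comparisons can be reproduced by hand with git merge-base and git diff. This is a rough sketch of what the merge looks at, not of how git merge is implemented internally; the file names a.txt and b.txt are made up, and each side has a single commit for brevity.

```shell
# Throwaway repository; file names and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "H"

git checkout -q -b feature1
echo "a changed on feature1" > a.txt
git commit -q -a -m "J"

git checkout -q main
git checkout -q -b feature2
echo "b changed on feature2" > b.txt
git commit -q -a -m "L"

base=$(git merge-base feature1 feature2)                # the common commit H
changed_on_feature1=$(git diff --name-only "$base" feature1)   # a.txt
changed_on_feature2=$(git diff --name-only "$base" feature2)   # b.txt
echo "base=$base ours=$changed_on_feature1 theirs=$changed_on_feature2"
```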

    The git merge command now combines the changes. If "we" (H-vs-J, if we're on feature1 now) touched some file and they (H-vs-L) didn't, Git keeps our changes. If we didn't touch the file but they did, Git keeps their changes. If we both touched the file, Git tries to combine our changes.

    You get a merge conflict if and when we and they made different changes to the same source lines. You also get a merge conflict if we and they touched two line ranges that "touch at the edges" (abut). All this means is that Git is not sure about how to combine these changes. Your job as the programmer is to provide the correct combination.

    That's what a merge conflict is about: Git isn't sure if taking the changes line-by-line is right. If you don't get a merge conflict, Git is sure that taking the changes line-by-line is right, even if it isn't actually right. Git is not smart here: Git is following ridiculously simple rules about text lines.
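
A conflict is easy to provoke on purpose: change the same line differently on both sides. The sketch below does that in a throwaway repository and then resolves it by hand; the file contents and the resolution text are invented for illustration.

```shell
# Throwaway repository; file contents and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

printf 'line 1\nline 2\nline 3\n' > file.txt
git add file.txt
git commit -q -m "H"

git checkout -q -b feature1
printf 'line 1\nline 2 from feature1\nline 3\n' > file.txt
git commit -q -a -m "J"

git checkout -q main
git checkout -q -b feature2
printf 'line 1\nline 2 from feature2\nline 3\n' > file.txt
git commit -q -a -m "L"

git checkout -q feature1
# Both sides changed line 2 differently, so git merge stops with a non-zero status.
if git merge feature2 >/dev/null 2>&1; then merged=clean; else merged=conflict; fi
unmerged=$(git diff --name-only --diff-filter=U)    # files still in conflict

# The programmer supplies the correct combination, then concludes the merge.
printf 'line 1\nline 2 from feature1 and feature2\nline 3\n' > file.txt
git add file.txt
git commit -q -m "M: merge feature2 into feature1"
echo "$merged in $unmerged"
```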

    Once you fix the merge conflicts, or if Git has no merge conflicts, Git makes a new commit as usual. The one thing that is special about this new commit M is that instead of just the one parent J, it has a second parent, L: the commit we said that Git should merge. Git stores the new merge commit's hash ID into the current branch name as usual, so we get:

              I--J
             /    \
    ...--G--H      M   <-- feature1 (HEAD)
             \    /
              K--L   <-- feature2
    

    Because commit M connects backwards to both commits J and L, commits K-L, which used to be only on feature2, are now on both branches. Commits I-J-M are still only on feature1 here because L is still the tip commit of feature2, and Git can only work backwards, not forwards. So from L we go backwards to K, then H, then G, never seeing commits I-J-M.
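
The two parents of M, and the one-way nature of reachability, can be checked with the ^1 and ^2 suffixes and with range counts. As before, this is a sketch in a throwaway repository, and the commit helper function is a hypothetical convenience that writes one file per commit so that the merge has no conflicts.

```shell
# Throwaway repository; "commit" is a hypothetical helper defined here.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit H
git checkout -q -b feature1
commit I
commit J
git checkout -q main
git checkout -q -b feature2
commit K
commit L

git checkout -q feature1
git merge -q --no-edit feature2                   # makes merge commit M

first_parent=$(git log -1 --format=%s 'HEAD^1')   # J
second_parent=$(git log -1 --format=%s 'HEAD^2')  # L
# Nothing on feature2 is missing from feature1 any more ...
missing_from_feature1=$(git rev-list --count feature1..feature2)   # 0
# ... but I, J, and M are still not reachable from feature2.
missing_from_feature2=$(git rev-list --count feature2..feature1)   # 3
on_feature2=$(git log --format=%s feature2 | tr '\n' ' ')          # "L K H "
echo "$first_parent $second_parent $missing_from_feature1 $missing_from_feature2"
```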

    Trivial merges

    Sometimes we make merges that are really easy for Git:

              H--I   <-- feature
             /
    ...--F--G   <-- main
    

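    We run git switch main and then git merge --no-ff feature (the --no-ff is required to make this act like GitHub's "merge" button; it defeats a short-cut that Git normally takes here). Git finds the common starting commit, but that's commit G, which is also the tip commit of main. So a full merge consists of comparing G to G (which finds no changes), comparing G to I (which finds their changes), applying those changes to the snapshot in G, and making a new merge commit.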

    The result looks like this:

              H--I   <-- feature
             /    \
    ...--F--G------M   <-- main (HEAD)
    

    (I called the merge commit M again, for Merge; in reality it gets a unique hash ID, like every commit.) The snapshot in M is guaranteed to match the snapshot in I, because G-vs-G never has any changes to add, while G-vs-I always has the changes to add that result in the I snapshot.
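
The claim that M's snapshot matches I's can be checked by comparing tree hash IDs, since two commits with identical snapshots share the same top-level tree. A minimal sketch, again with a hypothetical commit helper and made-up file names:

```shell
# Throwaway repository; "commit" is a hypothetical helper defined here.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit F
commit G
git checkout -q -b feature
commit H
commit I

git checkout -q main
git merge -q --no-ff --no-edit feature       # force a real merge commit M

tree_of_m=$(git rev-parse 'main^{tree}')     # snapshot of M
tree_of_i=$(git rev-parse 'feature^{tree}')  # snapshot of I
m=$(git rev-parse main)
i=$(git rev-parse feature)
second_parent=$(git rev-parse 'main^2')      # M's second parent is I
echo "same snapshot: $tree_of_m"
```

The commits M and I differ, but their snapshots do not.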

    If we don't prevent Git from doing so, Git will turn this trivial merge into a fast-forward operation, which isn't really a merge at all. Instead of a new commit, we just get this:

              H--I   <-- feature, main (HEAD)
             /
    ...--F--G
    

    That is, Git just scoots the name main forward two hops, like fast-forwarding a tape recorder. It's literally just a checkout that drags the name (in this case, main) forward. Git swaps out the commit-G files in our working tree for the commit-I files. No merge is needed, so no merge happens, and with no merge there can be no merge conflict.
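
The fast-forward case can be sketched the same way. The example uses --ff-only, which makes Git refuse rather than create a merge commit if a fast-forward is not possible; with default settings a plain git merge feature would behave the same way here. The commit helper and file names are hypothetical.

```shell
# Throwaway repository; "commit" is a hypothetical helper defined here.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit F
commit G
git checkout -q -b feature
commit H
commit I

git checkout -q main
before=$(git rev-parse main)                       # commit G
git merge -q --ff-only feature                     # just moves the name main

after=$(git rev-parse main)
hops=$(git rev-list --count "$before"..main)       # 2: commits H and I
merges=$(git rev-list --merges --count main)       # 0: no merge commit exists
echo "main moved $hops commits forward, merge commits: $merges"
```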

    Force the merge with --no-ff (no fast forward) and the merge happens and you get a new merge commit. Sometimes you want this (for release tagging purposes for instance) and sometimes you don't care. To know whether you want it, you need to know that Git is all about commits. A new commit gets a new, unique hash ID, which we can tell apart from every other commit. A fast-forward makes no new commit: it "re-uses" an existing commit, so main ends up naming the same old commit I that feature already names.

    Cherry-picking

    Suppose we have:

              I--J--K   <-- br1
             /
    ...--G--H
             \
              L   <-- br2 (HEAD)
    

    We suddenly realize that commit J, say, fixed a nasty bug that we also need fixed in br2. We could copy and paste the code changes from that commit, but if that commit exactly fixes the bug, it would be nice if we could get Git to compare the commit before that commit (commit I) to that commit to see what changed. That is, we'd like Git to diff the snapshots in I and J to see which files had what changes made.

    Given that Git can do that easily, we have Git do it. Then we have Git apply those same changes to our current versions of those files in commit L. We could have Git just literally make the same changes to the same lines, but what happens if, say, the fix for thing.py is on line 45 in their version, but we added a new function near the top and the fix goes on line 70 in our version of thing.py?

    Well, we can have Git apply the fix a lot more cleverly. If we have Git diff commit I's version of thing.py against commit L's version of thing.py, that will show our added function, and that what was line 45 is now line 70. So Git will be able to apply their change to line 70, which is the correct line.

    But hang on a minute. We're having Git compare the file in snapshot I to the file in snapshot J and also to the file in snapshot L. What were we doing a moment ago with git merge? We were doing the exact same thing with git merge. Merge compares snapshots and combines changes.

    That's exactly how cherry-pick works: it's literally a merge operation, with the merge base being forced. We're cherry-picking from commit J. Commit J's parent is commit I. So Git uses commit I as the merge base, commit J as "their" branch-tip commit, and our current commit L as "our" commit. Git makes the usual diffs, and then combines work as usual. The thing that's not like a merge is that, once Git is done with the combining-work part, Git makes an ordinary (non-merge) commit:

              I--J--K   <-- br1
             /
    ...--G--H
             \
              L--N   <-- br2 (HEAD)
    

    New commit N will make the same changes to L that J-vs-I made to I, adjusted as necessary. The cherry-pick code uses the merge engine to achieve the "adjusted as necessary" part.
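
Here is a sketch of the "adjusted as necessary" behavior, loosely following the thing.py story above: the fix is on line 8 in br1, while br2 has added two lines at the top, so the fix must land on line 10 there. The file contents, notes.txt, and the identity are invented for illustration.

```shell
# Throwaway repository; file contents and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > thing.py
git add thing.py
git commit -q -m "H"

git checkout -q -b br1
echo notes > notes.txt
git add notes.txt
git commit -q -m "I"                               # unrelated work
sed 's/^line 8$/line 8 fixed/' thing.py > thing.tmp && mv thing.tmp thing.py
git commit -q -a -m "J"                            # the bug fix
j=$(git rev-parse HEAD)
echo more >> notes.txt
git commit -q -a -m "K"

git checkout -q main
git checkout -q -b br2
{ printf 'new 1\nnew 2\n'; cat thing.py; } > thing.tmp && mv thing.tmp thing.py
git commit -q -a -m "L"                            # two lines added at the top
l=$(git rev-parse HEAD)

git cherry-pick "$j" >/dev/null                    # copy J's change; makes commit N
fixed_line=$(sed -n '10p' thing.py)                # the fix landed two lines lower
echo "$fixed_line"                                 # line 8 fixed
```

Commit N is an ordinary commit whose only parent is L, and the unrelated work from commit I (notes.txt) is not brought along.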

    Cherry-picking can therefore get merge conflicts during the cherry-pick operation! That's normal, and as with git merge it is nothing to be afraid of: you, as the programmer, merely need to supply the correct result. Whatever you tell Git is the correct result, Git will believe you: that's what goes into the new commit's snapshot.

    If you had to modify the code when you cherry-picked it, it's very likely that you'll get a merge conflict later if you merge N and K, for instance. That's because we took a change (to some set of lines) and modified the change, so later, when Git goes to combine changes, it will see slightly different changes that affect the "same lines", as it were. We'll have to resolve another merge conflict later. There's no guarantee that we won't have to resolve a merge conflict later even if this doesn't happen, though. Mostly, we just let the merge conflicts happen as they do, and fix them up manually. That's part of the job of being a programmer.