
How to commit and pull request only specific files from multiple branches before the merge happens?


I'm still new to Git version control and confused about the case below.

Let's say I committed/pushed and PR'ed fileA and fileB from branch1. (these changes are not yet merged at this point). Then, I created and checked out a new branch called 'branch2', then created new files called fileC and fileD which I want to commit/push and PR from this branch2.

However, since my first PR from branch1 is not yet merged, the second PR from branch2 also contains the files I committed earlier (fileA and fileB).

How can I commit/push and PR only specific files from multiple branches at the same time, before the merge happens? The reason I didn't want to git pull from origin master after the first PR is that I want to be able to make changes afterward if I have to refactor something.

If I follow the steps below, will the commit/PR from branch2 contain only the changes I made after checking out branch2?

  1. commit/PR from branch1
  2. check out master and run git pull origin master
  3. create and check out branch2
  4. commit/PR from branch2

Solution

    TL;DR

    Jump to the Summary section at the end.

    Long

    There is a bunch of stuff you need to know here. Worse, you need to learn it all at once (well, mostly): that's a big hill to climb. But it can be done.

    The first thing to know is that Git is really about commits. It's not about files, although we (humans) use commits to store files. It's not about branches either, although we (humans and Git) use branch names to find commits. In the end, Git is all about the commits. So you need to think in terms of commits.

    Each commit has a unique number. This number is expressed as a big, ugly, random-looking hash ID in hexadecimal, such as 7e391989789db82983665667013a46eabc6fc570. That particular number is now taken, and no other commit, anywhere, ever, can have that number.[1] This is why the numbers are so big and ugly: they need to be unique.

    The unique numbers are how Git finds a commit. They act as a key in a simple key-value database, with the value being the contents of the commit. In fact, the key is a cryptographic hash of the contents, making the whole thing a sort of Ouroboros. Git checks that the key always matches the contents, which detects any kind of error or corruption of the database, and means that no part of any commit can ever change. So the contents of a commit are frozen for all time.
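
    The key-is-a-hash-of-the-contents idea can be tried directly. The following is a minimal sketch that uses file contents (blobs) rather than commits, on the assumption that this is easier to poke at; commits are hashed from their contents in the same fashion. The sample strings are made up for illustration.

```shell
# Identical content always produces the identical hash ID.
id1=$(printf 'hello\n' | git hash-object --stdin)
id2=$(printf 'hello\n' | git hash-object --stdin)

# Changing even one character produces a different hash ID.
id3=$(printf 'hello!\n' | git hash-object --stdin)

echo "$id1"
echo "$id2"
echo "$id3"
```

    The first two lines printed are the same and the third differs, which is why changing any part of a stored object would show up as a key that no longer matches.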

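    But what, exactly, are these contents? For our purposes they're split into two parts: a full snapshot of every file, and some metadata about the commit itself (such as who made it, when, and the log message).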

    In the metadata, Git automatically stores the hash ID of the previous commit. This builds up a chain—a one-way strand of pearls, perhaps—of commits:

    ... <-F <-G <-H
    

    where H here stands in for the hash ID of the latest commit. If we just knew that hash ID somehow, we could have Git extract the metadata and/or the snapshot stored in H. The metadata lets us (or Git) find earlier commit G's hash ID, which lets Git extract the metadata and/or snapshot of G, too. That lets Git find commit F, which lets it extract the metadata and/or snapshot, and so on, all the way back through history.
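
    To see the backwards-pointing chain for yourself, here is a small sketch that builds three commits in a throwaway repository and then walks from the last one back to the first. The repository location, file name, identity, and commit messages are all placeholders, not anything from the question.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Make three commits in a row, standing in for F, G and H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# The raw contents of the latest commit: a tree line (the snapshot),
# a parent line (the previous commit's hash ID), then the rest of the metadata.
git cat-file -p HEAD

# Walk backwards using only the stored parent hash IDs.
h=$(git rev-parse HEAD)
g=$(git rev-parse "$h^")
f=$(git rev-parse "$g^")
git log --format='%h  parent: %p  %s'
```

    The last command prints one line per commit, newest first; the oldest commit has an empty parent field because there is nothing before it.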

    The history is the commits, and the commits are the history. There is no more and no less: we start at the end—at commit H—and have Git work backwards, and these commits are the history in the repository.

    But there's one obvious catch: How did we find the hash ID of commit H? Did we memorize it? Did we jot it down on the office whiteboard? Where do we get this latest commit hash ID?


    [1] Technically, that number is only taken in all Git repositories for Git. If your Git repository is never going to meet up with a Git repository for Git, your Git repository could use that number for something of its own. But in practice, these things are truly unique. A doppelgänger commit (a commit with the same number, but different contents) would not cause the universe to explode as on Star Trek, but would be a problem.


    Branch names

    This is where branch names come in. Suppose we have Git automatically save the hash ID of our latest commit in a branch name, like this:

    ...--F--G--H   <-- master
    

    Now we don't have to memorize some big ugly hash ID any more. The name master, which is far easier to remember, holds the right hash ID.

    If we want another branch name, we just make another one. We'll make it point to commit H too, like this, at least for the moment:

    ...--G--H   <-- branch1, master
    

    We need one more thing though: a way to have Git remember which name we're using. Right now, it does not matter, because both names select commit H, but we are about to change that. So for this purpose let's add a special name, HEAD, and attach it to just one branch name:

    ...--G--H   <-- branch1, master (HEAD)
    

    This means we are on branch master, as git status will say. We are using commit H from the name master.

    If we run:

    git checkout branch1
    

    we get:

    ...--G--H   <-- branch1 (HEAD), master
    

    We're still using commit H, but from the name branch1 now.
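
    A quick way to watch HEAD move from one name to the other while the commit stays the same is sketched below, in a throwaway repository with a placeholder file and identity.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > README
git add README
git commit -q -m "commit H"

git branch branch1                 # a second name for the same commit H
git symbolic-ref --short HEAD      # prints: master
before=$(git rev-parse HEAD)

git checkout -q branch1
git symbolic-ref --short HEAD      # prints: branch1
after=$(git rev-parse HEAD)

echo "$before"
echo "$after"
```

    The two hash IDs printed at the end are identical: only the name that HEAD is attached to has changed.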

    Now let's make a new commit. We'll change two files, fileA and fileB, and run git add on them[2] and then git commit. Git will demand a log message to go into the new commit, then actually make the new commit. We don't know what big ugly hash ID this new commit will get,[3] just that it's unique. We'll call it commit I, though, using the next letter after H.

    The parent (predecessor commit) for new commit I will be the current commit H, so I will point back to H, just like H points back to G. And then, because we have just made that commit, the branch name that Git will update is the branch name that has HEAD attached. So the result looks like this:

    ...--G--H   <-- master
             \
              I   <-- branch1 (HEAD)
    

    Commit I has a full snapshot of every file. It doesn't just have fileA and fileB, and it does not have some kind of instruction set of the form "make these changes". It just has the entire set of files, saved forever now.[4]
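
    The snapshot claim is easy to check in a scratch repository. In this sketch, other.txt is a made-up file that is never touched after the first commit; everything else follows the fileA/fileB story above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Commit H: three files.
echo "unchanged" > other.txt
echo a1 > fileA
echo b1 > fileB
git add other.txt fileA fileB
git commit -q -m "commit H"

# Commit I on branch1: only fileA and fileB are modified.
git checkout -q -b branch1
echo a2 > fileA
echo b2 > fileB
git add fileA fileB
git commit -q -m "commit I"

# The snapshot in I still lists all three files, not just the two that changed.
git ls-tree -r --name-only branch1

# The untouched file has the same object ID in both commits, so it is stored only once.
git rev-parse master:other.txt branch1:other.txt

# master still names H, which is the parent of I.
git rev-parse master branch1^
```

    The last two commands each print the same hash ID twice: once for the shared file, and once to show that master did not move when commit I was made.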


    [2] You might (in fact you should) wonder why we have to run git add here. Or, if you're using git commit -a, you should wonder what -a really means. I'll leave that out, though, to avoid having this answer get really big.

    [3] Since the new commit contains a time stamp, and we don't know what the exact time will be to the second, there's no way to predict the new commit's hash ID.

    [4] Git uses a bunch of tricks, including file de-duplication (right away) and delta compression (later; never immediately), to make the big database of every Git commit and other Git objects stay small. The de-duplication part takes care of the fact that you didn't change any of the other files, and also makes it really easy for Git to see that commit I and commit H share most of their files.

    Commits can also be removed, but this is tricky, and not something you normally do or even think about. Commits mostly only get removed by being replaced by new-and-improved, different-hash-ID updated ones, and even that is tricky.


    How to make branch2

    Then, I created and checkout new branch called branch2, then created new files called fileC and fileD which I want to commit/push ...

    Now that we have this:

    ...--G--H   <-- master
             \
              I   <-- branch1 (HEAD)
    

    in the repository, we have more candidate commits than ever for where the new name branch2 should point. We used commit H last time because it was the latest commit. But now there are two "latest" commits: H, which is the latest on master, and I, which is the latest on branch1.

    If we pick commit I as our starting point, any new commit J we make will include everything we've done to get to commit I.

    The trick, then, is to start new branch branch2 not from the newest latest commit (I), but from the older latest commit (H), which is still the latest on master. That is, we want:

    ...--G--H   <-- master, branch2 (HEAD)
             \
              I   <-- branch1
    

    We want to get back onto commit H and make our new name branch2 start from there. Then we'll make some changes to some files, run git add and git commit as usual, and get a new commit J whose parent is H:

              J   <-- branch2 (HEAD)
             /
    ...--G--H   <-- master
             \
              I   <-- branch1
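
    In command form, the "start branch2 from H" approach might look like the sketch below. It runs in a throwaway repository; the file contents, identity, and commit messages are placeholders, and the final two commands are only there to check the result.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > README
git add README
git commit -q -m "commit H"

# First piece of work: commit I on branch1.
git checkout -q -b branch1
echo A > fileA
echo B > fileB
git add fileA fileB
git commit -q -m "commit I"

# Second piece of work: name master as the starting point, so branch2 begins at H, not I.
git checkout -q -b branch2 master
echo C > fileC
echo D > fileD
git add fileC fileD
git commit -q -m "commit J"

# Commits on branch2 that are not on master: just J.
git log --format=%s master..branch2

# Files that differ between master and branch2: just fileC and fileD.
git diff --name-only master branch2
```

    Giving git checkout -b an explicit starting point is what keeps commit I out of the history of branch2.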
    

    Unfortunately, it's a bit late for that now ...

    You didn't do that. Instead, you made your branch2 by starting with commit I, giving you:

    ...--G--H   <-- master
             \
              I   <-- branch1
               \
                J   <-- branch2 (HEAD)
    

    So when you sent commit J to be reviewed for merging, what you got was a request to review, and then merge into (their) master, both commits I and J.

    This is because branches do not matter except in terms of finding the last commit. We use the name to find the last commit, and then use the commits to find earlier commits. So to Git, this is now all about adding commits I and J to master.
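
    You can list what such a request would carry before opening it. The sketch below recreates the situation in a throwaway repository (placeholder files and identity) and asks Git which commits are reachable from branch2 but not from master; that is roughly the set a pull request against master would offer, although the hosting site's exact presentation is not shown here.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > README
git add README
git commit -q -m "commit H"

git checkout -q -b branch1
echo A > fileA
echo B > fileB
git add fileA fileB
git commit -q -m "commit I"

# branch2 is created while on branch1, so it starts at commit I.
git checkout -q -b branch2
echo C > fileC
echo D > fileD
git add fileC fileD
git commit -q -m "commit J"

# Commits that branch2 would add to master: J and also I.
git log --format=%s master..branch2

# Files that differ from master: all four.
git diff --name-only master branch2
```

    The log prints "commit J" followed by "commit I", which matches the second pull request showing fileA and fileB as well.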

    ... so here is how we fix it

    Commit J is, in a sense, bad/wrong. In the future, it might be good/right, because if you get commit I itself directly added to master in all Git repositories—I'm skipping over a lot of details here—you'll have this:

    ...--G--H--I   <-- master, branch1
                \
                 J   <-- branch2 (HEAD)
    

    and now commit J would be good/right. But for right now, you want instead some commit—we could call it K, because that's the next letter, but let's call it J' ("Jay-Prime") to indicate that it's a sort of "new and improved" version of J—that comes after commit H.

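    To get this, we must copy the changes that commit J makes (relative to its parent I) into a new commit J' whose parent is H, and then make the name branch2 point to J' instead of J.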

    The end result should look like this:

              J'  <-- branch2 (HEAD)
             /
    ...--G--H   <-- master
             \
              I   <-- branch1
               \
                J   ???
    

    Commit J will still exist (locally and wherever you've sent it). But it's been replaced by our new-and-improved J'.

    There is an easy way to achieve this locally, in your own Git repository. You simply run:

    git checkout branch2               # if needed
    git rebase --onto master branch1
    

    This --onto form of git rebase is a little bit fancier than a standard git rebase. The --onto master part tells Git where to put the new commits, and the branch1 part tells Git what commits not to copy. Remember that Git finds commits by starting from some branch name—branch2, in this case—and working backwards. So how does rebase know when to stop? The answer is that we must tell it.
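
    Here is the same rebase run end to end in a throwaway repository, as a sketch with placeholder files and identity. It records the hash ID of J before the rebase so that the old and new commits can be compared afterwards.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo base > README
git add README
git commit -q -m "commit H"

git checkout -q -b branch1
echo A > fileA
git add fileA
git commit -q -m "commit I"

git checkout -q -b branch2               # starts at I, as in the question
echo C > fileC
git add fileC
git commit -q -m "commit J"
old_j=$(git rev-parse branch2)

# Copy the commits that are on branch2 but not on branch1 (just J) so they come after master.
git rebase -q --onto master branch1
new_j=$(git rev-parse branch2)

git log --format=%s master..branch2      # prints: commit J
git rev-parse branch2^ master            # the same hash ID twice: J' now sits on H
echo "old J: $old_j"
echo "new J: $new_j"
```

    The two hash IDs at the end differ, because J' is a new commit with a different parent. The original J is still in the repository; no branch name points to it any more.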

    The main thing to know about git rebase is that we use it when we have some existing commits that are ... mostly OK, but somehow defective. It works—it has to work—by copying those old commits to new-and-improved versions. It has to work that way because no part of any Git commit can ever be changed.

    The good side effect of all of this is that we end up with the new and improved commits. The bad side effect is that because Git identifies commits by hash ID, our branch name will be yanked away from the old commit(s) and will find the new and improved commit(s) instead.

    Wait, isn't that a good side effect?

    Well, yes. But also no.[5] It is good because these are the commits we wish to find. It is bad because if we've given the old commits to some other Git, we now have to convince that Git to switch over to these new-and-improved commits, too. If we do that with a simple branch name and git push, this is going to require some kind of --force flag.

    So, as long as there are no conflicts during all this commit-copying—in your case, there won't be, as you touched entirely different files in commits I and J—the rebase will work on its own, but then we need to update our pull request with:

    git push --force-with-lease origin branch2
    

    or similar.[6]
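
    To see why the force flag is needed, the sketch below stands in a local bare repository for the remote; that substitution, along with the paths, files, and identity, is an assumption made purely for illustration. It pushes the original commits, rebases, tries an ordinary push, and then pushes with --force-with-lease.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"      # local stand-in for the hosted repository
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master    # use the name master whatever the local default is
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin "$work/origin.git"

echo base > README
git add README
git commit -q -m "commit H"
git checkout -q -b branch1
echo A > fileA
git add fileA
git commit -q -m "commit I"
git checkout -q -b branch2
echo C > fileC
git add fileC
git commit -q -m "commit J"

# The other Git now has H, I and the original J.
git push -q origin master branch1 branch2

# Replace J with J' locally.
git rebase -q --onto master branch1

# An ordinary push is refused, because J' does not build on the J that origin has.
if git push -q origin branch2 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi
echo "plain push: $plain_push"

# Forcing, with the lease check, moves origin's branch2 to J'.
git push -q --force-with-lease origin branch2
```

    After the last command, branch2 in the stand-in remote names the same commit as the local branch2.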


    [5] FRODO: But it is said: "Go not to the Elves for answers, for they will say both no and yes."

    [6] The --force-with-lease option is a slightly safer variant of git push --force. To keep this answer short, I've omitted all the details.


    Summary

    git checkout branch2
    git rebase --onto master branch1
    git push --force-with-lease origin branch2
    

    Ideally, all will go well, and regardless of whether you're using a GitHub fork model or a GitHub shared-repository model, this will update your PR.