Still new to git version management and confused about the below case.
Let's say I committed, pushed, and opened a PR for fileA and fileB from branch1 (these changes are not yet merged at this point). Then I created and checked out a new branch called 'branch2', and created new files called fileC and fileD, which I want to commit, push, and PR from branch2.
However, since my first PR from branch1 is not yet merged, the second PR from branch2 also contains my previously committed files (fileA and fileB).
How can I commit/push and PR only specific files from multiple branches at the same time, before the merge happens? The reason I didn't want to git pull from origin master after the first PR is that I may want to make changes afterward if I have to refactor something.
If I follow the steps below, will the commit/PR from branch2 contain only the changes I made after checking out branch2?
git pull from origin master
Jump to the Summary section at the end.
There is a bunch of stuff you need to know here. Worse, you need to learn it all at once (well, mostly): that's a big hill to climb. But it can be done.
The first thing to know is that Git is really about commits. It's not about files, although we (humans) use commits to store files. It's not about branches either, although we (humans and Git) use branch names to find commits. In the end, Git is all about the commits. So you need to think in terms of commits.
Each commit has a unique number. This number is expressed as a big, ugly, random-looking hash ID in hexadecimal, such as 7e391989789db82983665667013a46eabc6fc570. That particular number is now taken, and no other commit, anywhere, ever, can have that number.[1] This is why the numbers are so big and ugly: they need to be unique.
The unique numbers are how Git finds a commit. They act as a key in a simple key-value database, with the value being the contents of the commit. In fact, the key is a cryptographic hash of the contents, making the whole thing a sort of Ouroboros. Git makes sure that the key always matches, which detects any kind of error or corruption of the database, and means that no part of any commit can ever change. So the contents of a commit are frozen for all time.
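To see the "key is a hash of the contents" idea in action, here is a small sketch. It assumes a throwaway repository in a temporary directory; the file name README and the user identity are placeholders made up for the demonstration.

```shell
set -eu
# Throwaway repository; path, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "initial commit"

# The key under which Git stored the new commit.
commit_id=$(git rev-parse HEAD)

# Feed the raw commit contents back through Git's own hashing.
rehashed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

echo "stored key:    $commit_id"
echo "hash of value: $rehashed"
```

The two printed IDs should be identical, which is why changing even one byte of a commit would give it a different hash ID.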
But what, exactly, are these contents? For our purposes they're split into two parts:
Every commit contains a full (but read-only) snapshot of every file, as of the state that file had at the time you, or whoever, made the commit.
And, every commit has some metadata. This is information about the commit itself, such as who made it, and when. There is a free-form field—the log message—where you can say why you made this commit. (There's no point in saying what since that's in the snapshot. There's no point in saying what changed either, since that's easily detected by comparing against the previous snapshot. The reason why you made some change, however, is rarely quite so obvious; even a simple clue like "fix bug #1234" can be quite helpful here.)
In the metadata, Git automatically stores the hash ID of the previous commit. This builds up a chain—a one-way strand of pearls, perhaps—of commits:
... <-F <-G <-H
where H here stands in for the hash ID of the latest commit. If we just knew that hash ID somehow, we could have Git extract the metadata and/or the snapshot stored in H. The metadata lets us (or Git) find earlier commit G's hash ID, which lets Git extract the metadata and/or snapshot of G, too. That lets Git find commit F, which lets it extract the metadata and/or snapshot, and so on, all the way back through history.
The history is the commits, and the commits are the history. There is no more and no less: we start at the end, at commit H, and have Git work backwards, and these commits are the history in the repository.
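The backwards walk can be tried out in a scratch repository. This sketch assumes three empty commits whose messages F, G, and H stand in for the commits in the drawing; the identity is a placeholder.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits in a row; the messages stand in for the letters F, G, H.
for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

# Metadata of the latest commit: tree (the snapshot), parent, author,
# committer, and the log message.
git cat-file -p HEAD

# The "parent" line holds the hash ID of the previous commit, G.
parent_of_h=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent of H: $parent_of_h"

# git log performs the same backwards walk, newest commit first.
history=$(git log --format=%s | xargs)
echo "history: $history"
```

Note that the very first commit, F here, has no parent line at all, which is how the walk knows when to stop.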
But there's one obvious catch: How did we find the hash ID of commit H? Did we memorize it? Did we jot it down on the office whiteboard? Where do we get this latest commit hash ID?
[1] Technically, that number is only taken in all Git repositories for Git. If your Git repository is never going to meet up with a Git repository for Git, your Git repository could use that number for something of its own. But in practice, these things are truly unique. A doppelgänger commit (a commit with the same number, but different contents) would not cause the universe to explode as on Star Trek, but would be a problem.
This is where branch names come in. Suppose we have Git automatically save the hash ID of our latest commit in a branch name, like this:
...--F--G--H <-- master
Now we don't have to memorize some big ugly hash ID any more. The name master, which is far easier to remember, holds the right hash ID.
If we want another branch name, we just make another one. We'll make it point to commit H too, like this, at least for the moment:
...--G--H <-- branch1, master
We need one more thing though: a way to have Git remember which name we're using. Right now, it does not matter, because both names select commit H, but we are about to change that. So for this purpose let's add a special name, HEAD, and attach it to just one branch name:
...--G--H <-- branch1, master (HEAD)
This means we are on branch master, as git status will say. We are using commit H from the name master.
If we run:
git checkout branch1
we get:
...--G--H <-- branch1 (HEAD), master
We're still using commit H, but from the name branch1 now.
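The two names and the attached HEAD can be inspected directly. A minimal sketch, assuming a scratch repository with a single empty commit standing in for H and a placeholder identity:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"

git branch branch1               # a second name for the same commit H
git rev-parse master branch1     # prints the same hash ID twice
git symbolic-ref --short HEAD    # prints: master

git checkout -q branch1          # re-attach HEAD; the commit in use does not change
git symbolic-ref --short HEAD    # prints: branch1
```

Creating the name and moving HEAD never touched commit H itself; only the small name-to-hash-ID table changed.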
Now let's make a new commit. We'll change two files, fileA and fileB, run git add on them,[2] and then run git commit. Git will demand a log message to go into the new commit, then actually make the commit. We don't know what big ugly hash ID this new commit will get,[3] just that it's unique. We'll call it commit I, though, using the next letter after H.
The parent (predecessor commit) of new commit I will be the current commit H, so I will point back to H, just like H points back to G. And then, because we made that commit just now, the branch name that Git will update is the branch name that has HEAD attached. So the result looks like this:
...--G--H <-- master
         \
          I <-- branch1 (HEAD)
Commit I has a full snapshot of every file. It doesn't just have fileA and fileB, and it does not have some kind of instruction set of the form "make these changes". It just has the entire set of files, saved forever now.[4]
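The "full snapshot" claim is easy to check. In this sketch, other.txt is a made-up file that is never modified after the first commit, and the file contents and identity are placeholders:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Commit H: three files.
echo "never touched" > other.txt
echo "A v1" > fileA
echo "B v1" > fileB
git add .
git commit -q -m "H"

# Commit I on branch1: only fileA and fileB change.
git checkout -q -b branch1
echo "A v2" > fileA
echo "B v2" > fileB
git add fileA fileB
git commit -q -m "I"

# The snapshot in I still lists every file, not just the two changed ones.
files_in_i=$(git ls-tree -r --name-only branch1 | xargs)
echo "files in I: $files_in_i"

# De-duplication: the unchanged file is the very same object in both commits.
git rev-parse master:other.txt branch1:other.txt
```

The last command prints the same hash ID twice, showing that the untouched file is stored once and shared by both snapshots.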
[2] You might (in fact you should) wonder why we have to run git add here. Or, if you're using git commit -a, you should wonder what -a really means. I'll leave that out, though, to avoid having this answer get really big.
[3] Since the new commit contains a time stamp, and we don't know what the exact time will be to the second, there's no way to predict the new commit's hash ID.
[4] Git uses a bunch of tricks, including file de-duplication (right away) and delta compression (later, never immediately), to make the big database of every Git commit and other Git objects stay small. The de-duplication part takes care of the fact that you didn't change any of the other files, and also makes it really easy for Git to see that commit I and commit H share most of their files.
Commits can also be removed, but this is tricky, and not something you normally do or even think about. Commits mostly only get removed by being replaced by new-and-improved, different-hash-ID updated ones, and even that is tricky.
branch2
Then, I created and checkout new branch called branch2, then created new files called fileC and fileD which I want to commit/push ...
Now that we have this:
...--G--H <-- master
         \
          I <-- branch1 (HEAD)
in the repository, we have more candidate commits than ever for where the new name branch2 should point. We used commit H last time because it was the latest commit. But now there are two "latest" commits:
- master, commit H, on the branch that consists of all commits up to and including H; and
- branch1, commit I, on the branch that consists of all commits passing through H and reaching up to commit I.
If we pick commit I as our starting point, any new commit J we make will include everything we've done to get to commit I.
The trick, then, is to start new branch branch2 not from the latest latest commit, but from the old latest commit, which is still latest on master. That is, we want:
...--G--H <-- master, branch2 (HEAD)
         \
          I <-- branch1
We want to get back onto commit H and make our new name branch2 start from there. Then we'll make some changes to some files, run git add and git commit as usual, and get a new commit J whose parent is H:
          J <-- branch2 (HEAD)
         /
...--G--H <-- master
         \
          I <-- branch1
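One way to end up with this shape is to name the starting commit explicitly when creating the branch. The sketch below rebuilds the whole situation in a scratch repository; the file contents, the README file, and the identity are placeholders, and giving master as the start point to git checkout -b is my choice of command for "start from H".

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "H"

# First piece of work: commit I on branch1.
git checkout -q -b branch1
echo "a" > fileA
echo "b" > fileB
git add fileA fileB
git commit -q -m "I"

# Second piece of work: start branch2 at master (commit H), not at branch1.
git checkout -q -b branch2 master
echo "c" > fileC
echo "d" > fileD
git add fileC fileD
git commit -q -m "J"

# Commits a PR from branch2 into master would carry.
only_on_branch2=$(git log --format=%s master..branch2 | xargs)
echo "commits for review: $only_on_branch2"
```

Because J's parent is H, the range master..branch2 holds J alone, and fileA and fileB are not part of it.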
You didn't do that. Instead, you made your branch2 by starting with commit I, giving you:
...--G--H <-- master
         \
          I <-- branch1
           \
            J <-- branch2 (HEAD)
So when you sent commit J to be reviewed for merging, what you got was a request to review, and then merge into (their) master, both commits.
This is because branches do not matter except in terms of finding the last commit. We use the name to find the last commit, and then use the commits to find earlier commits. So to Git, this is now all about adding commits I and J to master.
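You can preview what such a request would carry before opening it. This sketch recreates the situation from the question in a scratch repository (file contents, README, and identity are placeholders) and lists what branch2 has that master lacks:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > README
git add README
git commit -q -m "H"

git checkout -q -b branch1
echo "a" > fileA
echo "b" > fileB
git add fileA fileB
git commit -q -m "I"

# branch2 is created while branch1 is checked out, so it starts at commit I.
git checkout -q -b branch2
echo "c" > fileC
echo "d" > fileD
git add fileC fileD
git commit -q -m "J"

# Commits reachable from branch2 but not from master, newest first.
commits_for_review=$(git log --format=%s master..branch2 | xargs)
echo "commits for review: $commits_for_review"

# Files that differ between master and branch2.
files_for_review=$(git diff --name-only master branch2 | xargs)
echo "files for review: $files_for_review"
```

Both I and J show up, along with all four files, which matches what the second PR displayed.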
Commit J is, in a sense, bad/wrong. In the future, it might be good/right, because if you get commit I itself directly added to master in all Git repositories (I'm skipping over a lot of details here) you'll have this:
...--G--H--I <-- master, branch1
            \
             J <-- branch2 (HEAD)
and now commit J would be good/right. But for right now, you want instead some commit that comes after commit H. We could call it K, because that's the next letter, but let's call it J' ("Jay-Prime") to indicate that it's a sort of "new and improved" version of J.
To get this, we must:
- get back to commit H; then
- copy the changes from J (the two changes to fileC and fileD) to our new improved commit, copying the commit message from J too, while making this new commit.
The end result should look like this:
          J' <-- branch2 (HEAD)
         /
...--G--H <-- master
         \
          I <-- branch1
           \
            J ???
Commit J will still exist (locally and wherever you've sent it). But it's been replaced by our new-and-improved J'.
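Done by hand, the two steps could look roughly like the sketch below. It uses git reset --hard and git cherry-pick, which are my choice for illustrating "go back to H, then copy J" and are not the commands this answer goes on to recommend. The repository is a scratch one; the variable old_j, the file contents, and the identity are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Rebuild the situation: H on master, I on branch1, J on top of I.
echo "base" > README
git add README
git commit -q -m "H"
git checkout -q -b branch1
echo "a" > fileA
echo "b" > fileB
git add fileA fileB
git commit -q -m "I"
git checkout -q -b branch2
echo "c" > fileC
echo "d" > fileD
git add fileC fileD
git commit -q -m "J"

old_j=$(git rev-parse branch2)         # remember the hash ID of the original J

git reset -q --hard master             # step 1: move branch2 (and our files) back to H
git cherry-pick "$old_j" > /dev/null   # step 2: copy J's changes and message as J'

new_j=$(git rev-parse branch2)
echo "old J: $old_j"
echo "new J': $new_j"
```

The copy has the same log message and the same two new files, but a different parent and therefore a different hash ID. Note that git reset --hard discards uncommitted work, so this is only safe on a clean working tree.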
There is an easy way to achieve this locally, in your own Git repository. You simply run:
git checkout branch2 # if needed
git rebase --onto master branch1
This --onto form of git rebase is a little bit fancier than a standard git rebase. The --onto master part tells Git where to put the new commits, and the branch1 part tells Git what commits not to copy. Remember that Git finds commits by starting from some branch name (branch2, in this case) and working backwards. So how does rebase know when to stop? The answer is that we must tell it.
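To watch the rebase do its work without risking a real repository, the following sketch rebuilds the three branches in a temporary directory first. File contents, README, and the identity are placeholders; the two rebase-related commands are the ones given above.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# H on master, I on branch1, J on branch2 stacked on top of I.
echo "base" > README
git add README
git commit -q -m "H"
git checkout -q -b branch1
echo "a" > fileA
echo "b" > fileB
git add fileA fileB
git commit -q -m "I"
git checkout -q -b branch2
echo "c" > fileC
echo "d" > fileD
git add fileC fileD
git commit -q -m "J"

branch1_before=$(git rev-parse branch1)

git checkout -q branch2
# Copy the commits that are on branch2 but not on branch1 (just J),
# placing the copies after the commit that master names.
git rebase -q --onto master branch1

after_rebase=$(git log --format=%s master..branch2 | xargs)
echo "commits on branch2 beyond master: $after_rebase"
git log --oneline --graph --all
```

After the rebase, branch2 holds only the copied commit on top of master, while branch1 still points to the untouched commit I.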
The main thing to know about git rebase is that we use it when we have some existing commits that are ... mostly OK, but somehow defective. It works (it has to work) by copying those old commits to new-and-improved versions. It has to work that way because no part of any Git commit can ever be changed.
The good side effect of all of this is that we end up with the new and improved commits. The bad side effect is that because Git identifies commits by hash ID, our branch name will be yanked away from the old commit(s) and will find the new and improved commit(s) instead.
Well, yes. But also no.[5] It is good because these are the commits we wish to find. It is bad because if we've given the old commits to some other Git, we now have to convince that Git to switch over to these new-and-improved commits, too. If we do that with a simple branch name and git push, this is going to require some kind of --force flag.
So, as long as there are no conflicts during all this commit-copying (in your case, there won't be, as you touched entirely different files in commits I and J) the rebase will work on its own, but then we need to update our pull request with:
git push --force-with-lease origin branch2
or similar.[6]
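The need for a forced push can be reproduced locally. In this sketch a bare repository in a temporary directory stands in for the hosting service; the directory names origin.git and work, the file contents, and the identity are all made up for the demonstration.

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/origin.git"   # stand-in for the remote hosting service
git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$base/origin.git"

echo "base" > README
git add README
git commit -q -m "H"
git checkout -q -b branch1
echo "a" > fileA
echo "b" > fileB
git add fileA fileB
git commit -q -m "I"
git checkout -q -b branch2
echo "c" > fileC
echo "d" > fileD
git add fileC fileD
git commit -q -m "J"
git push -q origin master branch1 branch2   # the state before the fix

git rebase -q --onto master branch1         # locally, J is replaced by J'

# A plain push is refused: the remote's branch2 still names the old J,
# and J' is not a descendant of it.
if git push -q origin branch2 2>/dev/null; then
    echo "plain push unexpectedly succeeded"
else
    echo "plain push rejected"
fi

# Forced push that only goes through if the remote branch is still where
# our origin/branch2 remembers it.
git push -q --force-with-lease origin branch2

remote_tip=$(git --git-dir="$base/origin.git" rev-parse branch2)
echo "remote branch2 now at: $remote_tip"
```

On a hosting service the forced update of the branch is typically what refreshes an open pull request, though the details depend on the service.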
[5] FRODO: But it is said: "Go not to the Elves for answers, for they will say both no and yes."
[6] The --force-with-lease is a slightly safer variant of git push --force. To keep this answer short, I've omitted all the details.
Summary
git checkout branch2
git rebase --onto master branch1
git push --force-with-lease origin branch2
Ideally, all will go well, and regardless of whether you're using a GitHub fork model or a GitHub shared-repository model, this will update your PR.