According to one of the answers to this question, https://stackoverflow.com/a/18011273/5238559, the LOCAL, BASE and REMOTE files will not be altered in the merge process; only the resulting MERGED file will be.
During a merge in meld, I would modify the middle panel (BASE) by moving over code from the left (LOCAL) and right (REMOTE). I understood that BASE would be a sort of "preview" of what the final merged file will look like, but that it won't be saved directly, which seems like a logical safety step.
However, I can also move code from BASE to LOCAL or REMOTE, and when I close meld, I am asked to save the changes to all three files. Why can I do this if only BASE (i.e. MERGED) is relevant to the merge process? What happens to the modifications in LOCAL and REMOTE?
Git doesn't use your working tree files except when you (or something) run(s) git add. Note that git mergetool runs git add on only one of the files that meld works with. So you can write as many extra files as you like. Git doesn't care. It only cares about that one particular file when meld is done.
Presumably you're running this merge tool meld via git mergetool. The way git mergetool works is ridiculously simple, once you understand how merge itself works, and that's why you can modify all these files: because they are all just files.
For all this to make sense, you need to know how git merge works. This gets us into the distinctions between:
commits, which Git stores permanently and which are read-only;
Git's index, which holds the files that will go into the next commit; and
your work-tree, which holds the files that you, and programs like meld or vim or whatever, can actually see and edit.
The third one of these, your work-tree, is the only place that holds files that you can see. But (and this is very important) your work-tree is not in Git at all. It's just a place that Git sticks files into, so that you can see them and work on / with them. Later, git add will copy one of these files back into Git's index. If you use git mergetool to run a merge tool, the git mergetool code runs git add for you.
The mergetool script runs git add on the merged file (by name), so whatever is in that file is what gets git add-ed. Any remaining files are just junk as far as Git is concerned: they are simply untracked files. I believe mergetool should clean up the junk files (but should does not mean always will, and opinions may differ on the should part too; there's a "keep backup" option here, which I have never used).
You may be able to skip some sections below, depending on how familiar you are with Git. I will try to keep them short (by leaving a lot out) but they are still going to be long.
Each Git commit is given a unique number. These numbers are not simple counting numbers—we don't have commit #1 followed by #2, then #3, and so on. Instead, the numbers are random-looking, big, ugly hash IDs computed by a cryptographic hash function. These numbers are unique across all Git repositories everywhere (which is how Git manages the distributed nature of commits), but all we need to know here is that commits are numbered.
Each commit holds two things. All parts of the commit are read-only, so these things are unchangeable, and are valid forever—or at least as long as the commit itself continues to exist:
Each commit has a full snapshot of every file, stored in a special archival format that only Git can read. (This format is compressed, often highly so, and de-duplicates file contents. It can store files that your OS may be unable to use effectively, or even check out in some cases; in those cases, merging will be difficult or impossible.) The files that are in the commit are determined by what is in Git's index, as described in the next section, at the time someone runs git commit.
Each commit also has some metadata, or information about the commit itself. This includes the name and email address of an author, and another for a committer. Each of those has a date-and-time-stamp. There is space for a log message, to be written by whoever makes the commit, to describe why they made this commit. And, so that Git can string commits together backwards, each commit records the hash ID(s) of its parent commit(s).
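To see both parts of a commit for yourself, something like the following should work. It is only a sketch: the scratch repository, the file name file.txt and the author identity are made up for illustration, and it assumes a reasonably recent Git on your PATH.

```shell
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "first" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "second" >> file.txt
git add file.txt
git commit -q -m "second commit"

# Print the raw commit object: a tree line (the snapshot), a parent line,
# author and committer lines with timestamps, then the log message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# The recorded parent should be the hash ID of the previous commit.
parent_from_commit=$(echo "$commit_text" | sed -n 's/^parent //p')
previous_commit=$(git rev-parse HEAD~1)
```

The tree line names the snapshot; everything else is the metadata described above.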
A merge commit is simply a commit that has at least two parent hash IDs in it. The git merge command often makes such a commit at the end: the first parent is the same parent that any ordinary non-merge commit would have, and the second parent is the hash ID of the commit that you just merged (e.g., the tip commit of a branch you merged by branch-name). The snapshot part of a merge is the same as any commit: it's just a full copy of every file as recorded in Git's index at the time the merge is completed.
Git's index has three names: Git calls it the index (as I am doing here), the staging area (for normal commits at least), and (rarely these days, mostly in flags like --cached) the cache. For normal, non-merge commits, I like to describe the index as holding your proposed next commit.
What's in the index is—normally—a list of tuples: name, mode, and hash ID:
The name is a file name, complete with forward slashes like top/sub/file.ext. At this level, Git doesn't "think about" directories holding files: it just has files with long names that contain slashes. Even on Windows, these slashes go forwards, even though Git has to put such a file into a file named file.ext inside a folder named top containing a subfolder sub, which Windows would prefer to express as top\sub\file.ext. The index insists on forward slashes internally. (This normally doesn't show up to users; it's just a way to understand the problem Git has that prevents it from storing an empty folder. Such a thing simply can't exist in Git's index: the index only holds files.)
The mode, for an ordinary file, really just remembers whether it's +x or -x: an executable file, or a non-executable file. For hysterical reasons this is stored as either 100755 or 100644 respectively.
The hash ID has to do with how Git stores file content internally, as a blob object. These things are compressed and read-only, and if the object is stored as a packed object, it may be even-more-compressed using delta encoding.
Again, that's in the normal, non-merge case. These entries have a stage number (because the index is the "staging area") that is always zero. This is what makes them normal.
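You can look at these index entries directly with git ls-files --stage. Here is a small sketch, using made-up file names in a scratch repository (copy.txt and run.sh are just illustrations), that shows the mode, hash ID, stage number and name of each entry:

```shell
set -e
# Throwaway repository; the file names are hypothetical.
repo=$(mktemp -d) && cd "$repo"
git init -q

mkdir -p top/sub
echo "same content" > top/sub/file.ext
echo "same content" > copy.txt
printf '#!/bin/sh\necho hi\n' > run.sh
git add top/sub/file.ext copy.txt run.sh
git update-index --chmod=+x run.sh   # record the executable bit in the index

# Each line is: mode, blob hash ID, stage number, then a tab and the name.
index_listing=$(git ls-files --stage)
echo "$index_listing"

# Identical contents de-duplicate to the same blob hash ID.
hash_file=$(git rev-parse :top/sub/file.ext)
hash_copy=$(git rev-parse :copy.txt)
```

Note that there is no entry for top or top/sub on their own: only the file with the long slash-containing name appears.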
When git merge starts, it expands the index. It replaces all the stage-zero entries, which represent the current commit (the index needs to match the current commit at the start of the merge operation), with stage 2 entries. This also opens up spaces for stage 1 and stage 3 entries. We'll come back to this below.
Both committed files—which are stored via blob hash IDs—and the index, which literally stores these same kinds of blob hash IDs, store the internal format versions of Git files, in which contents are compressed and de-duplicated, and maybe even delta-encoded. This format is suitable for archiving (because it's compressed and de-duplicated) but not for getting any actual work done. So Git has to extract such a file, from a commit or from Git's index, expanding out any compression.
The result of extracting an archived blob object goes into an ordinary file. These files need to live somewhere, and that somewhere is your working tree. So git checkout or git switch works by copying files out, from a commit into Git's index (this part is fast and cheap, as the index holds the files in the same format as the commit) and then to your working tree.
The copying out to your working tree is slowish, but Git gets to cheat. Because the index keeps track of what's in your working tree, Git can usually tell very quickly if the working tree file is untouched from the last checkout. It can also tell, just by checking hash IDs, whether the file in the new commit you're checking out now is the same as the file in the old commit you had checked out before. If all goes well—and usually it does—Git can just leave the file alone, so it does.
In principle, then, a git checkout of a different commit has to remove every old file (from Git's index and your working tree) and then fill in every new file from the new commit. Git just skips a lot of this work, which means a multi-megabyte or gigabyte checkout can take very little time (sometimes just a few milliseconds, but this depends strongly on OS, caches, and other details, and also on the switch from commit X to commit Y not needing to change a lot of working tree files).
Other than this, though, your working tree is just a regular old set of files and directories / folders (whichever term you prefer). Everything that works on your computer, works here. Aside from writing into it when you tell it to (e.g., with git checkout), Git just lets you play with it to your heart's content. Then you can run git status, which only looks at it, or git add, which copies from it into Git's index. Until you do either of these, though, Git is completely hands-off.
In short, your working tree is yours, to do with as you will. You can create files here that Git never needs to know about. As long as (a) you don't git add them and (b) they never come out of some existing commit, they never get into Git's index, and Git never knows about them. The git status command will whine about them, and you will need to list such files in .gitignore to make Git shut the bleep up, but other than that, they're quite irrelevant.
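A quick way to convince yourself of this is to drop a couple of extra files into a scratch working tree and ask Git what it knows. This is just a sketch; notes_LOCAL.txt and notes_REMOTE.txt are hypothetical names chosen to resemble merge-tool leftovers.

```shell
set -e
# Throwaway repository; names and identity are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt"

# Create extra working tree files without ever running git add on them.
echo "scratch" > notes_LOCAL.txt
echo "scratch" > notes_REMOTE.txt

tracked_files=$(git ls-files)            # what the index knows about
status_lines=$(git status --porcelain)   # untracked files show up as "??"
echo "$tracked_files"
echo "$status_lines"
```

The index lists only tracked.txt; the other two files exist on disk but are merely reported as untracked.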
When we run git merge
, we quite typically are doing a three-way merge, which can have conflicts. To understand what's happening, let's look at a sample commit graph, i.e., a set of commits as found in some Git repository. Because the hash IDs of real commits are incomprehensible, we'll use single uppercase letters to stand in for them, like this:
          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2
I've added two branch names, branch1 (which we currently have checked out, i.e., we're using commit J to fill Git's index and our working tree) and branch2, which selects commit L. The (HEAD) notation shows that we have branch1 checked out. All six listed commits are ordinary single-parent commits, so that as viewed from commit J (i.e., git log if we were to run it right now) we see, as history, commit J first, then commit I, then commit H, then commit G, and so on. As viewed from commit L (if we run git log branch2) we see commit L first, then K, then H, then G, and so on as before.
These two commit histories meet up, when we go backwards like this, at commit H. So commit H is the merge base in this three-way merge.
The goal of a merge is to combine work. We'd like to have Git figure out, on its own, what we changed since commit H. These are "our changes". We'd like to have Git figure out what they changed since commit H. These are "their changes". Git can in fact do this, using git diff:
git diff --find-renames <hash-of-H> <hash-of-J>
This will produce a list of each file we changed, and what lines need to be deleted and added to each of those files to turn the copies of those files that exist in commit H into the copies of those same files that exist in J.
Similarly:
git diff --find-renames <hash-of-H> <hash-of-L>
will produce a list of files they changed, and what lines need to be modified in those files.
If Git simply (simply?) combines these two lists and applies both sets of changes to the files taken from commit H, Git will arrive at a set of files that keeps our changes (H-to-J) and adds their changes (H-to-L). In many cases, some file we changed will have no changes on their side, and vice versa. These will be easy for Git. In some cases, some files will have changes on both sides. If those changes touch different lines, Git may be able to combine those changes on its own.
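If you'd like to try the merge base and the two diffs on a graph shaped like the one drawn above, here is a rough sketch. The helper function commit and the file names shared.txt, ours.txt and theirs.txt are inventions for this example; each commit's message is the letter used in the drawing.

```shell
set -e
# Throwaway repository shaped like the G--H / I--J / K--L graph.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the initial branch branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$1" >> "$2"; git add "$2"; git commit -q -m "$1"; }

commit G shared.txt
commit H shared.txt
hash_H=$(git rev-parse HEAD)
git branch branch2
commit I ours.txt
commit J ours.txt
git checkout -q branch2
commit K theirs.txt
commit L theirs.txt
git checkout -q branch1

# Git finds the merge base on its own, then each side is diffed against it.
merge_base=$(git merge-base branch1 branch2)
our_changes=$(git diff --find-renames --name-only "$merge_base" branch1)
their_changes=$(git diff --find-renames --name-only "$merge_base" branch2)
echo "merge base: $merge_base"
echo "we changed: $our_changes"
echo "they changed: $their_changes"
```

Here the two sides touch different files, so this is one of the easy cases.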
These are the rules that Git uses, anyway. It just:
extracts the files from the merge base commit H: these go into the slot-1 entries.
extracts the files from our current commit J: these go into the slot-2 entries. Of course they were already there in slot 0, so no extracting is needed; Git can just move the slot-0 entries to slot-2. (When using git cherry-pick -n or similar, Git really does need to just move slot entries, because these cases don't require that the index match anything. But that's a special case that git merge does not normally allow.)
extracts the files from their commit L: these go into the slot-3 entries.
The index now has three copies of each file, from the merge base commit (BASE), the --ours commit (LOCAL), and theirs (REMOTE). Each of these is really just a hash ID, for an internal Git blob object (well, plus the name and mode, with the staging number representing the slot).1
Because of the de-duplication trick, if nobody made any changes to the file, all three staging slots will hold the same hash ID (and mode) and Git can just collapse all three index entries back down to a single slot-zero entry. If we changed the file, but they didn't, the base and their slot will have the same hash ID (and mode) and ours will differ and Git will just take our version of the file, moving slot 2 to slot zero and erasing slots 1 and 3. If they changed the file and we didn't, the base and our slot will have the same hash ID and theirs will differ and Git will just take their version of the file, moving slot 3 to slot zero, etc.
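The "they changed it, we didn't" case can be checked by hand. In this sketch (scratch repository, hypothetical files file.txt and other.txt), only their side touches file.txt, so after the merge the index should hold a single stage-zero entry for it whose blob hash ID is the one from their commit:

```shell
set -e
# Throwaway repository; names and identity are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

printf 'one\ntwo\n' > file.txt
echo "readme" > other.txt
git add file.txt other.txt
git commit -q -m "base"
git branch branch2

echo "ours" >> other.txt                 # we change only other.txt
git commit -q -am "our change to other.txt"
git checkout -q branch2
echo "three" >> file.txt                 # they change only file.txt
git commit -q -am "their change to file.txt"
git checkout -q branch1

git merge -q --no-edit branch2

unmerged=$(git ls-files -u)              # empty when nothing is left at stages 1-3
merged_blob=$(git rev-parse :file.txt)   # stage-zero entry after the merge
their_blob=$(git rev-parse branch2:file.txt)
```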
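This means we only ever have to work hard for files where both sides made changes (well, or for high-level / tree conflicts, which I'll skip over here). In this case, the various merge strategies that Git has today work by handing the three versions of the file to a low-level merge driver, either one selected in .gitattributes or the default built-in one.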
The built-in low-level merge driver works on a line-by-line basis, using git diff on the individual files.2 For each diff-hunk you'd see in git diff output, it looks to see if the other side has touched the same lines, or lines that "touch" another change (e.g., if "our" diff adds a line at the end and "their" diff also adds a line at the end, Git has no idea which order to use when adding both sets of lines).3 It writes, to our working tree copy of the file in question, Git's best guess at the correct merge. If this all goes well (if Git is able to combine the two sets of changes without conflicts), Git then does an internal git add on the file. If not, Git leaves the conflicts in the working tree copy of the file, complete with conflict markers, and doesn't do an internal git add on the file.
When the low level driver encounters something that is considered a conflict, if there is an extended-argument -X ours or -X theirs in effect, it will just take our change (from 1-vs-2) or their change (1-vs-3) according to the -X value, and not put in any conflict markers. So low-level conflicts can be resolved automatically in software using these flags. Note, though, that Git doesn't do anything smart here. It just picks the 1-vs-2 file diff, or the 1-vs-3 file diff, on the basis of a line-by-line diff hunk. But this does let Git run an internal git add on its own.
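As a small illustration of -X ours, the sketch below sets up a one-line file that both sides change differently (the repository and the line contents are made up) and then merges with -X ours. The merge should finish without stopping, with our version of the line in the result:

```shell
set -e
# Throwaway repository; names and identity are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base line" > file.txt
git add file.txt
git commit -q -m "base"
git branch branch2

echo "our line" > file.txt
git commit -q -am "our change"
git checkout -q branch2
echo "their line" > file.txt
git commit -q -am "their change"
git checkout -q branch1

# Both sides changed the same line; -X ours picks our hunk automatically.
git merge -q --no-edit -X ours branch2

merged_content=$(cat file.txt)
unmerged=$(git ls-files -u)   # empty: Git ran its internal git add
```

Without the -X ours, the same setup stops with a conflict.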
When Git does run an internal git add, this simply takes the working tree copy of the file and copies it into slot zero, erasing slots 1 through 3 for that file. That marks the file as resolved. The index shrinks back to normal, for that one set of file entries. After all files have been processed, either there are some conflicts still showing in Git's index (because some file didn't get pre-collapsed and did not get git add-ed), or there aren't (all files got an easy index collapse, or got git add-ed after the low level driver did its thing).
1. The design here was supposed to allow more than one slot-1 entry when doing recursive merge, but that never went anywhere. It's not clear if it could go anywhere as there are some very tricky corner cases with files that don't exist in one or two of the three commits, and they get trickier if you allow this kind of thing.
2. There is, in the existing merge-recursive algorithm, a bunch of redundant work in both the high and low level code. The ongoing work to put in a new improved merge is eliminating a lot of this and will speed up a lot of the more difficult merges. This doesn't change the goal of the merge code, nor the high level description I'm giving here, but does shuffle the point at which some bits of work are done and results saved, or not saved, so that they can be done once instead of repeatedly.
3. A low level union merge, which Git doesn't support directly (but which you can get with git merge-file, used as a low-level merge driver that you write), assumes that line order is irrelevant, and can handle this without calling it a conflict.
The description of what merge does with Git's index is pretty long, but if you've followed the logic all the way through, you will see that:
any file that was changed on at most one side has been collapsed back to a single entry at stage zero; and
any file that a merge driver (selected in .gitattributes) or the default built-in low-level file merger was able to resolve on its own, perhaps using -X ours or -X theirs, is also at stage zero.
So merge conflicts remain if and only if there are any nonzero stage numbers in Git's index. In this case, git merge stops, leaving behind a bunch of internal files, such as .git/MERGE_HEAD and .git/MERGE_MSG, to record that there's an ongoing merge. Meanwhile the index itself has some nonzero slot numbers, which record that there is a conflict.
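Here is roughly what that stopped state looks like from the outside. The sketch below forces a conflict in a scratch repository (all names are made up) and then inspects the index and the .git directory:

```shell
set -e
# Throwaway repository; names and identity are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base line" > file.txt
git add file.txt
git commit -q -m "base"
git branch branch2
echo "our line" > file.txt
git commit -q -am "our change"
git checkout -q branch2
echo "their line" > file.txt
git commit -q -am "their change"
git checkout -q branch1

# The merge stops with a nonzero exit status; capture it instead of aborting.
merge_status=0
git merge branch2 > /dev/null 2>&1 || merge_status=$?

# One index entry per nonzero stage: 1 (base), 2 (ours), 3 (theirs).
git ls-files --stage file.txt
stages=$(git ls-files --stage file.txt | awk '{printf "%s", $3}')
```

The three entries stay in the index, and .git/MERGE_HEAD stays on disk, until the file is resolved and the merge is finished or aborted.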
If the conflict was a low-level conflict, and we used Git's built in low-level merge driver on some file, the working tree copy of that file has conflict markers in it. These markers are derived from running the three original input files through the same code that git merge-file has available (so you could reconstruct the merge conflicts that way, but there's an easier way with git checkout -m or git restore -m at this point). Regardless of what's in the working tree copy of the file, the three input files exist in the index.
If we now run git mergetool, this code rummages through the index (using git ls-files --stage or equivalent) to find the conflicted files. It then uses git checkout-index to extract the three files that were the inputs to the low-level merge driver. These get funky .gittemporary style names, which git mergetool renames to file_BASE, file_LOCAL, and file_REMOTE respectively (well, the exact naming pattern is tricky and this is just an approximation). For internal purposes, it copies the file to file_BACKUP. Then it runs your selected merge tool on these files (excluding the backup one).
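You can imitate the extraction step by hand. The sketch below is not what git mergetool literally runs: it uses git show with the :1:, :2: and :3: stage prefixes rather than git checkout-index, and the output names file_BASE.txt and so on are simplified stand-ins for the real naming pattern.

```shell
set -e
# Throwaway repository with one conflicted file; all names are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base line" > file.txt
git add file.txt
git commit -q -m "base"
git branch branch2
echo "our line" > file.txt
git commit -q -am "our change"
git checkout -q branch2
echo "their line" > file.txt
git commit -q -am "their change"
git checkout -q branch1
git merge branch2 > /dev/null 2>&1 || true   # stops with a conflict

# Copy the three index stages out into ordinary working tree files.
git show :1:file.txt > file_BASE.txt
git show :2:file.txt > file_LOCAL.txt
git show :3:file.txt > file_REMOTE.txt
cp file.txt file_BACKUP.txt                  # the conflict-marked copy

untracked=$(git status --porcelain | grep -c '^??')
```

All four extra files are plain untracked files. Editing any of them changes nothing in the index.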
Your merge tool now works with working tree files. None of these files are in Git. You do whatever you like to them, using your merge tool. Whatever is in file, git mergetool assumes that's the result that you produced through use of the merge tool.
Here, there's one more special trick:
Some merge tools have "trusted" exit codes and some don't.
If your merge tool is marked "trusted" and exits with a status that says "the merge is done, use the result", Git will git add that file. This erases the three slots and marks the file resolved.
If your merge tool is not trusted, Git will compare the _BACKUP file with the tool's output. If the file is unchanged, git mergetool asks you if you think the merge worked. Only if you say yes does it git add the result.
When git merge stops in the middle, your job is to clean up the mess, by writing into Git's index, at slot zero, the correct merge result. You can do this any way you like. My preferred method is generally just to open file in vim, after Git writes it with merge.conflictStyle set to diff3. I find most conflicts easy to resolve this way. In a few cases, I really do want to get the three versions, and for those cases, git mergetool is a way to do it. Having played with git mergetool, though, I haven't found it a particularly good way to do it. This is one of those user-preference deals.
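For reference, this is about what the diff3 style gives you. The sketch sets merge.conflictStyle for a single command with git -c instead of changing your configuration; the repository and line contents are made up.

```shell
set -e
# Throwaway repository with one conflicted file; all names are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base line" > file.txt
git add file.txt
git commit -q -m "base"
git branch branch2
echo "our line" > file.txt
git commit -q -am "our change"
git checkout -q branch2
echo "their line" > file.txt
git commit -q -am "their change"
git checkout -q branch1

# With diff3, the conflict region also carries the merge base text,
# in an extra section introduced by a line of |||||||.
git -c merge.conflictStyle=diff3 merge branch2 > /dev/null 2>&1 || true
cat file.txt
```

The file then shows our line, the base line, and their line in one place, which is often enough to resolve the conflict in an ordinary editor.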
Anyway, once you have all the conflicts resolved, and have run git add to update Git's index, you should run:
git merge --continue
to tell Git to finish the merge. Git does not care how you resolved the conflicts. Git just cares that you put the right file into the index at staging slot zero, clearing out the other three staging slots.
In the bad old days you had to run:
git commit
to finish the merge, and if you'd gotten confused (e.g., got interrupted, had cd'ed to some other repository, then had a meeting or something, and were now somewhere other than where you thought you were when you ran git commit) you could make an ordinary commit instead of finishing your merge. The --continue checks that there is in fact a merge to finish, then runs git commit to finish it.
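Putting the last steps together, a minimal end-to-end sketch looks like this. It assumes a Git new enough to have git merge --continue (2.12 or later, as far as I know), uses GIT_EDITOR=true so that the commit message editor is skipped, and resolves the conflict by simply overwriting the file. All names are made up.

```shell
set -e
# Throwaway repository with one conflicted file; all names are made up.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example Author"
git config user.email "author@example.com"
git config commit.gpgsign false

echo "base line" > file.txt
git add file.txt
git commit -q -m "base"
git branch branch2
echo "our line" > file.txt
git commit -q -am "our change"
git checkout -q branch2
echo "their line" > file.txt
git commit -q -am "their change"
git checkout -q branch1
git merge branch2 > /dev/null 2>&1 || true   # stops with a conflict

# Resolve however you like; Git only sees what git add puts at slot zero.
echo "resolved line" > file.txt
git add file.txt
unmerged=$(git ls-files -u)                  # empty once slots 1-3 are gone

GIT_EDITOR=true git merge --continue > /dev/null

# A finished merge commit records two parents.
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
```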