While trying to understand the ways to undo various git operations, I came up with a scenario that I'm not sure how to deal with. Disclaimer: I have not run into this situation when actually working with git 'in production', but I still think it's not only an academic question.
echo "some content" >> example.txt
git add example.txt
git checkout @ -- example.txt
How do I get the change (the line "some content") back?
Every time changes are staged with git add, a blob object is created under .git/objects/ and the index file (.git/index) gets updated. If I change and add things multiple times, there will be multiple blobs. The old ones aren't immediately garbage collected.
When running the checkout command from above, the index gets updated immediately (I would have assumed that the content would only be in my working directory, but unstaged). This way the reference is gone and I cannot use things like git checkout-index to revert them.
Unless garbage collection kicks in, the content is technically still there. But I don't know how I would get it back other than manually trying to find the hash somehow and reading the content with git cat-file. The same would be true, e.g., for running git add multiple times, although wanting the previously staged changes back there maybe isn't really a use case. (Or maybe when popping changes from the stash? ...)
So all of this boils down to these questions:
Is there something like git reflog for the index?
Is git checkout @ -- file considered to be a dangerous command like git reset --hard, where you could potentially lose your work?
And if the answers are "No" / "Yes" (which is what I assume so far):
Are there plumbing commands to manually change/rewrite the index? (see the case above where the objects are still there)
Bonus: Is there an alternative way to checkout a single file without instantaneously staging it?
Your under-the-hood description is mostly right. The only things that aren't 100% have to do with this part:
Every time when staging changes with git add a blob object is created under .git/objects/
Internally, git add hashes the content of the data in the work-tree file, a la git hash-object -w -t blob. This doesn't necessarily create a new object: if the hashed content is already in the repository, it just re-uses the existing object. The existing object might be packed, i.e., in .git/objects/pack, rather than loose as a separate blob.
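To see the re-use for yourself, here is a minimal sketch in a scratch repository. The file names are just placeholders, and it assumes no unusual filters are configured: the hash that git hash-object computes matches what git add records in the index, and a second file with identical content gets the same blob.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .

printf 'some content\n' > example.txt

# Compute the blob hash without writing anything to the object database.
expected=$(git hash-object example.txt)

git add example.txt

# The index entry for example.txt names the same blob.
staged=$(git rev-parse :example.txt)
echo "hash-object: $expected"
echo "index entry: $staged"

# A second path with identical content re-uses that blob.
cp example.txt copy.txt
git add copy.txt
reused=$(git rev-parse :copy.txt)
echo "copy entry:  $reused"
```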
Moreover, the content written into a blob object might be arbitrarily different from the content in the work-tree due to a clean filter. It is, more often, CR-LF-line-ending-different from the content in the work-tree due to line-ending settings. Clean filters and end-of-line settings are controlled partly (or mostly, depending on your usage of Git) through your .gitattributes file, and partly (or mostly) through settings in your configuration.
In any case what matters is that you get a hash ID for a blob object. The blob object definitely exists somewhere: in the .git/objects directory as a loose object, or in a pack file. Now git add can write into .git/index (or whatever other file GIT_INDEX_FILE indicates): it will store, in the index at staging slot zero, an entry for the given path, using the computed blob hash and mode 100644 or 100755 depending on whether the work-tree file should be marked executable later.
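You can look at such an entry with git ls-files --stage. The following is a small sketch in a scratch repository (the file name is a placeholder, and it assumes a file system where new files are not executable by default):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .

printf 'some content\n' > example.txt
git add example.txt

# Each output line has the form: <mode> <blob hash> <stage><TAB><path>
entry=$(git ls-files --stage -- example.txt)
echo "$entry"

# Pick the fields apart for illustration.
mode=$(echo "$entry" | awk '{print $1}')
blob=$(echo "$entry" | awk '{print $2}')
stage=$(echo "$entry" | awk '{print $3}')
echo "mode=$mode stage=$stage blob=$blob"
```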
[Scenario snipped, but it ends with git checkout HEAD -- path clobbering the index entry, with its "$path represents $blobhash and mode $mode" information, and clobbering the work-tree copy of the file in path.]
Unless garbage collection kicks in the content is still there technically. But I don't know how I would get it back other than manually trying to find the hash somehow and reading the content with git cat-file.
Indeed, you can't: the hash ID computation is a one-way function. Only if you have the hash can you have Git spill out the content, but if you don't have the hash, you need the content to compute it. That's your Catch-22 situation.
If (and this is a pretty important "if") the content was unique, so that git add really did create a new blob object, and you've just overwritten the blob reference that was in the index, that blob object is indeed no longer referenced anywhere. On the other hand, if git hash-object -w wound up reusing some existing blob, the blob object is still referenced by whatever referenced it before. So there are now two interesting cases: the blob was unique and is now eligible for garbage collection, or the blob was not unique and is not.
Using git fsck --lost-found or git fsck --unreachable or git fsck --dangling (the default), you can have Git walk the entire object database, determine which objects are reachable and which are not, and tell you about some or all of the unreachable ones. With --lost-found, it also writes dangling objects into .git/lost-found (the contents of dangling blobs go into files under .git/lost-found/other). If the blob object was unreachable, it will be listed as one of these unreachable or dangling blobs, or have its contents restored into .git/lost-found.
The drawback here is that there may be dozens or even hundreds of dangling blob objects. Your task has now switched from "guess the hash" (virtually impossible) to "find the needle in the haystack" (not as difficult, but tedious, and you might well find the wrong needle—it's not really a haystack, it's a stack of needles, after all). And, of course, this only works for the "blob was unique" case.
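One way to thin out the stack of needles is to grep the candidates for text you remember. This is only a sketch in a scratch repository: the search string, the file names, and recovered.txt are placeholders for illustration, and it only helps in the "blob was unique" case before garbage collection has pruned the object.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

printf 'first line\n' > example.txt
git add example.txt
git commit -q -m "initial"

# Reproduce the scenario: stage a change, then clobber index and work-tree.
echo "some content" >> example.txt
git add example.txt
git checkout HEAD -- example.txt

# Walk the unreachable objects and keep blobs containing the remembered text.
git fsck --unreachable 2>/dev/null |
while read -r state kind hash; do
    [ "$kind" = blob ] || continue
    if git cat-file -p "$hash" | grep -q "some content"; then
        echo "candidate: $hash"
        git cat-file -p "$hash" > recovered.txt
    fi
done

cat recovered.txt
```

If several blobs match, the loop overwrites recovered.txt each time, so in a real repository you would want to inspect the printed candidates one by one.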
(This, by the way, is where this question isn't really a duplicate of Can git undo a checkout of unstaged files. But that one is still useful, so see it too.)
Is there something like git reflog for the index?
No. You can make your own backup copies: just cp .git/index somewhere. But Git doesn't do this on its own. You might make one just before a git checkout HEAD -- path operation, through some alias or shell function that you use to do this sort-of-dangerous operation.
Note that Git is not aware of these backup copies, so git gc won't treat objects referenced only from a backup as protected. To use the backups with plumbing commands like git ls-files, put the path name into GIT_INDEX_FILE for the duration of that command.
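Such a shell function might look like the sketch below. The function name and the backup file naming are made up for illustration, and it assumes the index lives at its default location (GIT_INDEX_FILE not already set) and a Git new enough to support git rev-parse --absolute-git-dir.

```shell
set -e

# Hypothetical helper: save a copy of the index, then run the clobbering checkout.
checkout_head_with_backup() {
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    backup="$git_dir/index.backup.$(date +%s)"
    cp "$git_dir/index" "$backup" || return 1
    echo "index saved to $backup" >&2
    git checkout HEAD -- "$@"
}

# Demonstration in a scratch repository.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
printf 'first line\n' > example.txt
git add example.txt
git commit -q -m "initial"

echo "some content" >> example.txt
git add example.txt
checkout_head_with_backup example.txt

# The live index matches HEAD again; the backup still names the staged blob.
live=$(git ls-files --stage -- example.txt | awk '{print $2}')
saved=$(GIT_INDEX_FILE="$backup" git ls-files --stage -- example.txt | awk '{print $2}')
echo "live index:   $live"
echo "backup index: $saved"
git cat-file -p "$saved"
```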
Is git checkout @ -- file considered to be a dangerous command like git reset --hard where you could potentially lose your work?
The answer to that depends on who is doing the considering. I would recommend considering it dangerous myself, since you're asking the question at all. :-)
Are there plumbing commands to manually change/rewrite the index? (see the case above where the objects are still there)
Yes: git update-index is the one-entry-at-a-time updater (use --cacheinfo or --index-info to provide raw index-entry data rather than having it duplicate a lot of git add work). Many other commands update the index partially or en masse as well.
If you have a process by which you back up the index before a git checkout HEAD -- ... operation, you can read the entries out of the backup index (using GIT_INDEX_FILE=... git ls-files for instance) and then use git update-index, without having GIT_INDEX_FILE set, to put the information into the regular index. Of course, this being an index-overwrite-y operation, you might wish to first make another backup of the index.
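As a rough sketch of that round trip, in a scratch repository with a made-up backup file name: git ls-files --stage prints entries in a format that git update-index --index-info accepts on standard input, and git checkout-index -f then copies the restored entry back into the work-tree.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
printf 'first line\n' > example.txt
git add example.txt
git commit -q -m "initial"

echo "some content" >> example.txt
git add example.txt

# Back up the index (index.backup is a hypothetical name), then clobber.
backup="$repo/.git/index.backup"
cp .git/index "$backup"
git checkout HEAD -- example.txt

# Read the entry from the backup index and feed it into the regular index.
GIT_INDEX_FILE="$backup" git ls-files --stage -- example.txt |
    git update-index --index-info

# Copy the restored index entry back out to the work-tree.
git checkout-index -f -- example.txt
cat example.txt
```

This only works while the blob still exists, i.e., before git gc has pruned it.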
Is there an alternative way to checkout a single file without instantaneously staging it?
No, but only because of the verb checkout here. To view the contents of a file that is in either the index or in any commit (so that the contents have a name that git rev-parse can understand), use git show:
git show :file # file in index at stage zero
git show :3:file # file in index at stage three, during merge conflict
git show HEAD:file # file in current commit
git show master~7:file # file in commit 7 first-parent hops back from master
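Since git show writes to standard output, you can redirect it over the work-tree file and leave the index alone. Here is a sketch in a scratch repository (file names are placeholders). I believe git show emits the raw blob here, so this assumes no smudge filter or line-ending conversion would be needed for the file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
printf 'first line\n' > example.txt
git add example.txt
git commit -q -m "initial"

echo "some content" >> example.txt
git add example.txt
staged_before=$(git rev-parse :example.txt)

# Overwrite only the work-tree copy with the committed version.
git show HEAD:example.txt > example.txt

staged_after=$(git rev-parse :example.txt)
echo "index before: $staged_before"
echo "index after:  $staged_after"

# The staged version is still available through the index.
git show :example.txt
```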
Note also that git reset can overwrite one or many files in the index without touching the files in the work-tree:
git reset HEAD -- file # copy HEAD:file to :file leaving work-tree file undisturbed
If you give git reset the path to a directory, it resets all the files that are already in the index and reside within the directory.