
Lost tracked files when doing git stash --include-untracked


When I ran git status, it showed both tracked and untracked files. Earlier in the day, I had learned that git stash --include-untracked stashes untracked files, and it worked for me then. So I assumed git stash --include-untracked would save the changes to both tracked and untracked files. But when I ran git stash apply, only the untracked files' changes were restored. The changes to the tracked files are lost.


Solution

  • There's something suspicious here, but it's probably not the stash itself

    git stash --include-untracked, which can be spelled git stash -u for short, makes three commits for the stash.

    The first two are the same two as usual: one to hold whatever was in the index at the time you ran git stash, and the other to hold whatever was in the work-tree—but tracked files only—at that time. In other words, the i commit holding the index holds the result of git write-tree, and the w commit holds the result of (the equivalent of) git add -u && git write-tree (although the stash code does this the hard way, or did in the old days of shell script stash).

    That's all that the stash would have if you ran git stash without --all or --include-untracked: it would have the two commits for i (index state) and w (work-tree state), both of which have the current commit C as their first parent. Commit w has i as its second parent:

    ...--o--o--C   <-- HEAD
               |\
               i-w   <-- stash
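
    The two-parent shape can be checked in a throwaway repository. This is only a sketch: the temporary directory, the file name tracked.txt, and the identity settings are made up for illustration.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo one > tracked.txt
    git add tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt       # modify a tracked file
    git stash push -q             # plain stash: no -u, no -a

    # Print the stash commit (w) followed by the hashes of its parents.
    git rev-list --parents -n 1 stash

    parent_count=$(git rev-list --parents -n 1 stash | awk '{print NF - 1}')
    first_parent=$(git rev-parse 'stash^1')
    head_commit=$(git rev-parse HEAD)
    echo "w has $parent_count parents"
    ```

    The first parent of w should be the commit that was current when the stash was made, and the second should be the i commit.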
    

    If you do add -u or -a, however, you get a three-commit stash: commit w acquires a third parent, a commit we can call u, that holds the untracked files. This third parent has no parent of its own (is an orphan / root-commit), so the drawing is now:

    ...--o--o--C   <-- HEAD
               |\
               i-w   <-- stash
                /
               u
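
    To see the third parent, the same kind of experiment works with -u. Again a sketch in a scratch repository, with made-up file names:

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo one > tracked.txt
    git add tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt       # tracked change
    echo new > untracked.txt      # untracked file
    git stash push -q -u

    # Count the parents of w, then the parents of its third parent, u.
    w_parents=$(git rev-list --parents -n 1 stash | awk '{print NF - 1}')
    u_parents=$(git rev-list --parents -n 1 'stash^3' | awk '{print NF - 1}')
    echo "w has $w_parents parents; u has $u_parents"
    ```

    Here stash^3 names commit u. It is a root commit, so its own parent count is zero.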
    

    The interesting thing about this new commit, and its effect in the work-tree as well, is this: commit u contains only untracked files.

    Remember that a commit is a full and complete snapshot of all (tracked) files. Commit u is made by using a temporary index: Git discards all tracked files from it and instead tracks some or all of the untracked files. This step adds either only the untracked-but-not-ignored files (git stash -u), or all untracked files, ignored ones included (git stash -a). Then Git writes commit u, using git write-tree to turn the temporary index into a tree to put into commit u, so that commit u contains only the selected files.
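
    Listing the trees of the stash commits shows which files ended up where. The sketch below assumes a scratch repository with one tracked file, one untracked file, and one ignored file; all names are hypothetical.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo '*.log' > .gitignore
    echo one > tracked.txt
    git add .gitignore tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt       # tracked change: recorded in w
    echo new > untracked.txt      # untracked, not ignored: recorded in u
    echo noise > debug.log        # ignored: not selected by -u
    git stash push -q -u

    u_files=$(git ls-tree -r --name-only 'stash^3')
    w_files=$(git ls-tree -r --name-only stash)
    echo "files in u:"; echo "$u_files"
    echo "files in w:"; echo "$w_files"
    ```

    With -a instead of -u, debug.log would be expected in commit u as well.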

    Now that these selected files are in commit u, git stash removes them from the work-tree. In practice, it used to just run git clean with appropriate options. The new fancier C-coded git stash still does the equivalent (but, one might hope, with fewer bugs; see below).

    This is similar to what it does for the files in i and/or w: it effectively does a git reset --hard, so that the work-tree's tracked files match the HEAD commit. (That is, it does this unless you use --keep-index, in which case it resets the files to match the i commit.) The git reset at this point has no effect on untracked files, which are outside the scope of git reset, and no effect on the current branch since the reset deliberately keeps that at the HEAD.

    Having stashed some untracked files in commit u, though, git stash then removes those files from the work-tree. That's quite important later (and maybe also immediately).
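
    The state of the work-tree right after the stash can be inspected directly. A minimal sketch, with hypothetical file names in a scratch repository:

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo one > tracked.txt
    git add tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt
    echo new > untracked.txt
    git stash push -q -u

    # The tracked file is back to its HEAD contents and the
    # untracked file is gone from the work-tree.
    ls
    status=$(git status --porcelain)
    tracked_now=$(cat tracked.txt)
    ```

    An empty git status --porcelain here means that both the tracked change and the untracked file now live only in the stash commits.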

    Note: there was a bug in combining git stash push with pathspecs. It potentially affected everything, but especially the stash variants made with -u or -a, where some versions of Git removed too many files. That is, you might git stash just some subset of your files, but then Git would git reset --hard or git clean all files, or too many files. (I believe these are all fixed today, but in general, I don't recommend using git stash at all, and especially not the fancy pathspec variants. Removing untracked files that weren't actually stashed is particularly egregious behavior, and some versions of Git do that!)

    You describe an apply-time problem, but maybe not the usual one

    Here's what you said:

    I thought git stash --include-untracked would save both tracked and untracked files' change.

    As always, Git doesn't save changes, it saves snapshots.

    But when I git stash apply, there is only untracked files' change left. The tracked files' change are lost.

    Applying a normal (no-untracked-files) stash is done in one of two ways, depending on whether you use the --index flag. The variant without --index is easier to explain, since it literally just ignores the i commit. (The variant with the --index flag first uses git apply --index on a diff, and if that fails, suggests that you try without --index. If you want the effect of --index, this is terrible advice and you should ignore it. For this answer, though, let's ignore the --index option entirely.)

    Note: this is not the --keep-index flag, but rather the --index flag. The --keep-index flag applies only when creating a stash. The --index flag applies when applying a stash.
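
    The difference that --index makes at apply time can be seen by applying the same stash twice. This is a sketch with made-up file names; it starts each apply from a clean work-tree.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo a > staged.txt
    echo b > unstaged.txt
    git add staged.txt unstaged.txt
    git commit -q -m "commit C"

    echo a2 >> staged.txt
    git add staged.txt            # this change is staged, so it is in i
    echo b2 >> unstaged.txt       # this change is only in the work-tree
    git stash push -q

    git stash apply -q            # without --index: commit i is ignored
    plain_staged=$(git diff --cached --name-only)

    git reset -q --hard           # back to a clean state
    git stash apply -q --index    # with --index: staged state is restored
    index_staged=$(git diff --cached --name-only)
    index_unstaged=$(git diff --name-only)

    echo "staged without --index: '$plain_staged'"
    echo "staged with --index:    '$index_staged'"
    ```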

    To apply the w commit, Git runs git merge-recursive directly. This is not something you should ever do as a user, and when git stash does it, that's not really all that wise either, but that's what it does. The effect is a lot like running git merge, except that if you have uncommitted changes in your index and/or work-tree, it may become impossible to return to this state in any sort of automated way.

    If you start with a "clean" index and work-tree, though (that is, if git status says nothing to commit, working tree clean), this merge operation is almost exactly the same as a regular git merge or git cherry-pick. (Note that both git merge and git cherry-pick require that things be clean, at least by default.) The merge operation runs with the merge base set to the first parent of commit w (commit C in the drawings above), the current or --ours commit being the current commit as usual, and the other or --theirs commit being commit w.

    That is, suppose that your commit graph now looks like this:

           o--o--A--B   <-- branch (HEAD)
          /
    ...--o--o--C
               |\
               i-w   <-- stash
                /
               u
    

    so that you are on commit B. The merge operation to apply the stash does a three-way merge with C as the merge base and w as the --theirs commit, and the current commit/work-tree as the --ours commit. Git diffs C vs B to see what we changed, and C vs w to see what they changed, and combines the two sets of differences.
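
    A small experiment makes the combining step visible. In this sketch (hypothetical file names, scratch repository), the stash changes a.txt relative to C, and commit B changes b.txt relative to C, so the two sets of differences do not overlap:

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo a1 > a.txt
    echo b1 > b.txt
    git add a.txt b.txt
    git commit -q -m "commit C"

    echo a2 >> a.txt              # C-vs-w: a.txt gains a line
    git stash push -q

    echo b2 >> b.txt              # C-vs-B: b.txt gains a line
    git commit -q -am "commit B"

    git stash apply -q            # merge base C, ours B, theirs w

    a_now=$(cat a.txt)
    b_now=$(cat b.txt)
    modified=$(git diff --name-only)
    echo "modified in work-tree: $modified"
    ```

    After the apply, b.txt keeps the contents from commit B and a.txt carries the stashed change as an uncommitted modification.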

    This is how the merge into B will run, provided that Git can first un-stash commit u. The usual problem at this point is that Git can't un-stash u.

    Remember that commit u contains exactly (and only) the untracked files that were present when you made the stash, and that Git then removed with git clean (and appropriate options). These files must still be absent from the work-tree. If they are not absent, git stash apply will be unable to extract the files from u and will not proceed.
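
    This failure is easy to provoke on purpose. The sketch below recreates the untracked file before applying; the file names are made up, and the exact error text varies between Git versions, so it is discarded here.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo one > tracked.txt
    git add tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt
    echo new > untracked.txt
    git stash push -q -u

    echo blocker > untracked.txt  # the path is no longer absent

    if git stash apply -q 2>/dev/null; then
        applied=yes
    else
        applied=no
    fi
    echo "applied: $applied"

    tracked_now=$(cat tracked.txt)
    untracked_now=$(cat untracked.txt)
    ```

    In this situation the tracked change is not applied either, since Git stops before the merge step. Moving the blocking file out of the way and running git stash apply again should let it proceed.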

    Since the untracked files are untracked, it's hard to know if they changed

    But when I git stash apply, there is only untracked files' change left. The tracked files' change are lost.

    You talk about changes in untracked files.

    Git of course doesn't store changes, so you can't find them that way. And if the files are untracked, they're not in the index right now either. So: how do you know they're changed? You need some other set of files to which to compare them.

    The step that extracts commit u is supposed to be all-or-nothing: it should either extract all u files, or none. If it does extract all u files, git stash apply should go on to attempt to merge, somewhat as if by git cherry-pick -n (except that cherry-pick writes to the index too), commit w in the stash. That should leave you with extracted u files and merged w-vs-C changes, in your work-tree.

    If there are conflicts between C-vs-work-tree vs C-vs-w, you should have the conflict markers present in the work-tree, and your index should have been expanded as usual for a conflicted merge.
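
    A conflicted apply can be produced by changing the same line in the stash and in a later commit. This is a sketch with a hypothetical file name; the labels on the conflict markers depend on the Git version.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo original > file.txt
    git add file.txt
    git commit -q -m "commit C"

    echo "stashed edit" > file.txt    # C-vs-w changes the line
    git stash push -q

    echo "edit in B" > file.txt       # C-vs-B changes the same line
    git commit -q -am "commit B"

    if git stash apply -q >/dev/null 2>&1; then
        result=clean
    else
        result=conflict
    fi
    echo "result: $result"

    cat file.txt                      # shows the conflict markers
    unmerged=$(git ls-files -u)       # index entries at stages 1, 2, 3
    ```

    The stash itself is still there after a conflicted apply, so the attempt can be undone and retried.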

    If you can make a reproducer for your problem, that would probably provide huge amounts of clarity here.
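
    As a starting point for such a reproducer, here is a sketch of the simplest case, where everything behaves as described above. The file names are invented; the idea is to vary this script step by step until it matches what you actually did.

    ```shell
    set -e
    repo=$(mktemp -d) && cd "$repo"
    git init -q
    git config user.name "Example User"
    git config user.email "user@example.com"

    echo one > tracked.txt
    git add tracked.txt
    git commit -q -m "commit C"

    echo two >> tracked.txt           # change to a tracked file
    echo new > untracked.txt          # brand-new untracked file
    git stash push -q --include-untracked

    git stash apply -q

    git status --porcelain            # lists both files again
    tracked_now=$(cat tracked.txt)
    untracked_now=$(cat untracked.txt)
    ```

    If your real sequence of commands ends with the tracked change missing, the difference between it and this script is where to look.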