Tags: git, delta, git-stage

A Git defect has caused Git to mark moved files as deleted. How can I force Git to rescan the project to correct this error?


I refactored a project submodule, and due to some hidden tool-calling by AI agents, the staged delta in Git is now in a strange state:

$ git status
On branch scala-support/dev4
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   plugins/plugin/build.gradle.kts
    new file:   plugins/plugin/src/main/kotlin/lean4ij/MyBundle.kt
    new file:   plugins/plugin/src/main/kotlin/lean4ij/actions/AddInlayGoalHint.kt
    new file:   plugins/plugin/src/main/kotlin/lean4ij/actions/DelInlayGoalHint.kt
    new file:   plugins/plugin/src/main/kotlin/lean4ij/actions/FindInInternalInfoview.kt
...

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    deleted:    plugin/build.gradle.kts
    deleted:    plugin/src/main/kotlin/lean4ij/MyBundle.kt
    deleted:    plugin/src/main/kotlin/lean4ij/actions/AddInlayGoalHint.kt
    deleted:    plugin/src/main/kotlin/lean4ij/actions/DelInlayGoalHint.kt
    deleted:    plugin/src/main/kotlin/lean4ij/actions/FindInInternalInfoview.kt
...

The content of all the new files is completely identical to the old files; they should be marked as moved instead of new file. Unfortunately, the git implementation failed to compress it to a minimal delta and introduced unnecessary bloat to the upcoming commit.

My questions are:


Solution

  • they should be marked as moved instead of new file

    They would be, if the addition and the deletion were both staged (or both unstaged). Currently, only the addition is staged for commit but the deletion is not. (So if you were to commit it, the result would show up as the files being copied – though as explained below, they would not occupy any additional space, no matter the number of such "copies".)
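As a minimal sketch of that point, the script below rebuilds the situation in a throwaway repository and then stages the deletion as well. The repository location, file content and commit identity are placeholders invented for the demo; only the `plugin/` and `plugins/plugin/` paths mirror the question. It assumes rename detection in `git status` has not been disabled, so the option is passed explicitly.

```shell
set -eu

# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

mkdir -p plugin
printf 'plugins { kotlin("jvm") }\n' > plugin/build.gradle.kts
git add plugin
git commit -q -m "initial layout"

# Reproduce the question: the new path is staged, the old path's deletion is not.
mkdir -p plugins
mv plugin plugins/plugin
git add plugins
before=$(git -c status.renames=true status --porcelain)

# Stage the deletion too (git add -A stages additions, changes and removals).
git add -A
after=$(git -c status.renames=true status --porcelain)

printf 'before:\n%s\n\nafter:\n%s\n' "$before" "$after"
# "after" is a single entry:
# R  plugin/build.gradle.kts -> plugins/plugin/build.gradle.kts
```

In a real working tree, `git add -A` stages everything; to be more selective, `git add plugin plugins` should achieve the same for just those two directories.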

    Unfortunately, the git implementation failed to compress it to a minimal delta and introduced unnecessary bloat to the upcoming commit.

    That isn't a problem, because Git commits are not delta-based.

    Git is a content-addressed-storage system. Each Git commit stores the working tree state in full, but with deduplication done at whole-file granularity: all files with the same content (even across commits) are stored as a single object.

    There is no distinction in the resulting Git commit between 'add' vs 'copy', nor between 'move' vs 'add+delete' – the end result is the same, with any number of tree entries referencing the same object ID (which then occupies no more than 1x space). This is why there is no git cp command, and the existing git mv is merely a convenience as well. Commits don't even store any metadata indicating that git mv was used.
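The deduplication can be checked directly. The following sketch commits the same content under two paths in a scratch repository (file name and content are made up for illustration) and compares the object IDs the two tree entries point to.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Same content under two different paths, i.e. a "copy".
mkdir -p plugin plugins/plugin
printf 'object MyBundle\n' > plugin/MyBundle.kt
cp plugin/MyBundle.kt plugins/plugin/MyBundle.kt
git add -A
git commit -q -m "two paths, one content"

# Resolve each path in the commit to the blob it references.
old_id=$(git rev-parse HEAD:plugin/MyBundle.kt)
new_id=$(git rev-parse HEAD:plugins/plugin/MyBundle.kt)

# Count every blob object stored in the repository.
blob_count=$(git cat-file --batch-all-objects --batch-check | grep -c ' blob ')

git ls-tree -r HEAD          # both entries list the same object ID
echo "blobs stored: $blob_count"
```

Both paths resolve to one blob, so the second copy costs only the extra tree entries.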

    Deltas between commits are reconstructed on the fly as needed. For example, during git status or git log, renames are reconstructed by the UI based on file similarity: a delete + add of the same object (or sufficiently similar objects) will show up as a "move".
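To illustrate that renames are computed at display time, this sketch moves a file with plain `cp` and `rm` (never using `git mv`), commits, and then asks for the same diff twice: once with rename detection and once without. The paths and content are again invented for the demo.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

mkdir -p plugin
printf 'object MyBundle\n' > plugin/MyBundle.kt
git add -A
git commit -q -m "initial layout"

# "Move" the file without telling Git anything about a move.
mkdir -p plugins/plugin
cp plugin/MyBundle.kt plugins/plugin/MyBundle.kt
rm -r plugin
git add -A
git commit -q -m "relocate plugin"

# Same two commits, two presentations of the difference.
with_renames=$(git diff --name-status -M HEAD~1 HEAD)
without_renames=$(git diff --name-status --no-renames HEAD~1 HEAD)

printf 'with -M:\n%s\n\nwith --no-renames:\n%s\n' "$with_renames" "$without_renames"
```

The first form reports a single `R100` entry (a rename at 100% similarity), the second reports a `D` and an `A` for the very same pair of commits.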


    how to use git CLI to re-run its delta optimisation on this repository?

    Packfile creation (git gc) does use deltas, but purely as a compression mechanism between individual objects – it is not based on commit history, but instead searches for similar objects as candidates for delta compression.

    But this delta packing is done after whole-object deduplication has already happened, so in your case it will have practically no impact: the new commit only contributes 2-3 new small objects (a few 'tree' objects plus the 'commit' metadata object).

    Indeed the fresh commit won't be delta-packed at all, so there is nothing to "re"-run yet – by default Git will only consider automatic packing once several thousand 'loose' objects have accumulated (the gc.auto threshold, 6700 by default). You can still force it using git repack -d or -da or git gc if you like.
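For anyone who wants to see the effect of forcing a repack, here is a small sketch on a scratch repository with a single demo commit. It reads the loose and packed object counts from `git count-objects -v` before and after `git repack -a -d`; the file and identity are placeholders.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

mkdir -p plugin
printf 'object MyBundle\n' > plugin/MyBundle.kt
git add -A
git commit -q -m "initial layout"

# Show the auto-gc threshold if one is configured; otherwise the built-in default applies.
git config --get gc.auto || echo "gc.auto not set, built-in default applies"

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack everything into one pack and drop the now-redundant loose copies.
git repack -a -d -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before: $loose_before, loose after: $loose_after, packed: $packed_after"
```

On a real repository the numbers will differ, but the pattern should be the same: the loose count drops and the in-pack count grows by the same amount.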