gitsquash

Streamlining a repository history by squashing commits to selected checkpoints


Suppose I have a repository with a complex history of many branches, merges between them, divergences, further merges in the opposite direction, rebases, etc. There is, however, a master branch which is the important one, and it has some significant checkpoints (they are not labelled, but they could be), each of which was the branch tip at some point (and none of which has been subject to rebasing).

Can I (easily) reduce the whole repository to only those checkpoints, squashing them to single commits? E.g. reduce something like this:

    A--C--F
   /       \ 
--#1--B--E--#2--I--#3--K--#4
       \          /
        D--G--H--J

to this

--#1--#2--#3--#4

Assuming I don't care about other branches, intermediate commit messages, or keeping the SHAs.

I guess I should work backwards? That is, first squash #4:

    A--C--F
   /       \ 
--#1--B--E--#2--I--#3--#4
       \          /
        D--G--H--J

then #3

    A--C--F
   /       \ 
--#1--B--E--#2--#3--#4       
        

and finally #2

--#1--#2--#3--#4

But what commands would I use? Is there a simpler sequence/workflow? Any pitfalls?


Solution

  • Sure. The easiest, most user-friendly way is to create a new orphan branch that starts from the first checkpoint you want to keep (it becomes the first, parentless commit of the new branch), and then add each following checkpoint with git restore:

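    git checkout --orphan squashed first-checkpoint-commit   # "squashed" is any new branch name
    git commit -m "first checkpoint"
    git restore --staged --worktree --source second-checkpoint-commit -- .
    git commit -m "second checkpoint"
    git restore --staged --worktree --source third-checkpoint-commit -- .
    git commit -m "third checkpoint"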
    

    And so on for the remaining checkpoints.

    If you want to do it in a more hackish fashion, you could create the commits directly with git commit-tree:

    git commit-tree -m "first checkpoint" first-checkpoint-commit^{tree}
    # that will print a commit ID
    git commit-tree -p commit-id-from-previous-command -m "second checkpoint" second-checkpoint-commit^{tree}
    # that will print a different commit ID
    git commit-tree -p commit-id-from-previous-command -m "third checkpoint" third-checkpoint-commit^{tree}
    

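    The advantage of this approach is that it never touches your working tree: you create new commits and chain them without checking any of them out. Because nothing references the new commits yet, finish by pointing a branch at the last one, e.g. git branch squashed last-commit-id.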
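The orphan-branch plus git restore workflow can be wrapped in a small shell function so the checkpoints only have to be listed once, oldest first. This is a minimal sketch under stated assumptions, not a tested tool: the function name squash_checkpoints and the placeholder commit messages are made up, and it assumes git 2.23 or later (for git restore), a clean working tree, and that it is run from the top level of the repository, since the . pathspec is relative to the current directory.

```shell
# Hypothetical helper: squash_checkpoints NEW_BRANCH CHECKPOINT...
# Rebuilds history as one commit per checkpoint (oldest first) on a new
# orphan branch. Assumes git >= 2.23, a clean working tree, and that it is
# run from the top level of the repository. Messages are placeholders.
squash_checkpoints() {
    new_branch=$1
    first=$2
    shift 2
    # Start an unborn branch whose index and working tree match the first
    # checkpoint; the commit made here has no parent.
    git checkout -q --orphan "$new_branch" "$first" || return 1
    git commit -q -m "checkpoint $first" || return 1
    for cp in "$@"; do
        # Make the index and working tree match this checkpoint exactly;
        # in restore's default no-overlay mode, tracked paths that do not
        # exist in $cp are removed.
        git restore --staged --worktree --source "$cp" -- . || return 1
        # --allow-empty keeps going if two checkpoints have identical trees.
        git commit -q --allow-empty -m "checkpoint $cp" || return 1
    done
}
```

For example, squash_checkpoints squashed <#1> <#2> <#3> <#4>, passing the checkpoint SHAs (or tags, if you label them) oldest first.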
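The git commit-tree chain can likewise be looped over the checkpoints, feeding each printed commit ID into the next -p. Again this is only a sketch under assumptions: squash_commit_tree is a hypothetical name, checkpoints are passed oldest first, and the messages are placeholders. It only creates commit objects plus one branch, so HEAD, the index and the working tree are left alone.

```shell
# Hypothetical helper: squash_commit_tree NEW_BRANCH CHECKPOINT...
# Chains one new commit per checkpoint (oldest first) with git commit-tree,
# reusing each checkpoint's tree as-is. HEAD, the index and the working
# tree are not touched. Messages are placeholders.
squash_commit_tree() {
    new_branch=$1
    shift
    parent=
    for cp in "$@"; do
        if [ -z "$parent" ]; then
            # First checkpoint: no -p, so this becomes a root commit.
            parent=$(git commit-tree -m "checkpoint $cp" "$cp^{tree}") || return 1
        else
            # Later checkpoints: the previously created commit is the parent.
            parent=$(git commit-tree -p "$parent" -m "checkpoint $cp" "$cp^{tree}") || return 1
        fi
    done
    # The new commits are unreferenced until a branch points at the last one.
    git branch "$new_branch" "$parent"
}
```

Afterwards, git diff <last-checkpoint> squashed should print nothing, since the new tip reuses that checkpoint's tree exactly.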