
How can I remove/delete a large file from the commit history in the Git repository?


I accidentally dropped a DVD-rip into a website project, then carelessly ran git commit -a -m ..., and, zap, the repository was bloated by 2.2 GB. The next time around I made some edits, deleted the video file, and committed everything, but the compressed file was still there in the repository's history.

I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the two commits, so that the big file no longer shows up in the history and gets cleaned up by garbage collection?


Solution

  • New answer that works in 2022

    Do not use:

    git filter-branch
    

    After you push, this command may leave the remote repository unchanged: if you clone again, you can find that the repository is still just as large. The command is also dated, and the Git documentation itself now recommends git filter-repo instead. For example, the steps in https://github.com/18F/C2/issues/439 won't work.

    The Solution

    This solution uses the following tool, which is not bundled with Git and has to be installed separately:

    git filter-repo
    

    Steps:

    (1) Find the largest files in .git (change 10 to whatever number of files you want to display):

    git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | cut -f 1 -d " " | tail -10)
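
    If you also want to see how big each file is, a sketch along these lines may help. It is not part of the original steps: largest_blobs is a hypothetical helper name, and it assumes a Git version whose git cat-file --batch-check accepts a format string. It prints the object ID, the size in bytes, and the path, with the largest entries last.

    ```shell
    # Hypothetical helper (not from the steps above): list the N largest blobs
    # reachable from any ref, as "<object id> <size in bytes> <path>".
    largest_blobs() {
        local count="${1:-10}"
        git rev-list --objects --all |
            git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
            sed -n 's/^blob //p' |   # keep blobs only, drop commits and trees
            sort -k2,2n |            # sort numerically by size
            tail -n "$count"         # the largest entries come last
    }

    # Usage, from inside the repository:
    #   largest_blobs 10
    ```

    Unlike the verify-pack pipeline, this also sees objects that have not been packed yet.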
    

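    (2) Filter out these large files by passing the path and name of the file you want to remove. --invert-paths keeps everything except the matching paths, and --force is needed because git filter-repo otherwise refuses to rewrite a repository that is not a fresh clone:

    git filter-repo --path-glob '../../src/../..' --invert-paths --force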
    

    Or filter by file extension, e.g., to remove all .zip files:

    git filter-repo --path-glob '*.zip' --invert-paths --force
    

    Or, e.g., to remove all .a library files:

    git filter-repo --path-glob '*.a' --invert-paths --force
    

    Or use whatever pattern matches the files you found in step 1.
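
    Before pushing, it may be worth checking that the rewrite really removed the file from every commit. A minimal sketch, assuming you run it inside the repository; path_in_history is a hypothetical helper name, not a Git command:

    ```shell
    # Hypothetical helper: succeeds (exit status 0) if at least one commit on
    # any ref still touches a path matching the given pathspec.
    path_in_history() {
        [ -n "$(git log --all --format=%H -n 1 -- "$1")" ]
    }

    # Usage, from inside the repository:
    #   path_in_history '*.zip' && echo "still in history" || echo "gone"
    ```

    If it still reports the file, the glob passed to git filter-repo probably did not match the path shown in step 1.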

    (3) Add the remote again, because git filter-repo removes origin after rewriting the history:

    git remote add origin git@github.com:.../...git
    

    (4) Force-push all branches and tags so that the rewritten history replaces the old one on the remote:

    git push --all --force
    git push --tags --force
    

    Done!!!
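
    To see whether the push actually shrank things, you could compare the size of the object store in the rewritten repository with that of a fresh clone of the remote. A rough sketch, where repo_size_kib is a hypothetical helper built on git count-objects and the clone path is only a placeholder:

    ```shell
    # Hypothetical helper: print the on-disk size of a repository's object
    # store in KiB (loose objects plus packs), as reported by git count-objects.
    repo_size_kib() {
        git -C "${1:-.}" count-objects -v |
            awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
    }

    # Usage: compare the rewritten repository with a fresh clone of the remote.
    #   repo_size_kib .
    #   repo_size_kib /path/to/fresh-clone
    ```

    If the fresh clone is still large, some ref on the remote may still point at the old history.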