
How to delete all files > 1 MB from the history (but keep them in the repository)


I have a repository with many big files (psd, exe, pdf, etc.), and every time I commit an update to those files, the Git .pack file grows drastically in order to keep the history. How can I delete all files > 1 MB from the history but keep them in the repository?

Also, is it possible to set up a particular file so that it is never stored in the history?


Solution

    Using git-filter-repo

    git filter-repo is the tool the Git project recommends over git filter-branch. To remove all blobs larger than 1 MB from the history:

    git filter-repo --strip-blobs-bigger-than 1M
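
Before rewriting anything, it can help to see which blobs the 1M threshold would actually catch. The following is a minimal sketch, not part of the original answer: the function names `blobs_bigger_than` and `list_large_blobs` are made up for illustration, and it assumes that 1M means 1024 * 1024 bytes.

```shell
#!/bin/sh
# Assumption: "1M" corresponds to 1024 * 1024 bytes.
THRESHOLD=1048576

# Reads "<type> <sha> <size> <path>" lines on stdin and prints
# "<size> <path>" for every blob larger than the given byte limit.
blobs_bigger_than() {
    awk -v limit="$1" '
        $1 == "blob" && $3 > limit {
            path = $0
            # Drop the first three fields, keep the path (may contain spaces).
            sub(/^[^ ]+ [^ ]+ [^ ]+ ?/, "", path)
            print $3, path
        }'
}

# Lists every blob reachable from any ref that exceeds THRESHOLD.
# Must be run inside a Git repository.
list_large_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        blobs_bigger_than "$THRESHOLD"
}
```

Running `list_large_blobs | sort -n` inside the repository prints the candidates sorted by size. Note that git filter-repo expects to be run in a fresh clone and aborts otherwise unless `--force` is given.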
    

    Using BFG Repo-Cleaner

    The older BFG Repo-Cleaner used to be the most popular tool to do exactly that.

    To remove all files with a size > 1 MB:

    $ bfg --strip-blobs-bigger-than 1M my-repo.git
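
The command above is only one step of a longer procedure. The sketch below roughly follows the workflow described in the BFG documentation (mirror clone, clean, expire the reflog, garbage-collect, push). It assumes `bfg` is a wrapper on your PATH for `java -jar bfg.jar`; the helper names `run` and `clean_with_bfg` and the example URL are hypothetical.

```shell
#!/bin/sh
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clean_with_bfg <repo-url> [size]
# Works on a bare mirror clone so that all refs are rewritten.
clean_with_bfg() {
    url=$1
    size=${2:-1M}
    mirror=$(basename "$url")                      # e.g. my-repo.git
    case $mirror in *.git) ;; *) mirror="$mirror.git" ;; esac

    run git clone --mirror "$url" "$mirror" &&
    run bfg --strip-blobs-bigger-than "$size" "$mirror" &&
    # BFG only rewrites commits; the old blobs disappear once the
    # reflog is expired and the repository is garbage-collected.
    run git -C "$mirror" reflog expire --expire=now --all &&
    run git -C "$mirror" gc --prune=now --aggressive &&
    run git -C "$mirror" push
}

# Example (prints the commands without running them):
#   DRY_RUN=1; clean_with_bfg https://example.com/team/my-repo.git 1M
```

Because the push rewrites published history, everyone else working on the repository would need to re-clone afterwards.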
    

    By default, BFG does not touch the contents of your latest commit (HEAD), so the current versions of your files are kept.

    Don't use git filter-branch

    "git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended." (Source: the git filter-branch manual page)

    Second question: how to keep specific files from being stored in the history

    You can add files to .gitignore so that they are never added in the first place. Git cannot be configured to delete files from the history automatically, though, so you would need some kind of hook that runs bfg or git-filter-repo for you.
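
As a small illustration of the .gitignore route: the helper below appends a pattern only if it is not already listed. The name `ignore_pattern` is hypothetical. Keep in mind that .gitignore only affects untracked files, so anything already tracked also has to be removed from the index.

```shell
#!/bin/sh
# ignore_pattern <pattern> [gitignore-file]
# Appends the pattern to the ignore file unless an identical line exists.
ignore_pattern() {
    pattern=$1
    file=${2:-.gitignore}
    touch "$file"
    # -x matches whole lines, -F treats "*.psd" as a literal string, not a regex.
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Example usage inside a repository:
#   ignore_pattern '*.psd'
#   git rm --cached -- '*.psd'   # stop tracking files that were already added
#   git commit -m "Stop tracking PSD files"
```

This keeps new versions of the files out of future commits; versions that are already in the history stay there until the history is rewritten.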

    Better to prevent the problem in the first place

    Tools like bfg are meant for rare exceptions. Ideally, you should keep large binary files out of the repository in the first place. There are many other ways to preserve them, for example by attaching them to a GitHub release or by uploading them to a package repository that fits your environment, such as npm, a Maven repository or GitHub Packages.
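
One possible way to enforce this locally, which is not part of the original answer, is a pre-commit hook that rejects staged files above a size limit. The following is a sketch under these assumptions: it would be saved as `.git/hooks/pre-commit` and made executable, the limit of 1048576 bytes and the function names are illustrative, and file names do not contain newlines.

```shell
#!/bin/sh
# Size limit in bytes (assumption: 1 MB = 1024 * 1024).
LIMIT=${LIMIT:-1048576}

# Reads "<size> <path>" lines, prints those above the limit,
# and returns a non-zero status if there was at least one.
report_oversized() {
    awk -v limit="$1" '$1 > limit { print; bad = 1 } END { exit bad + 0 }'
}

# Prints "<size> <path>" for every added, copied or modified staged file.
staged_sizes() {
    git diff --cached --name-only --diff-filter=ACM |
    while IFS= read -r path; do
        # ":<path>" names the staged version of the file in the index.
        printf '%s %s\n' "$(git cat-file -s ":$path")" "$path"
    done
}

check_staged() {
    if ! offenders=$(staged_sizes | report_oversized "$LIMIT"); then
        echo "Commit rejected, files larger than $LIMIT bytes:" >&2
        echo "$offenders" >&2
        return 1
    fi
}

# Only run the check when this file is invoked as the pre-commit hook.
case ${0##*/} in
    pre-commit) check_staged || exit 1 ;;
esac
```

A local hook can be bypassed with `git commit --no-verify` and is not shared through the repository, so it is a convenience rather than a guarantee.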