
Delete list of files with BFG-repo-cleaner


We are trying to shrink our git repository to under 500MB due to deployment issues.

To achieve that, we created a new branch in which we moved all old images, videos and fonts out of the repository and into AWS S3.

I can easily get the list of files with git diff --name-only --diff-filter=D master -- public/assets/.

Now, I have tried running BFG-repo-cleaner 1.14.0 on each file. But I have 400 files and it is taking ages to delete each file separately (it is still running as I write this).

git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | xargs -i bfg --delete-files '{}'

Since the files do not share a common name pattern, I cannot really use a glob pattern, as suggested in Delete multiple files from multiple branch using bfg repo cleaner.

I tried separating the file names with commas, but BFG-repo-cleaner then told me:

BFG aborting: No refs to update - no dirty commits found??

Is there a way to provide multiple files to BFG-repo-cleaner without a glob pattern?

PS. The command I tried with multiple files is: git diff --name-only --diff-filter=D master -- public/assets/ | xargs -i basename '{}' | sed -z 's/\n/,/g;s/,$/\n/' | xargs -i bfg --delete-files '{}' && git reflog expire --expire=now --all && git gc --prune=now --aggressive

PPS. The bfg command on my PATH is a simple bash script that runs java -jar /tools/BFG-repo-cleaner/bfg-1.14.0.jar "$@"


Solution

  • But I have 400 files and it is taking ages to delete each file separately

    That is why the tool to use instead is newren/git-filter-repo (Python-based; see its installation instructions): it removes all the listed paths in a single rewrite of the history.

    You can feed it a file that lists all the paths to remove, one per line:

    git filter-repo --paths-from-file <filename> --invert-paths
    

    From the documentation:

    Similarly, you could use --paths-from-file to delete many files.

    For example, you could run git filter-repo --analyze to get reports, look in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and copy all the filenames into a file such as /tmp/files-i-dont-want-anymore.txt, and then run:

    git filter-repo --invert-paths \
                    --paths-from-file /tmp/files-i-dont-want-anymore.txt
    

    to delete them all.
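A minimal sketch of the whole workflow, under stated assumptions: bash plus git only, run in a throwaway repository whose branch name (move-to-s3), asset files and helper name (commits_touching_listed_paths) are hypothetical stand-ins for the real ones. It writes the question's own git diff output straight into the list file. The basename step is dropped on purpose: BFG's --delete-files matches file names only, whereas the lines read by --paths-from-file are paths relative to the repository root. Keeping full paths also protects a same-named file elsewhere (public/logo.png here) from being deleted. The filter-repo call itself is left as a comment because it needs git-filter-repo installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print every commit, on any ref, that still touches one of the paths listed
# (one per line) in the file given as $1. Empty output means none are left.
commits_touching_listed_paths() {
  local list=$1 p
  local paths=()
  while IFS= read -r p; do
    if [ -n "$p" ]; then paths+=("$p"); fi
  done < "$list"
  # With no paths, rev-list would list every commit, so stop early.
  if [ "${#paths[@]}" -eq 0 ]; then return 0; fi
  # --literal-pathspecs: names containing * or ? are not treated as globs.
  git --literal-pathspecs rev-list --all -- "${paths[@]}"
}

# Throwaway repository standing in for the real one (hypothetical files).
demo=$(mktemp -d)
list=$(mktemp)   # keep the list outside the repository, like /tmp in the docs
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

mkdir -p public/assets/img public/assets/fonts
echo img > public/assets/img/logo.png
echo font > public/assets/fonts/body.woff
echo kept > public/logo.png   # same basename, must survive
git add -A
git commit -q -m "add assets"

# Branch on which the assets were moved to S3 and deleted from the repo.
git checkout -q -b move-to-s3
git rm -q public/assets/img/logo.png public/assets/fonts/body.woff
git commit -q -m "move assets to S3"

# Step 1: one repo-root-relative path per line, no basename.
git diff --name-only --diff-filter=D master -- public/assets/ > "$list"
cat "$list"

# Step 2 (not run here; needs git-filter-repo, in a fresh clone):
#   git filter-repo --invert-paths --paths-from-file "$list"

# Step 3: before step 2 this prints the commits that still carry the files;
# after step 2 it should print nothing.
commits_touching_listed_paths "$list"
```

On the real repository, the same git diff line writes all 400 paths. Run the commented filter-repo step in a fresh clone (filter-repo refuses to rewrite a repository that does not look freshly cloned unless given --force); afterwards commits_touching_listed_paths should print nothing, and git count-objects -vH reports the new pack size to compare against the 500MB target.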