TLDR: I would like to migrate to git lfs (or even just remove) a specific filetype, *.fmu, from my repo, and rewrite the history. I have tried 3 methods with no success for *.fmu, but limited success for e.g. *.zip.
I'm aware this question has been asked many times and in many guises on this site and others - having followed many of those answers I'm still struggling to remove these specific files.
I have some medium-sized *.fmu files tracked in git, which are updated semi-regularly, leading to significant repo bloat. The size of a bare clone is about 4.5GB, while the size of a shallow clone with depth 1 is 1.2GB.
$ git clone --mirror https://address.to.MyRepo
$ du -hd1 MyRepo.git
53K MyRepo.git/hooks
1.0K MyRepo.git/info
4.5G MyRepo.git/objects
0 MyRepo.git/refs
4.5G MyRepo.git/
I would like to move these files to git lfs, and retrospectively write them out of the repository's history. I can see all of the largest blobs containing them with a handy one-liner from elsewhere on SO:
$ git rev-list --objects --all --missing=print | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
...
xxxxxxxxxxxx 6.9MiB example/file1.fmu
xxxxxxxxxxxx 6.9MiB example/file1.fmu
xxxxxxxxxxxx 7.0MiB example/file2.fmu
xxxxxxxxxxxx 7.1MiB example/file2.fmu
xxxxxxxxxxxx 7.1MiB example/file2.fmu
...
In short:
In long:
BFG
Running the bfg command (alias bfg="java -jar bfg.jar") below, the output clearly shows that it's detecting and editing commits from throughout the repo's history.
I then ran the suggested garbage collection (which takes ages), and checked the size of the repo... Which is exactly the same, except now there is also a massive lfs directory. The "largest blobs" one-liner described above also shows all the files still there, with unchanged hashes.
$ bfg --convert-to-git-lfs *.fmu MyRepo.git --no-blob-protection
<Lots of detail here on dirty commits and changed files>
$ cd MyRepo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ du -hd1 .
53K ./hooks
25K ./info
4.2G ./lfs
4.5G ./objects
80K ./refs
8.6G .
Running git lfs prune reduces this somewhat, but the objects directory remains full size.
git-filter-repo
I couldn't find an obvious option for converting to lfs, so I just tried deleting all the *.fmu files on the basis that I can add them back later, with lfs this time. It did complain about case sensitivity so I had to adjust the git setting, and I also got a note about some branches which weren't removed, followed by a rather long list.
I ran the garbage collector, but there was no decrease in repo size. The "largest blobs" one-liner also shows all files still present with the same hashes.
$ git config --unset core.ignoreCase
$ git-filter-repo --path-glob '*.fmu' --invert-paths
Note: Some branches outside the refs/remotes/ hierarchy were not removed;
$ git reflog expire --expire=now --all; git gc --prune=now --aggressive
$ du -hd1 .
610K ./filter-repo
53K ./hooks
25K ./info
4.5G ./objects
8.0K ./refs
4.5G .
The strange thing is, I can use this exact same method to remove a *.zip file from the repo with no issue, and watch the size tick down to 4.4G.
git lfs migrate
I ran the suggested command for git lfs migrate, followed by a force prune and the previously suggested garbage collection. Once again, the objects directory doesn't have a dent in it, and there's a hefty new lfs directory. (If I run git lfs prune --force again after the git gc, the lfs directory shrinks right back down.)
$ git lfs migrate import --include '*.fmu' --everything
$ git lfs prune --force
$ git reflog expire --expire=now --all; git gc --prune=now --aggressive
$ du -hd1 .
61K ./hooks
1.0K ./info
4.2G ./lfs
4.5G ./objects
368K ./refs
8.7G .
What am I missing? TIA
"I also got a note about some branches which weren't removed, followed by a rather long list."
This suggests that you might have some extra branches/tags that are holding references to the .fmu files.
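One way to check this is to list every ref whose history still touches a matching path. The following is only a minimal sketch: the function name refs_touching and its arguments are made up for illustration, and it assumes a POSIX shell with git on the PATH.

```shell
# Hypothetical helper (not part of git): print each ref whose history
# still contains a commit touching a path that matches the given glob.
# Usage: refs_touching <repo-dir> [glob]   (glob defaults to '*.fmu')
refs_touching() {
    repo=$1
    glob=${2:-*.fmu}
    git -C "$repo" for-each-ref --format='%(refname)' |
    while read -r ref; do
        # --full-history avoids history simplification hiding side branches;
        # --max-count=1 stops at the first matching commit.
        # stderr is silenced for refs that do not point at a commit.
        hit=$(git -C "$repo" log --full-history --max-count=1 \
                  --format=%H "$ref" -- "$glob" 2>/dev/null)
        if [ -n "$hit" ]; then
            printf '%s\n' "$ref"
        fi
    done
}
```

Running refs_touching MyRepo.git after the rewrite should print nothing if no ref reaches a *.fmu file any more. Any ref it does print is a candidate for what is keeping the blobs alive.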
Perhaps
$ git clone --single-branch --no-tags https://address.to.MyRepo
would be a better starting point. This does not resolve the problem of dealing with the non-HEAD branches, but at least we can get something that can shrink in size.
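To see whether a rewrite on such a clone actually shrinks anything, it may help to compare the packed size before and after, rather than relying on du alone. A small sketch, assuming a POSIX shell; pack_size_kib is a made-up name, and it simply reads the size-pack field that git count-objects -v reports in KiB.

```shell
# Hypothetical helper: print the packed object size of a repo in KiB,
# taken from the "size-pack" line of `git count-objects -v`.
# Usage: pack_size_kib <repo-dir>
pack_size_kib() {
    git -C "$1" count-objects -v | sed -n 's/^size-pack: //p'
}
```

For example, store before=$(pack_size_kib MyRepo) ahead of the filter and gc steps, call it again afterwards, and compare the two numbers.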