Tags: git, git-lfs, bfg-repo-cleaner, git-filter-repo, git-lfs-migrate

Convert files from git to git lfs (and rewrite the history to remove them)


TL;DR: I would like to migrate a specific file type, *.fmu, to git lfs (or even just remove it from my repo) and rewrite the history accordingly. I have tried three methods, none of which worked for *.fmu, although I had limited success with e.g. *.zip.

I'm aware this question has been asked many times and in many guises on this site and others - having followed many of those answers I'm still struggling to remove these specific files.

What I want to do

I have some medium-sized *.fmu files tracked in git, which are updated semi-regularly, leading to significant repo bloat. A bare clone is about 4.5GB, while a shallow clone with depth 1 is 1.2GB.

$ git clone --mirror https://address.to.MyRepo
$ du -hd1 MyRepo.git

53K     MyRepo.git/hooks
1.0K    MyRepo.git/info
4.5G    MyRepo.git/objects
0       MyRepo.git/refs
4.5G    MyRepo.git/
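As a cross-check on the du figures, git can report the size of its own packed objects, which leaves out directories such as hooks or lfs. A minimal sketch, assuming a hypothetical helper name repo_pack_kib and a repository path passed as its first argument:

```shell
# Hypothetical helper: print the packed object size of a repository in KiB.
# "git count-objects -v" prints a "size-pack: <KiB>" line among other fields.
repo_pack_kib() {
    git -C "${1:-.}" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Usage (path is an assumption): repo_pack_kib MyRepo.git
```

Comparing this number before and after a rewrite plus gc shows whether the packs themselves shrank, independently of anything an LFS tool added alongside them.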

I would like to move these files to git lfs, and retrospectively write them out of the repository's history. I can see all of the largest blobs containing them with a handy one-liner from elsewhere on SO:

$ git rev-list --objects --all --missing=print |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort --numeric-sort --key=2 |
    cut -c 1-12,41- |
    $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

...
xxxxxxxxxxxx  6.9MiB example/file1.fmu
xxxxxxxxxxxx  6.9MiB example/file1.fmu
xxxxxxxxxxxx  7.0MiB example/file2.fmu
xxxxxxxxxxxx  7.1MiB example/file2.fmu
xxxxxxxxxxxx  7.1MiB example/file2.fmu
...
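To put a single number on how much of the history a given file type accounts for, the same pipeline can be reduced to a count and a byte total. This is only a sketch: the function name blob_bytes_by_suffix is hypothetical, and it reports uncompressed blob sizes, not the space the blobs occupy inside packs.

```shell
# Hypothetical helper: count and sum all blobs reachable from any ref
# whose path ends with the given suffix (e.g. ".fmu").
blob_bytes_by_suffix() {
    suffix=$1
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v s="$suffix" '
            $1 == "blob" {
                # Strip "blob <size> " so that paths containing spaces survive.
                path = $0
                sub(/^[^ ]+ [^ ]+ /, "", path)
                if (length(path) >= length(s) &&
                    substr(path, length(path) - length(s) + 1) == s) {
                    total += $2
                    n++
                }
            }
            END { printf "%d blobs, %.0f bytes\n", n, total }'
}

# Usage, run inside the repository: blob_bytes_by_suffix .fmu
```

Running it before and after a history rewrite gives a quick indication of whether any *.fmu blobs are still reachable from some ref.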

What I've tried

In short: none of the three methods below reduced the size of the objects directory for *.fmu.

In long:

BFG

Running the bfg command (alias bfg="java -jar bfg.jar") below, the output clearly shows that it's detecting and editing commits from throughout the repo's history.

I then ran the suggested garbage collection (which takes ages) and checked the size of the repo, which is exactly the same, except that there is now also a massive lfs directory. The "largest blobs" one-liner described above also shows all the files still there, with unchanged hashes.

$ bfg --convert-to-git-lfs *.fmu MyRepo.git --no-blob-protection

<Lots of detail here on dirty commits and changed files>

$ cd MyRepo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ du -hd1 .

53K     ./hooks
25K     ./info
4.2G    ./lfs
4.5G    ./objects
80K     ./refs
8.6G    .

Running git lfs prune reduces this somewhat, but the objects directory remains full size.

git-filter-repo

I couldn't find an obvious option for converting to lfs, so I just tried deleting all the *.fmu files on the basis I can add them back later, with lfs this time. It did complain about case sensitivity so I had to adjust the git setting, and I also got a note about some branches which weren't removed, followed by a rather long list.

I ran the garbage collector, but there was no decrease in repo size. The "largest blobs" one-liner also shows all files still present with the same hashes.

$ git config --unset core.ignoreCase
$ git-filter-repo --path-glob '*.fmu' --invert-paths

Note: Some branches outside the refs/remotes/ hierarchy were not removed;

$ git reflog expire --expire=now --all; git gc --prune=now --aggressive
$ du -hd1 .

610K    ./filter-repo
53K     ./hooks
25K     ./info
4.5G    ./objects
8.0K    ./refs
4.5G    .

The strange thing is, I can use this exact same method to remove a *.zip file from the repo with no issue, and watch the size tick down to 4.4G.

git lfs migrate

I ran the suggested command for git lfs migrate, followed by a force prune and the previously suggested garbage collection. Once again, the objects directory doesn't have a dent in it, and there's a hefty new lfs directory. (If I git lfs prune --force again after the git gc the lfs directory shrinks right back down).

$ git lfs migrate import --include '*.fmu' --everything
$ git lfs prune --force
$ git reflog expire --expire=now --all; git gc --prune=now --aggressive
$ du -hd1 .

61K     ./hooks
1.0K    ./info
4.2G    ./lfs
4.5G    ./objects
368K    ./refs
8.7G    .
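One way to tell whether a rewrite actually touched a given commit is to check whether the file stored there is now a Git LFS pointer. A pointer is a small text file whose first line is "version https://git-lfs.github.com/spec/v1". A rough sketch, where the function name is_lfs_pointer and the example revision and path are assumptions:

```shell
# Hypothetical helper: succeed if <rev>:<path> is stored as a Git LFS pointer,
# judged only by the first line of the blob's content.
is_lfs_pointer() {
    rev=$1
    path=$2
    git cat-file -p "$rev:$path" 2>/dev/null |
        head -n 1 |
        LC_ALL=C grep -q '^version https://git-lfs.github.com/spec/v1$'
}

# Usage, run inside the repository (revision and path are placeholders):
#   is_lfs_pointer HEAD example/file1.fmu && echo pointer || echo "raw blob"
```

If older commits on some ref still report a raw blob, that ref was not rewritten and keeps the original objects alive through gc.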

What am I missing? TIA


Solution

  • I also got a note about some branches which weren't removed, followed by a rather long list.

    This suggests that extra branches or tags may still be holding references to the .fmu files.

    Perhaps

    $ git clone --single-branch --no-tags https://address.to.MyRepo
    

    would be a better starting point. This does not solve the problem of the non-HEAD branches, but it should at least give you a repository that can shrink in size.
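To test the hypothesis that some refs still hold the files, each ref can be walked in turn and checked for matching paths. This is a sketch only: refs_reaching_pattern is a hypothetical name, the argument is a grep regular expression, and walking every ref of a large repository this way can be slow.

```shell
# Hypothetical helper: print every ref from which at least one object
# with a path matching the given grep pattern is still reachable.
refs_reaching_pattern() {
    pattern=$1
    git for-each-ref --format='%(refname)' |
        while read -r ref; do
            # rev-list --objects prints "<hash> <path>" for trees and blobs.
            if git rev-list --objects "$ref" | grep -q -e "$pattern"; then
                printf '%s\n' "$ref"
            fi
        done
}

# Usage, run inside the repository: refs_reaching_pattern '\.fmu$'
```

Any ref printed after a rewrite still reaches the original blobs, which would explain why git gc cannot remove them.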