Tags: git, blob, bfg-repo-cleaner, git-filter-repo

Should I use git-filter-repo on the repo clone or on the self-hosted bare repo?


There's a self-hosted Git repository on a Windows Server (Bonobo-based, if anyone is interested). The repository became bloated because of binary blobs, and I'd like to strip out these large blobs along with their whole history.

I looked at BFG / git filter-branch, bfg-ish, and git filter-repo. I think my question is independent of the tool choice; however, git filter-repo seems to be the most recommended.

The big question: should I execute --strip-blobs-bigger-than 4M on the repository clone (working copy), or should I go straight ahead and manipulate the hosted bare repo that Bonobo manages? If I execute it on the client clone, then how will the changes propagate to Bonobo? These changes will be pretty fundamental; will they even be committable?

I have already backed up everything and done some filter-repo analysis. I added the blobs to .gitignore (although their modifications still show up as changes).
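
A side note on the .gitignore point: .gitignore only keeps untracked files out of the index, so files that are already committed keep showing up as modified. A minimal sketch of untracking such a blob (the path is just a placeholder), in case that is the goal:

    git rm --cached path/to/large.bin    # stop tracking the file but keep it on disk
    git commit -m "Stop tracking large binary blob"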


Solution

  • I ended up operating on the hosted bare repository. It looks like filter-repo is intended to be used on a clean clone of a repository:

    git filter-repo --strip-blobs-bigger-than 4M
    Aborting: Refusing to destructively overwrite repo history since
    this does not look like a fresh clone.
      (expected freshly packed repo)
    Please operate on a fresh clone instead.  If you want to proceed
    anyway, use --force.
    

    So I retried on a clean clone and the command ran, but then I was unsure what to do next. There were no file changes per se to commit or push; only the repository "metadata" (the object database and refs) was modified. Interestingly, the operation also stripped [remote "origin"] and [branch "master"] from .git/config, so I needed to re-establish the remote and the branch tracking.
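
    If I had stayed on the clone route, my understanding is that propagating the rewrite would mean re-adding the remote and force-pushing the rewritten branch. A rough sketch, assuming the server accepts non-fast-forward pushes (the URL is a placeholder):

    git remote add origin https://yourserver/Bonobo.Git.Server/YourRepo.git
    git push --force -u origin master    # -u re-creates the [branch "master"] tracking entry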

    Instead, I decided to just go ahead and modify the hosted bare repo. The tool recognizes that it is not a fresh clone there either:

    warning: no corresponding .pack: ./objects/pack/pack-f8fc2556f0b95c1a66219fe3ad3fe41d6319a985.idx
    Aborting: Refusing to destructively overwrite repo history since
    this does not look like a fresh clone.
      (expected freshly packed repo)
    Please operate on a fresh clone instead.  If you want to proceed
    anyway, use --force.
    

    With --force, the repository metadata size decreased from 1.3 GB to 150 MB, similar to what I got when running it on the clean clone:

    > git filter-repo --force --strip-blobs-bigger-than 4M
    Processed 19965 blob sizes
    Parsed 3536 commits
    New history written in 1.44 seconds; now repacking/cleaning...
    Repacking your repo and cleaning out old unneeded objects
    Enumerating objects: 42458, done.
    Counting objects: 100% (42458/42458), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (12993/12993), done.
    Writing objects: 100% (42458/42458), done.
    Selecting bitmap commits: 3257, done.
    Building bitmaps: 100% (137/137), done.
    Total 42458 (delta 33284), reused 37896 (delta 29067), pack-reused 0
    Removing duplicate objects: 100% (256/256), done.
    Completely finished after 10.20 seconds.
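
    To check the size reduction, one quick way is to look at the packed object size directly in the repository (this also works in a bare repo); this is plain Git, nothing filter-repo specific:

    git count-objects -vH    # size-pack is the total size of the pack files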
    

    Since this is a Windows environment, I started from a clean clone after that and had to re-trust the repository in Visual Studio and all that. So far I have been able to push some changes, and I'll report back if anything turns out not to work.

    It's another story if you are dealing with a repository hosted on GitHub or another Git service; in that case you won't have direct access to the bare repository they manage. I'm not sure exactly what happens then. I guess you can push the rewritten history somehow? Someone should comment.
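
    From what I've read (not tested here), the route for hosted services is the clone-based one: run filter-repo on a fresh clone, re-add the remote, and force-push every rewritten ref. Collaborators then have to re-clone so the old history doesn't get pushed back, and the old objects may linger on the service's side for a while. Roughly:

    git push origin --force --all     # overwrite all branches with the rewritten history
    git push origin --force --tags    # tags point at rewritten commits as well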