There's a self-hosted git repository on a Windows Server (Bonobo-based, if anyone is interested). The repository got bloated because of binary blobs, and I'd like to strip out these large blobs along with their whole history.
I looked at bfg / git filter-branch, bfg-ish, and git filter-repo. I think my question is independent of which tool is used, but it sounds like git filter-repo is the most recommended.
The big question: should I execute --strip-blobs-bigger-than 4M on the repository clone (working copy), or should I go straight ahead and manipulate the hosted bare repo that Bonobo manages? If I execute it on the client clone, then how will the changes propagate into Bonobo? These changes will be pretty fundamental; will they even be committable?
I have already backed up everything and done some filter-repo analysis. I added the blobs to .gitignore (although modifications to them still show up as changes).
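For reference, filter-repo's built-in analysis is run like this; it writes its report files under .git/filter-repo/analysis:

git filter-repo --analyze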
I ended up operating on the hosted bare repository. It looks like filter-repo is intended to be used on a clean clone of a repository:
git filter-repo --strip-blobs-bigger-than 4M
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
(expected freshly packed repo)
Please operate on a fresh clone instead. If you want to proceed
anyway, use --force.
So I retried on a clean clone and the command ran, but then I wasn't sure what to do next. There were no file changes per se to commit or push; only the "metadata" was modified. Interestingly, the operation also stripped [remote "origin"] and [branch "master"] from .git/config, so I needed to re-establish the remote and branch.
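Re-establishing the remote and pushing the rewritten history from the clone would look roughly like this (the URL is just a placeholder for your Bonobo repository, and the server has to accept non-fast-forward pushes, i.e. receive.denyNonFastForwards must not block them):

# re-add the remote that filter-repo removed (placeholder URL)
git remote add origin https://yourserver/Bonobo.Git.Server/MyRepo.git
# force-push the rewritten branches and tags
git push origin --force --all
git push origin --force --tags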
So I decided to just go ahead and modify the hosted bare repo. The tool recognizes that it is not a clean clone:
warning: no corresponding .pack: ./objects/pack/pack-f8fc2556f0b95c1a66219fe3ad3fe41d6319a985.idx
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
(expected freshly packed repo)
Please operate on a fresh clone instead. If you want to proceed
anyway, use --force.
With --force, the repository size decreased from 1.3 GB to 150 MB, similar to the result when it was executed on the clean clone.
> git filter-repo --force --strip-blobs-bigger-than 4M
Processed 19965 blob sizes
Parsed 3536 commits
New history written in 1.44 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 42458, done.
Counting objects: 100% (42458/42458), done.
Delta compression using up to 8 threads
Compressing objects: 100% (12993/12993), done.
Writing objects: 100% (42458/42458), done.
Selecting bitmap commits: 3257, done.
Building bitmaps: 100% (137/137), done.
Total 42458 (delta 33284), reused 37896 (delta 29067), pack-reused 0
Removing duplicate objects: 100% (256/256), done.
Completely finished after 10.20 seconds.
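For a quick sanity check of the size reduction, git's object counter can be run in the repository; the size-pack figure should reflect the new, smaller pack:

git count-objects -vH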
Since this is a Windows environment, I started from a clean clone after that and had to re-trust the repository in Visual Studio and so on. So far I have been able to push some changes, and I'll report back if anything seems to not work.
It's another story if you are dealing with a repository hosted on GitHub or another git service; in that case you won't have direct access to the bare repository they manage. I'm not sure what happens then. I guess you can force-push the rewritten history somehow? Someone should comment.