So, someone pushed a large file to a repo in our Bitbucket (we use Bitbucket Server, so it's hosted by us). We have deleted the file but want to remove it from the history too, as the repo is now quite large to clone.
We can see how to get rid of the large file in a clone of the repo. We have done that using git-filter-repo.
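Before rewriting history in a clone, it can help to confirm which blobs actually account for the size. The following is a minimal sketch using only stock git plumbing; the function name `largest_blobs` is made up for illustration and is not part of git or git-filter-repo.

```shell
#!/bin/sh
# List the N largest blobs reachable from any ref, biggest first.
# Output format: "<size in bytes> <path>".
# Usage: largest_blobs [N]   (run inside the clone; N defaults to 10)
largest_blobs() {
    # rev-list prints "<sha> <path>" for every object reachable from any ref;
    # cat-file looks up type and size, and echoes the path back via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "${1:-10}"
}
```

Running `largest_blobs 5` in the clone prints the five biggest blobs, including ones that were deleted in a later commit but are still present in history.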
However, this repo is central to our CI system and we can't move or rename it easily. So, I want to perform the same operation directly on the repo used by the Bitbucket server. That is proving tricky. I found where the repo is (thanks to this answer). I logged in to the server, went to $BITBUCKET_HOME/shared/data/repositories/<id> and tried running the git-filter-repo command there, but it failed with:
Parsed 2203 commits
Required environment variable STASH_HOOK_ADDRESS is missing
Required environment variable STASH_HOOK_ADDRESS is missing
fatal: ref updates aborted by hook
fast-import: dumping crash report to fast_import_crash_22581
Error: fast-import failed; see above.
I can't find anything on this error at all. Can anyone help?
I stopped the Bitbucket service and tried again. Same response. I then started Bitbucket again and it wouldn't start up. However, since everything is virtual and we had taken a snapshot first, we could roll back without any harm. But it still leaves the original question of how to run git-filter-repo (or in some other way clean up the history) on the server.
There is an alternative. I can, I think, use git-filter-repo to remove the file.
This cleans up the repo size (which is my main concern: given that our CI process clones this repo so often, having it very bloated will be an issue), and I keep the history, branches and tags as far as I can see. What I lose is the settings and the history of pull requests, etc. I'd like to keep those if I can, since it's really useful to be able to go to an issue in Jira, click the link to even a closed PR, and see from the diff exactly what was done. But if I have to choose between fixing the repo size and keeping old PRs, then I'll fix the repo size.
We fixed this issue by running git-filter-repo, using the following steps in order:
1. Make a fresh clone of the repository:
git clone <url_to_the_repo>
2. Check out the branch that contains the bad file:
git checkout feature/bad_file
3. Run git-filter-repo on that branch. We had to run it with the --force flag. The git-filter-repo script should be outside your repository folder (here it sits one level up):
python3 ../git-filter-repo --invert-paths --path-match files/bad_file.zip --force
4. Add the origin remote back to the git config, because git-filter-repo removes it:
git remote add origin <url_to_the_repo>
5. Force push to your branch:
git push --set-upstream origin feature/bad_file --force
After that, the bad file was removed from the history and the repository size decreased. The commit hashes on that branch also changed.
It is better to take a snapshot of the server before doing this operation.
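To check that the rewrite did what was intended, one option is to ask git whether any commit on any ref still touches the path. This is only a sketch with stock git commands; the helper name `path_gone_from_history` is hypothetical, and the path in the example is the one used in the steps above.

```shell
#!/bin/sh
# Succeeds (exit status 0) only if no commit reachable from any ref
# adds, changes or deletes the given path.
# Usage: path_gone_from_history <path>   (run inside the repository)
path_gone_from_history() {
    [ -z "$(git log --all --format=%H -- "$1")" ]
}

# Example, run in the rewritten clone:
#   path_gone_from_history files/bad_file.zip && git count-objects -vH
# count-objects -vH reports the size of the local object store.
```

Note that this inspects the local clone only. The copy on the server may keep the old objects until it runs its own garbage collection, so the size seen there may not drop immediately.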