Tags: git, bitbucket, bitbucket-server, git-filter-repo

How to Remove a Large File from the History of a Bitbucket Server Repo In-Place


Someone pushed a large file to a repo in our Bitbucket (we use Bitbucket Server, so we host it ourselves). We have deleted the file, but we also want to remove it from the history, because the repo is now quite large to clone.

We can see how to get rid of the large file in a clone of the repo. We have done that using git-filter-repo.
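For reference, here is a minimal sketch of how the offending file can be located in a clone before filtering. It is not part of what we originally ran; the function name `largest_blobs` is made up for illustration, and it uses only `git rev-list` and `git cat-file` plumbing.

```shell
#!/bin/sh
# List the N largest blobs reachable from any ref, largest first.
# Output format: "<size in bytes> <path>".
# largest_blobs is a hypothetical helper name, not a git command.
largest_blobs() {
    count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -k1,1nr |
        head -n "$count"
}
```

Running `largest_blobs 5` inside the clone should show the five biggest files ever committed, which makes it easier to pick the path to pass to git-filter-repo.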

However, this repo is central to our CI system and we can't move or rename it easily. So I want to perform the same operation directly on the repo used by the Bitbucket server. That is proving tricky. I found where the repo is (thanks to this answer). I logged in to the server, went to $BITBUCKET_HOME/shared/data/repositories/<id> and tried running the git-filter-repo command there, but it failed with:

Parsed 2203 commits
Required environment variable STASH_HOOK_ADDRESS is missing
Required environment variable STASH_HOOK_ADDRESS is missing
fatal: ref updates aborted by hook
fast-import: dumping crash report to fast_import_crash_22581
Error: fast-import failed; see above.

I can't find anything on this error at all. Can anyone help?

I stopped the Bitbucket service and tried again, with the same result. When I then started Bitbucket again, it would not start up. Since everything is virtual and we had taken a snapshot first, we could roll back without any harm. But that still leaves the original question of how to run git-filter-repo (or clean up the history in some other way) on the server.

There is an alternative. I can, I think:

This cleans up the repo size (which is my main concern - given our CI process clones this repo so much, having it very bloated will be an issue) and I have the history and branches and tags as far as I can see. What I lose is the settings and the history of pull requests, etc. I'd like to keep those if I can - it's really useful to be able to go to an issue in Jira and click the link to even a closed PR and see from the diff exactly what was done. But if I have to choose between fixing the repo size and keeping old PRs then I'll fix the repo size.


Solution

  • We fixed this issue by running git-filter-repo with the following steps:

    1. Make a fresh clone of the repository

      git clone <url_to_the_repo>

    2. Check out the branch that contains the bad file

      git checkout feature/bad_file

    3. Run git-filter-repo on that branch. We had to run it with the --force flag. The git-filter-repo script should be located outside your repository folder

      python3 ../git-filter-repo --invert-paths --path-match files/bad_file.zip --force

    4. Re-add the origin remote to the git config, because git-filter-repo removed it

      git remote add origin <url_to_the_repo>

    5. Force push to your branch

      git push --set-upstream origin feature/bad_file --force
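The five steps above can be sketched as one shell function. This is only an illustration under stated assumptions: `remove_file_from_branch`, `run`, `DRY_RUN`, `FILTER_REPO` and the default work directory `repo-clean` are hypothetical names introduced here, and the git-filter-repo script is assumed to sit next to the clone directory, as in step 3.

```shell
#!/bin/sh
# Print a command, then execute it unless DRY_RUN=1 (hypothetical switch).
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Usage: remove_file_from_branch <repo_url> <branch> <path_to_remove> [workdir]
# The body runs in a subshell so the caller's working directory is untouched.
remove_file_from_branch() (
    repo_url="$1"
    branch="$2"
    bad_path="$3"
    workdir="${4:-repo-clean}"                        # assumed clone directory
    filter_repo="${FILTER_REPO:-../git-filter-repo}"  # assumed script location

    run git clone "$repo_url" "$workdir" &&
    run cd "$workdir" &&
    run git checkout "$branch" &&
    run python3 "$filter_repo" --invert-paths --path-match "$bad_path" --force &&
    run git remote add origin "$repo_url" &&
    run git push --set-upstream origin "$branch" --force
)
```

Setting `DRY_RUN=1` first and calling `remove_file_from_branch <url_to_the_repo> feature/bad_file files/bad_file.zip` only prints the commands, which is a cheap way to review them before force pushing.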

    After that, the bad file was removed from the history and the repository size decreased. The commit hashes on that branch also changed.
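A quick way to check the result in the local clone might look like the sketch below. The helper names `path_absent_from_history` and `repo_size` are invented for illustration; they only wrap `git log` and `git count-objects`.

```shell
#!/bin/sh
# Succeeds only if no commit reachable from any ref touches the given path.
# path_absent_from_history is a hypothetical helper name.
path_absent_from_history() {
    [ -z "$(git log --all --oneline -- "$1")" ]
}

# Print the size of the object store in human-readable units.
repo_size() {
    git count-objects -vH
}
```

For example, `path_absent_from_history files/bad_file.zip && repo_size` run inside the filtered clone should succeed and print the new size figures.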

    It is better to take a snapshot of the server before doing this operation.