How can I trigger garbage collection on a Git remote repository?


As we know, we can periodically run git gc to pack objects under .git/objects.

In the case of a remote central Git repository (bare or not), though, after many pushes there are many files under myproj.git/objects; each commit seems to create a new file there.

How can I pack that many files? (I mean the ones on the remote central bare repository, not on local clone repository.)


Solution

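  • The remote repo should be configured to run gc as needed after commits are pushed to it. By default, git receive-pack runs git gc --auto after receiving a push (see receive.autogc), and gc.auto sets the loose-object threshold at which that actually repacks. See the documentation of gc.auto in the git-gc and git-config man pages.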

    However, a remote repo shouldn't need all that much garbage collection, since it will rarely have dangling (unreachable) commits. These usually result from things like branch deletion and rebasing, which typically happen only in local repos.

    So gc is needed more for repacking, which is for saving storage space rather than removing actual garbage. The gc.auto variable is sufficient for taking care of this.
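    As a rough illustration of the settings mentioned above, here is a minimal sketch to run on the machine that hosts the repo. The throwaway bare repo created with mktemp is only a stand-in so the commands can be tried safely; on a real server, point repo at the actual path (for example, wherever myproj.git lives). The value 6700 is Git's documented default for gc.auto, shown here only to make the setting explicit.

```shell
# Sketch only: a throwaway bare repo stands in for the real server-side
# repository (e.g. myproj.git); substitute its actual path.
repo="$(mktemp -d)/myproj.git"
git init --bare --quiet "$repo"

# Run auto-gc once the loose-object count passes this threshold.
# 6700 is Git's documented default; a lower value repacks sooner.
git -C "$repo" config gc.auto 6700

# Let receive-pack run "git gc --auto" after each push (default: true).
git -C "$repo" config receive.autogc true

# Inspect loose ("count") versus packed ("in-pack") object counts.
git -C "$repo" count-objects -v

# Force a repack right away instead of waiting for the threshold.
git -C "$repo" gc --quiet
```

    Comparing the output of count-objects -v before and after a gc on a busy repo should show loose objects moving into a pack.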


    Update in 2024:

    Garbage in GitHub repos is commonplace nowadays, because the typical PR workflow involves a lot of force-pushing and rebasing.

    However, you can't trigger GC in a remote repo through Git itself, GitHub or not. You can do it only if you have shell access on the remote system that's hosting the repo. In other words, garbage collection has to be a local operation on the machine where the repo is located.
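    If you do have shell access (a self-hosted server, not GitHub), running gc there can be wrapped in a small helper. This is a sketch under assumptions: remote_gc is a made-up function name, and the host and path in the usage comment are placeholders, not anything from the original question.

```shell
# remote_gc HOST REPO_PATH
# Hypothetical helper: run "git gc" inside REPO_PATH on HOST over ssh.
# Assumes key-based ssh access and that REPO_PATH contains no single quotes.
remote_gc() {
    host=$1
    repo_path=$2
    ssh "$host" "git -C '$repo_path' gc"
}

# Example call (placeholder host and path):
#   remote_gc git@example.com /srv/git/myproj.git
```

    The gc still runs locally on the host; ssh is only the way to get a shell there.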

    GitHub does do GC, but it happens invisibly. Sometimes this has unfortunate and surprising consequences when best practices aren't being followed. For example, it's possible to refer to the commit hash of a PR in a source-based dependency in another project, while waiting for a PR to be accepted upstream. If that PR branch is subsequently rebased, the source dependency will continue working for a while, because the dangling commit can still be fetched if its hash is known.

    However, when GitHub does get around to doing GC on the repo, the other project's build will suddenly break with a "missing reference" error because the commit no longer exists in the dependency's repo. This can be very mystifying, especially if the person who set up the source dependency is no longer around. What's more, it can be extremely hard to figure out what the missing hash was originally referring to, because it's no longer part of any branch.

    If you're very lucky, the commit will still exist in someone's clone of the repo that hasn't been garbage-collected, and the reflogs can be used to find out what branch the commit was originally in. The source dependency can then be updated to use the rebased version of the PR, or can be dropped altogether if the PR has now been merged.
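    One way to do that search in a surviving clone might look like the sketch below. find_in_reflogs is a hypothetical helper name; it assumes the clone still has its reflogs and that you know the hash (full or abbreviated) that the build complained about.

```shell
# find_in_reflogs REPO HASH
# Hypothetical helper: print the reflog entries in REPO that point at HASH.
# Returns non-zero if the commit is absent or appears in no reflog.
find_in_reflogs() {
    repo=$1
    hash=$2
    # Stop early if this clone no longer has the commit object.
    git -C "$repo" cat-file -e "${hash}^{commit}" 2>/dev/null || {
        echo "commit $hash not found in $repo" >&2
        return 1
    }
    # Walk the reflogs of all refs, including HEAD.
    # %H = full hash, %gD = reflog selector, %gs = reflog message.
    git -C "$repo" log -g --all --format='%H %gD %gs' | grep "^$hash"
}

# Example call (placeholder path and hash):
#   find_in_reflogs ~/src/some-clone 1a2b3c4
```

    Each matching line names the ref (for example a branch or HEAD) whose reflog recorded the commit, which is usually enough to tell which branch it came from.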

    The moral of the story: be very careful when referring to commits in PRs. They aren't stable and can disappear without warning.