
Understanding git gc --auto


I'm experimenting with fairly aggressive auto gc in Git, mainly for packing purposes. In my repos, git config --list shows this setup:

...
gc.auto=250
gc.autopacklimit=30
...

Running git count-objects -v gives:

count: 376
size: 1251
in-pack: 2776
packs: 1
size-pack: 2697
prune-packable: 0
garbage: 0

But git gc --auto doesn't change these figures; nothing is being packed! Shouldn't the loose objects get packed, since I'm 126 objects over the gc.auto limit?


Solution

  • One of the main points of gc --auto is that it should be very quick, so that other commands can call it frequently “just in case”. To achieve that, the object count is only estimated. As git help config says under gc.auto:

    When there are approximately more than this many loose objects in the repository […]

    Looking at the code (too_many_loose_objects() in builtin/gc.c), here’s what happens:

    1. gc.auto is divided by 256 and rounded up.
    2. The directory .git/objects/17, which holds all loose objects whose SHA-1 starts with 17, is opened.
    3. Git checks whether that directory contains more objects than the result of step 1; if it does, gc runs.
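The three steps above can be sketched in Python roughly as follows. This is a minimal re-implementation for illustration, not Git's C code: the function name mirrors the C function, but the `objects_dir` parameter and the hard-coded 38-character name check (a 40-hex-digit SHA-1 minus the 2-character directory name) are assumptions of this sketch.

```python
import os

HEX_DIGITS = frozenset("0123456789abcdef")
NAME_LEN = 38  # a 40-char SHA-1 minus the 2-char directory prefix


def too_many_loose_objects(objects_dir: str, gc_auto: int) -> bool:
    """Guess whether `git gc --auto` would pack loose objects by sampling objects/17."""
    if gc_auto <= 0:
        return False  # gc.auto=0 disables automatic gc
    sample_dir = os.path.join(objects_dir, "17")
    if not os.path.isdir(sample_dir):
        return False
    threshold = -(-gc_auto // 256)  # step 1: gc.auto / 256, rounded up
    count = 0
    for name in os.listdir(sample_dir):  # step 2: look only at objects/17
        # Skip anything that is not a loose object file name (38 lowercase hex digits).
        if len(name) != NAME_LEN or not set(name) <= HEX_DIGITS:
            continue
        count += 1
        if count > threshold:  # step 3: more than the threshold means gc runs
            return True
    return False
```

For the repo in the question, `too_many_loose_objects(".git/objects", 250)` would return True only if objects/17 holds at least 2 loose objects, regardless of the total loose count.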

    This works fine because SHA-1 values are evenly distributed, so “all the objects that start with X” is representative of the whole set. But of course this only works for a large number of objects. Too lazy to do the math, I would guess at least 3000. With 6700 (the default value of gc.auto), it should already work quite reliably.

    The core question for me is why you need such a low setting, and whether it really matters that gc runs at exactly 250 objects. With a setting of 250, the threshold from step 1 is 1, so gc will run as soon as you have 2 loose objects that start with 17. With the 376 loose objects in your repo, the chance of that is only about 43%; it rises to about 68% at 600 objects and about 82% at 800 objects, and passes 90% only at around 1000 objects.
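Under the assumption that object names are uniformly distributed, the number of loose objects in objects/17 follows a binomial distribution with n objects and p = 1/256, so the chance of the gc.auto=250 trigger (at least 2 objects in that directory) has a closed form. A minimal sketch; the helper name is hypothetical:

```python
def prob_at_least_two(n: int, p: float = 1 / 256) -> float:
    """P(at least 2 of n uniformly named loose objects land in objects/17)."""
    q = 1.0 - p
    # 1 - P(no object there) - P(exactly one object there)
    return 1.0 - q**n - n * p * q ** (n - 1)


for n in (376, 600, 800, 1000):
    print(f"{n:5d} loose objects: {prob_at_least_two(n):.0%}")
```

This gives roughly 43% for the 376 loose objects in the question, about 68% at 600, about 82% at 800, and only just over 90% at 1000.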

    Update: I couldn’t help it and had to do the math :). I wanted to know how well this estimation works, so here’s a plot of the results. For any given gc.auto, it shows the probability that gc will start when the repository contains gc.auto (red), gc.auto × 1.1 (green), gc.auto × 1.2 (orange), gc.auto × 1.5 (blue) or gc.auto × 2 (purple) loose objects.
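The curves described above could be computed along these lines, under the same uniform-distribution assumption: the threshold is gc.auto / 256 rounded up, gc starts when the count in objects/17 exceeds it, and that count is binomial with p = 1/256. This is not the script behind the plot; the function name and the sample gc.auto values are illustrative.

```python
from math import comb


def prob_gc_triggers(gc_auto: int, loose_objects: int, buckets: int = 256) -> float:
    """P(objects/17 holds more than ceil(gc_auto / 256) of `loose_objects`)."""
    threshold = -(-gc_auto // buckets)  # integer ceiling division
    p = 1.0 / buckets
    # P(X <= threshold) for X ~ Binomial(loose_objects, p); gc runs when X > threshold.
    at_most = sum(
        comb(loose_objects, k) * p**k * (1 - p) ** (loose_objects - k)
        for k in range(threshold + 1)
    )
    return 1.0 - at_most


MULTIPLIERS = (1.0, 1.1, 1.2, 1.5, 2.0)  # red, green, orange, blue, purple

for gc_auto in (250, 1000, 3000, 6700):
    row = [prob_gc_triggers(gc_auto, round(gc_auto * m)) for m in MULTIPLIERS]
    print(gc_auto, " ".join(f"{x:.2f}" for x in row))
```

Summing the binomial terms up to the threshold and taking the complement keeps the calculation exact for any gc.auto, including the default of 6700, where the threshold is 27.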

    Plot of the results