I'm experimenting with fairly aggressive auto gc in Git, mainly for packing purposes. In my repos, if I do git config --list, I have set
...
gc.auto=250
gc.autopacklimit=30
...
If I do git count-objects -v, I get
count: 376
size: 1251
in-pack: 2776
packs: 1
size-pack: 2697
prune-packable: 0
garbage: 0
But git gc --auto doesn't change these figures; nothing is being packed! Shouldn't the loose objects get packed, since I'm 126 objects over the gc.auto limit?
One of the main points of gc --auto is that it should be very quick, so other commands can frequently call it “just in case”. To achieve that, the object count is only guessed. As git help config says under gc.auto:
When there are approximately more than this many loose objects in the repository […]
Looking at the code (too_many_loose_objects() in builtin/gc.c), here’s what happens: the directory .git/objects/17 is opened, the loose objects in it are counted, and if that count exceeds gc.auto/256 (rounded up), gc assumes the whole repository holds more than gc.auto loose objects. This works fine, since SHA-1 is evenly distributed, so “all the objects that start with 17” is representative of the whole set. But of course this only works for a fairly large number of objects. Too lazy to do the math, I would guess you need at least >3000. With 6700 (the default value of gc.auto), this should already work quite reliably.
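To make that concrete, here is a minimal C sketch of the check, modelled on too_many_loose_objects() in builtin/gc.c (paraphrased from the Git source, not verbatim; details vary between Git versions):

    #include <dirent.h>
    #include <string.h>

    /* Sketch of Git's loose-object estimation, modelled on
     * too_many_loose_objects() in builtin/gc.c (paraphrased, not verbatim).
     * Returns 1 if gc should run, 0 otherwise. */
    static int too_many_loose_objects(int gc_auto_threshold)
    {
        DIR *dir = opendir(".git/objects/17"); /* sample one of 256 fan-out dirs */
        struct dirent *ent;
        int auto_threshold, num_loose = 0, needed = 0;

        if (!dir)
            return 0;

        /* this directory's share of gc.auto, rounded up */
        auto_threshold = (gc_auto_threshold + 255) / 256;

        while ((ent = readdir(dir)) != NULL) {
            /* count only entries that look like the remaining 38 hex digits of a SHA-1 */
            if (strspn(ent->d_name, "0123456789abcdef") != 38 ||
                ent->d_name[38] != '\0')
                continue;
            if (++num_loose > auto_threshold) {
                needed = 1;
                break;
            }
        }
        closedir(dir);
        return needed;
    }

Note the integer division: with gc.auto=250, auto_threshold is (250 + 255) / 256 = 1, which is why the check already fires at the second object in objects/17.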
The core question for me is why you need such a low setting, and whether it is important that gc really runs at 250 objects. With a setting of 250, gc will run as soon as you have 2 loose objects that start with 17. The chance that this happens is > 80% for 600 objects and > 90% for 800 objects.
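(For the curious: under the assumption that SHA-1 prefixes are uniform over the 256 fan-out directories, the count in objects/17 is Binomial(n, 1/256) for n loose objects, so the trigger probability is P(n) = 1 - (255/256)^n - n * (1/256) * (255/256)^(n-1). This is my own back-of-the-envelope model rather than anything in Git, but it is an easy way to sanity-check such figures.)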
Update: Couldn’t help it – had to do the math :). I was wondering how well that estimation system would work. Here’s a plot of the results. For any given gc.auto, how high is the probability that gc will start when there are gc.auto (red) / gc.auto * 1.1 (green) / gc.auto * 1.2 (orange) / gc.auto * 1.5 (blue) / gc.auto * 2 (purple) loose objects in the repo?
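In case anyone wants to reproduce the numbers behind such a plot, here is a self-contained C program following the same binomial model (my own reconstruction, assuming the trigger condition described above; the plotting itself is left out):

    #include <stdio.h>
    #include <math.h>

    /* P(X >= k) for X ~ Binomial(n, p): sum the tail of the pmf.
     * Uses lgamma() for the binomial coefficients to avoid overflow. */
    static double binom_tail(int n, double p, int k)
    {
        double total = 0.0;
        for (int i = k; i <= n; i++) {
            double log_pmf = lgamma(n + 1.0) - lgamma(i + 1.0) - lgamma(n - i + 1.0)
                           + i * log(p) + (n - i) * log(1.0 - p);
            total += exp(log_pmf);
        }
        return total;
    }

    int main(void)
    {
        const double p = 1.0 / 256; /* chance a loose object lands in objects/17 */
        const double factors[] = { 1.0, 1.1, 1.2, 1.5, 2.0 };

        printf("gc.auto    1.0x    1.1x    1.2x    1.5x    2.0x\n");
        for (int gc_auto = 50; gc_auto <= 1000; gc_auto += 50) {
            /* per-directory threshold, rounded up as in builtin/gc.c */
            int threshold = (gc_auto + 255) / 256;
            printf("%7d", gc_auto);
            for (int f = 0; f < 5; f++) {
                int n = (int)(gc_auto * factors[f]);
                /* gc triggers once more than `threshold` objects sit in objects/17 */
                printf("  %5.1f%%", 100.0 * binom_tail(n, p, threshold + 1));
            }
            printf("\n");
        }
        return 0;
    }

Compile with cc and link the math library (-lm); each row shows, for one gc.auto value, the probability that gc has triggered by the time the repo holds 1.0x / 1.1x / 1.2x / 1.5x / 2x that many loose objects.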