The git verify-pack command has a -v option, which outputs a lot of diagnostic information for each object found in the packfile. However, the value in the size field for a deltified object does not match my hazy expectations: I thought it would be something like the uncompressed 'true' size of the Git object. What does this field actually mean?
Specifically, I have a Git packfile which contains a large object:
$ git cat-file -s 7daa9e75f86aa168748aef6c16c76b2acee1acca
61464170
(i.e. the object size is about 58 MB, which is indeed what I see when I check the file out)
However, the line returned for this object by git verify-pack -v is this:
7daa9e75f86aa168748aef6c16c76b2acee1acca blob 568352 529608 770759074 1 27e47895a3822906eb31b05fe674ad470296c12e
(a full copy of the verify-pack output is available here)
As you can see (after reading the documentation for git verify-pack), this object is stored deltified, and the columns are defined as:
SHA1 type size size-in-packfile offset-in-packfile depth base-SHA1
So 'size' for this object is 568352 (and 'size-in-packfile' is 529608), but what does that mean, given that the actual object size is 61464170 bytes? The order-of-magnitude difference must mean that the size figure refers only to the delta?
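To make the column layout concrete, here is a minimal sketch that splits a verify-pack -v object line into named fields. The names PackEntry and parse_verify_pack_line are made up for this illustration, and the sketch assumes deltified objects have seven columns and non-delta objects only the first five.

```python
from typing import NamedTuple, Optional


class PackEntry(NamedTuple):
    """One object line from `git verify-pack -v` (illustrative type, not part of Git)."""
    sha1: str
    type: str
    size: int
    size_in_pack: int
    offset: int
    depth: Optional[int]       # present only for deltified objects
    base_sha1: Optional[str]   # present only for deltified objects


def parse_verify_pack_line(line: str) -> PackEntry:
    parts = line.split()
    # Assumption: 5 columns for non-delta objects, 7 for deltified ones.
    if len(parts) not in (5, 7):
        raise ValueError(f"not an object line: {line!r}")
    is_delta = len(parts) == 7
    return PackEntry(
        sha1=parts[0],
        type=parts[1],
        size=int(parts[2]),
        size_in_pack=int(parts[3]),
        offset=int(parts[4]),
        depth=int(parts[5]) if is_delta else None,
        base_sha1=parts[6] if is_delta else None,
    )


line = ("7daa9e75f86aa168748aef6c16c76b2acee1acca blob 568352 529608 "
        "770759074 1 27e47895a3822906eb31b05fe674ad470296c12e")
entry = parse_verify_pack_line(line)
full_size = 61464170  # as reported by `git cat-file -s`
print(entry.size, entry.size_in_pack, full_size)
```

Summary lines in the same output (chain-length statistics and the like) do not fit either shape, so this sketch rejects them with a ValueError rather than guessing.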
First, see this diagram. Then, based on the source (builtin/index-pack.c), the value in the fourth field is:
(unsigned long)(obj[1].idx.offset - obj->idx.offset)
which is the raw packed size (obj[1] is the next object after this one, or the trailer). As the stored item is deltified, that's the size of the deflated delta data plus header overhead. The value in the third field is obj->size (the size recorded in the object's header, i.e. the first size value from the overhead area).
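The subtraction above can be sketched outside of Git as follows. This is only an illustration of the arithmetic, not the index-pack code: sizes_in_pack is a made-up helper, and the offsets in the example are hypothetical apart from the 12-byte pack header that puts the first object at offset 12.

```python
def sizes_in_pack(offsets, trailer_offset):
    """Map each object's offset to its on-disk size.

    Each size is the distance to the next object's offset, or to the
    trailer for the last object, mirroring
    obj[1].idx.offset - obj->idx.offset.
    """
    bounds = sorted(offsets) + [trailer_offset]
    return {start: end - start for start, end in zip(bounds, bounds[1:])}


# Hypothetical pack: three objects, trailer starting at byte 400.
print(sizes_in_pack([12, 100, 250], 400))

# For the object in the question, the next object (or the trailer) must
# start at offset + size-in-packfile.
print(770759074 + 529608)
```

The size obtained this way covers everything stored for the object, the header bytes included, which is why it is described as data plus overhead.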
(To get the actual data, or even its size, you have to inflate the stream a bit and then look at the delta headers. The object's "true" size is encoded in the delta header as the second size value, after the size of the base object. See get_size_from_delta in sha1_file.c, get_delta_hdr_size in delta.h, and the "offset encoding" in the diagram.)
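As a rough sketch of that step: the inflated delta starts with two variable-length integers, the base object's size and then the resulting object's size, each stored 7 bits per byte, least significant group first, with the high bit marking continuation. The Python below is not Git's code; read_varint, delta_header_sizes and encode_varint are names invented here, and the input is assumed to be just the zlib stream of one delta, with the pack object header and base reference already skipped.

```python
import zlib


def read_varint(data, pos=0):
    """Decode one size value; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # high bit clear: last byte of this value
            return value, pos


def delta_header_sizes(deflated_delta):
    """Return (base_size, result_size) from a zlib-compressed delta."""
    # Inflate only a small prefix; the two size values fit well within it.
    head = zlib.decompressobj().decompress(deflated_delta, 32)
    base_size, pos = read_varint(head)
    result_size, _ = read_varint(head, pos)
    return base_size, result_size


def encode_varint(value):
    """Inverse of read_varint, used only to build demo input."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Synthetic delta: made-up base size, the question's object size, filler body.
fake_delta = encode_varint(1000) + encode_varint(61464170) + b"\x00" * 64
print(delta_header_sizes(zlib.compress(fake_delta)))
```

Only the first 32 inflated bytes are requested, which is what "inflate the stream a bit" amounts to: the full delta never has to be decompressed just to learn the final size.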
Edit to add: OK, re-reading the question, you're asking more about why the size in the third field is so much smaller than the object's actual size. That is because the third field is the inflated (but not de-delta-ed) size of what is stored, i.e. the delta. So: size-in-packfile (field 4) is the size after deflating, and also includes a bit of header overhead; size (field 3) is the size of the delta itself, inflated but not yet applied to its base; and the size of the ultimate file, after undoing delta compression, is in the delta header, whose byte count is included in the size-in-packfile (field 4).
Extra edit: the offset-in-packfile (field 5) is obj->idx.offset. That's where you have to lseek() in the pack file to start reading the object (I think; I've got some confusing code in front of me for handling OBJ_OFS_DELTA too :-) ).
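For illustration, seeking to that offset and decoding the per-object header might look like the sketch below. It assumes the usual pack entry header layout: the first byte carries a continuation bit, a 3-bit type and the low 4 bits of the size, and each following byte adds 7 more size bits. read_object_header and the OBJ_TYPES table are written for this example, and the "pack" is a synthetic in-memory buffer rather than a real packfile.

```python
import io

# Type codes used in pack entry headers.
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}


def read_object_header(pack, offset):
    """Seek to `offset` and return (type_name, size) from the entry header."""
    pack.seek(offset)
    byte = pack.read(1)[0]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F           # low 4 bits of the size
    shift = 4
    while byte & 0x80:           # continuation bit set: more size bytes follow
        byte = pack.read(1)[0]
        size |= (byte & 0x7F) << shift
        shift += 7
    return OBJ_TYPES.get(obj_type, "unknown"), size


# Synthetic data: 12 filler bytes standing in for the pack header, then a
# hand-built entry header for an OBJ_OFS_DELTA whose size field is 568352.
fake_pack = io.BytesIO(b"\x00" * 12 + bytes([0xE0, 0xC2, 0x95, 0x02]))
print(read_object_header(fake_pack, 12))
```

For a delta entry the type read here is one of the two delta types and the size is the field 3 value; the base reference (a relative offset for OBJ_OFS_DELTA) and the deflated delta follow this header.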