git git-plumbing

How does git cat-file -t <object id> determine the type of object?


The following link explains how Git computes object IDs (with a Ruby code snippet). The object type (blob, tree, commit) is encoded in a header that is concatenated with the content, and the SHA-1 is computed over the concatenated string.

https://git-scm.com/book/en/v2/Git-Internals-Git-Objects

One can use git cat-file -t <object id> to determine the type of the object (blob, tree, commit).

How does this command extract the type from the object ID, given that SHA-1 is a one-way hash function?


Solution

  • "You're holding it upside down." 😀

    While it's true that SHA-1 is a one-way hash, that's not a problem: you're supplying the hash yourself, and Git uses it as a key in a key-value database to retrieve the data. (If you supply part of the hash, rather than the whole thing, Git looks for keys that match that prefix; if the prefix is unique, Git assumes that the single matching key is the right key.)

    Having obtained the data (the zlib-compressed object), Git now needs only to decompress the first few bytes of it. These begin with one of the four object type strings: blob, commit, tag, or tree, followed by a space, then the object's size written as decimal digits in ASCII, then a '\0' byte.

    If Git extracts the entire object, it then verifies that the bytes of the object, including the header, fed back through the hash function, produce the key that was used to retrieve the object. The -t code can take a shortcut and stop decompressing early; when Git stops short like this, it skips the verification step.
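The two read paths described above can be sketched in Python. This is only an illustration under simplifying assumptions, not Git's actual implementation: the object is built in memory instead of being read from .git/objects, packfiles are ignored, and the names make_loose_object, read_type, and verify are hypothetical helpers invented for this example.

```python
import hashlib
import zlib

# The four object type strings a header can start with.
OBJECT_TYPES = (b"blob", b"commit", b"tag", b"tree")


def make_loose_object(obj_type: bytes, content: bytes) -> tuple[str, bytes]:
    """Build the (object ID, zlib-compressed bytes) pair for one object.

    Hypothetical helper: it stands in for an object already stored on disk.
    """
    # Header is "<type> <size>\0", followed by the raw content.
    store = obj_type + b" " + str(len(content)).encode("ascii") + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def read_type(compressed: bytes) -> str:
    """Return the object type by inflating only the start of the stream."""
    inflater = zlib.decompressobj()
    # max_length caps the output at 32 bytes, so the body is never fully
    # inflated and no hash can be checked on this path.
    head = inflater.decompress(compressed, 32)
    obj_type, _, _ = head.partition(b" ")
    if obj_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {obj_type!r}")
    return obj_type.decode("ascii")


def verify(object_id: str, compressed: bytes) -> bool:
    """Inflate the whole object and re-hash header plus content."""
    return hashlib.sha1(zlib.decompress(compressed)).hexdigest() == object_id
```

Calling make_loose_object(b"blob", b"some text") yields a key and its stored value; read_type on that value returns "blob" without ever touching the key, which is why the one-way nature of the hash never comes into play. Only verify needs the full data, since it has to recompute the hash over every byte.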