git tar checksum sha1sum

How does git know if a tarball has changed?


If a tarball (a .tgz file) is tracked in a Git repo, how does Git know if it has changed between commits?

I am looking to copy that behavior/functionality, so I can determine if there are changes between two different tarballs.

Again, what am I trying to do? I want to create a script that can diff tarballs without having to use Git.


Solution

  • Git detects changes to a tar file the same way it detects changes to any other file: it compares the file contents. Such a comparison can be done naïvely, byte by byte, or by computing a hash of each file and comparing the hashes. Since Git internally stores every known file under a hash of its contents, it can compare those hashes instead of doing the expensive byte-by-byte comparison.
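
    To make the hashing concrete, here is a minimal sketch of how the ID Git assigns to a file's contents can be reproduced without Git. It assumes a repository that uses the default SHA-1 object format, where the ID is the SHA-1 of a short header (the word blob, the size in bytes, a NUL byte) followed by the file contents. The function name git_blob_sha1 is made up for illustration.

    ```shell
    # git_blob_sha1 FILE  (hypothetical helper name)
    # Prints the SHA-1 that Git would use as the blob ID for FILE's contents,
    # assuming the default SHA-1 object format.
    git_blob_sha1() {
        # Size of the file in bytes; strip padding that some wc versions add.
        size=$(wc -c <"$1" | tr -d '[:space:]')
        # Hash the header "blob <size>\0" followed by the raw contents.
        { printf 'blob %s\0' "$size"; cat -- "$1"; } | sha1sum | cut -d' ' -f1
    }
    ```

    Two files with identical contents get the same ID regardless of their names. If Git is installed, git hash-object file1.tgz should print the same value, provided no content filters apply to the file.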

    To make use of this functionality, you could simply use Git itself to compare any two files on your file system:

    git diff --no-index file1.tgz file2.tgz
    

    Or, if you don't have Git available, you could use the plain diff command instead.
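
    In a script, usually only the yes/no answer matters. As a possible alternative to diff, the standard cmp utility compares two files byte by byte and reports the result through its exit status. A minimal sketch, where the helper name files_identical is made up for illustration:

    ```shell
    # files_identical FILE1 FILE2  (hypothetical helper name)
    # Exit status: 0 if the files are byte-for-byte identical,
    # 1 if they differ, greater than 1 if a file could not be read.
    files_identical() {
        # -s suppresses all output; only the exit status is used.
        cmp -s -- "$1" "$2"
    }
    ```

    It could be used like this:

    if files_identical file1.tgz file2.tgz; then echo unchanged; else echo changed; fi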

    Another option would be to manually compute checksums of the two files and compare those instead. If the checksums are different, the files are guaranteed to be different. If the checksums are identical, it is very likely that the file contents are also identical, but there is still a small possibility of a hash collision, so to be certain you would then have to compare the files byte by byte.

    A simple way to compute and compare checksums of two files would be the following:

    test "$(sha1sum <file1)" = "$(sha1sum <file2)"
    

    Note the input redirection (<): sha1sum then reads from standard input and prints the same placeholder (-) instead of a file name, so the output is the same even if the files have different names.

    You can of course use any other hashing tool, such as sha256sum.
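
    Putting the pieces together, the checksum comparison and the byte-by-byte confirmation could be wrapped in one function. This is only a sketch: the name same_content and the optional third argument for choosing the hashing tool are made up for illustration, and it assumes the chosen tool reads from standard input the way sha1sum and sha256sum do.

    ```shell
    # same_content FILE1 FILE2 [HASH_TOOL]  (hypothetical helper name)
    # Exit status: 0 if the contents are identical, 1 if they differ,
    # 2 if a file could not be read or the hashing tool failed.
    same_content() {
        hash_tool=${3:-sha1sum}
        # Read via redirection so the file name is not part of the output.
        sum1=$("$hash_tool" <"$1") || return 2
        sum2=$("$hash_tool" <"$2") || return 2
        # Different checksums: the contents are guaranteed to differ.
        [ "$sum1" = "$sum2" ] || return 1
        # Equal checksums: confirm byte by byte to rule out a collision.
        cmp -s -- "$1" "$2"
    }
    ```

    For example, same_content file1.tgz file2.tgz sha256sum && echo unchanged would use SHA-256 instead of the default SHA-1.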