
Search git history by file content


I have a tarball of a multi-year project, taken at some point in time. The tarball was supposedly created from a particular release, and the project tags the commit corresponding to each release. However, the tarball's contents don't match the tagged commit.

Usually, I navigate git commits to see what files looked like in the past. In this case, I already have the file contents, and I'm wondering which commit(s) match that content.

Can I take a sample file from the tarball and search the git history for a file with matching content?

The git log -S<string> option looks promising. Can I feed it an entire (500-line) file as the search criterion?

Also, git log --find-object=<object-id> offers hope. Can I hash (e.g. sha1sum) a standalone file and search for files within git that have the same content and hence the same hash?


Solution

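  • You can compute the object ID (SHA-1) that Git would assign to a file with git hash-object. This is not the same value sha1sum gives you, because Git hashes a "blob <size>\0" header followed by the file's contents.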

    git hash-object --path=<path-in-repo> <file>
    

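    The --path option tells Git to hash the file as if it were located at the given path in the repository. The location does not change the hash directly, but the path determines which filters (e.g. end-of-line conversion configured in .gitattributes) are applied before hashing, so the result matches what Git would have stored. This is what makes it useful for a file extracted outside your working directory, as the documentation notes: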

    [...] This option is mainly useful for hashing temporary files located outside of the working directory or files read from stdin.

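    Once you have the object ID, pass it to git log --all --find-object=<object-id>. This lists the commits on any ref that changed the number of occurrences of that blob, i.e. the commits that added or removed a file with exactly that content. The snapshot your tarball came from lies between those commits.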
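Putting both steps together for a whole extracted tarball, here is a minimal bash sketch. The function names git_blob_id and find_matching_commits are hypothetical, not part of Git. It assumes bash, coreutils sha1sum, a Git version that supports --find-object, and a tarball extracted with the same directory layout as the repository, so that each file's relative path can be passed to --path. git_blob_id exists only to show why plain sha1sum never matches: it reproduces Git's blob hash by hand, without applying any filters.

```bash
#!/usr/bin/env bash
# Sketch only: git_blob_id and find_matching_commits are hypothetical helpers.

# Reproduce a Git blob ID by hand: SHA-1 of "blob <size>\0" followed by the
# raw file bytes. No filters (e.g. CRLF conversion) are applied, so this
# should match `git hash-object --no-filters <file>` in a SHA-1 repository.
git_blob_id() {
  local file=$1 size
  size=$(wc -c < "$file")
  size=${size//[[:space:]]/}   # BSD wc pads the byte count with spaces
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

# Run from inside the repository. For every regular file under the
# extracted tarball directory, hash it as if it lived at the same relative
# path in the repo, then list commits on any ref that added or removed it.
find_matching_commits() {
  local tree_dir=${1%/} file rel id
  while IFS= read -r -d '' file; do
    rel=${file#"$tree_dir"/}
    id=$(git hash-object --path="$rel" "$file")
    printf '== %s (%s)\n' "$rel" "$id"
    git log --all --oneline --find-object="$id"
  done < <(find "$tree_dir" -type f -print0)
}
```

For example, after sourcing the functions, cd into your clone and run find_matching_commits on the directory where you extracted the tarball. Files whose output lists no commits never existed in that exact form on any ref, which can itself help explain why the tarball doesn't match the tag.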