
Alternate hashing function based on file extension


I'd like to use Git to track media files as well as their associated playlists. Tracking playlists is easy because they are text files. For the binary files, I've already looked at git-lfs and git-annex, but I'd like to explore the following approach:

FLAC files store an internal MD5 hash. It can be read with:

metaflac --show-md5sum filename.flac
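
For reference, the same value can be read straight from the file header without calling metaflac. The following is a minimal Python sketch, not a full parser: it assumes the data starts with the `fLaC` marker followed directly by the mandatory STREAMINFO block (no ID3 tag prepended), and the function name `flac_audio_md5` is made up for illustration.

```python
def flac_audio_md5(data: bytes) -> str:
    """Return the hex MD5 stored in the STREAMINFO block of a FLAC stream.

    Assumption: `data` holds at least the first 42 bytes of the file and
    begins with the 4-byte "fLaC" marker (no ID3v2 tag in front of it).
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing fLaC marker)")

    # Metadata block header: 1 bit "last block" flag, 7 bits block type,
    # then a 24-bit big-endian length of the block body.
    block_type = data[4] & 0x7F
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length < 34:
        raise ValueError("first metadata block is not a valid STREAMINFO")

    streaminfo = data[8:8 + length]
    if len(streaminfo) < 34:
        raise ValueError("STREAMINFO block is truncated")

    # The last 16 bytes of the 34-byte STREAMINFO body hold the MD5
    # signature of the unencoded audio data.
    return streaminfo[18:34].hex()
```

A call such as `flac_audio_md5(open("filename.flac", "rb").read(42))` should print the same hex string as the metaflac command above, provided the assumptions hold.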

With performance in mind, I'd like Git to use this FLAC MD5 hash instead of its internal hash.

How can I do this?

I've read the gitattributes documentation but did not find the answer.

PS: The first goal is lightning-fast performance. The second goal is for any metadata change to a file to be ignored.


Solution

  • There is no way to use a custom hash function to identify objects in Git. There is ongoing work to switch to SHA-256, but it is not a general-purpose framework for substituting your own hash function.

    CPU usage in Git is not dominated by hashing; it's dominated by compression. Using a different hash function, even if it were possible, would not produce significant performance benefits. (I've run the numbers myself, as have other Git contributors.)

    In addition, MD5 is extremely weak (even weaker than SHA-1) and it shouldn't be used for any purpose whatever nowadays. If you need a fast hash, BLAKE2b is faster than MD5, actually secure, and can be adjusted to an arbitrary length.
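
To illustrate the points above, here is a small Python sketch. It assumes a repository using the default SHA-1 object format, where a blob's ID is the SHA-1 of a short header plus the complete file contents. The helper names `git_blob_id` and `short_blake2b` and the sample byte strings are made up for this example and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a blob in a SHA-1 repository.

    Git hashes the header "blob <size>\\0" followed by every byte of the
    file, so the ID depends on the whole file, not just the audio.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def short_blake2b(content: bytes, size: int = 20) -> str:
    """BLAKE2b digest truncated by design to `size` bytes (1 to 64)."""
    return hashlib.blake2b(content, digest_size=size).hexdigest()


# Two stand-in "files" with identical audio bytes but different tags.
audio = b"identical audio frames"
file_a = b"TITLE=old\n" + audio
file_b = b"TITLE=new\n" + audio

# The blob IDs differ because the metadata bytes are part of the input.
print(git_blob_id(file_a) == git_blob_id(file_b))

# BLAKE2b can produce a 20-byte (40 hex character) digest directly.
print(len(short_blake2b(file_a, 20)))
```

Because the object ID covers the full contents, a hash that only covers the decoded audio (as FLAC's embedded MD5 does) could not identify the stored bytes uniquely, which is one more reason Git offers no hook for swapping it in.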