I need to compute SHA-256 hashes in a Java Application. Apaches HmacUtils seem appropriate for this. However, in the documentation it says "This class is immutable and thread-safe. However the Mac may not be". Here "Mac" is referring to the instance returned, which actually calculates the hash.
Now, I want to be able to compute hashes from multiple threads, ideally without them blocking each other. What is the best way to achieve this? Is there a reference which algorithm would actually be thread safe? I understand that it would also need to be reentrant? Should I create one instance of "Mac" per thread to avoid thread safety issues? (Is this sufficient? Is this considerably more expensive?)
Most hash algorithms that we use today operate by processing chunks of data (usually either 512 bits or 1024 bits) in a series. This is true for SHA-2, SHA-3, and non-parallel variants of BLAKE2, which are the most common secure hash algorithms in used today. The same applies to HMAC constructions or built-in MAC constructions based on these algorithms.
As a result, you will typically want to process an instance of these algorithms on a single thread anyway because the algorithms themselves are not parallelizable. Using the typical, non-pure approach to implement hash algorithms (which is the approach almost every cryptographic library, including this one, uses), that means that a MAC or hash instance is mutable and must either be used on a single thread or synchronized across threads. As I mentioned, usually the former is a better idea.
The cost of creating a hash or MAC instance is usually very small, so that is practically not a concern. However, if you're performing many MACs with the same key, it can be somewhat more efficient to create a single instance with the key initialized and then clone it, which saves the cost of hashing the first block for the key. This might be a significant performance improvement if the message is small.
There are algorithms, such as BLAKE3 and MD6, that use tree hashing and these can often benefit from parallelization. Usually they will have documentation telling you how threads are created and handled (often with a thread pool) and the thread-safety requirements involved. These algorithms are often threaded internally and so instances may be safe to share across threads if the documentation states so. However, the library you're using doesn't offer any of those, so it's not a concern you have to deal with.
If you're planning on processing small chunks of data, you'll almost certainly want to use a thread pool, since the overhead of spawning a thread can be non-trivial compared to hashing.