I compute a hash using the init-update-final mechanism, i.e. I initialize the hash context, update the hash with input data of various sizes, and then compute the final digest. This can be done with the EVP_DigestInit, EVP_DigestUpdate and EVP_DigestFinal_ex functions of OpenSSL, or with the Update and Final methods of Crypto++.
From what I can dig up from the source code of OpenSSL and Crypto++, it seems like the input data actually get copied and stored. Can anyone confirm this? Is this a limitation of the hashing process, or is it standard procedure? And does it vary with the digest algorithm used?
I was under the impression that the hashing algorithm computes some internal state from each chunk of input data, and that the final digest is computed from the internal state accumulated over all the update calls. This does not seem to be the case. Obviously I do not understand the mechanics of hashing algorithms well enough.
From what I can dig up from the source code of OpenSSL and Crypto++, it seems like the input data actually get copied and stored. Can anyone confirm this?
Yes and no. The input as a whole is not usually stored. Only a partial block is buffered, until enough data arrives to make up a full block, which is then processed into the internal state.
The buffer is part of the hash's state. Once a buffered block has been consumed, its storage is reused for the next partial block. When the hash object is destroyed, any data left in the partial-block buffer is wiped (zeroized).
How much data can be held at once depends on the hash's internals and block size; typically it is less than one block. This is generally true of iterative (Merkle-Damgård) hashes like MD5, SHA-1 and SHA-512. But I don't know about sponge-based hash functions like SHA3.
The strategy applies to both OpenSSL and Crypto++.
In the case of Crypto++, a hash's output is buffered internally if no AttachedTransformation is present.
Also see Init-Update-Final on the Crypto++ wiki. It's a recent addition, added in January 2016.