java file checksum message-digest

Update a checksum (MD5, SHA1) when appending to a file in Java


Is it possible to update a checksum (MD5, SHA1) when data is appended to a file, given that we already have the hash value of the original file?

  1. I have file A already uploaded to the server, and I already have an MD5 file that contains its MD5 hash value.
  2. I want to append a new data block (byte[]) to file A, and I have to update the MD5 file with the new hash value.

Is it possible to compute the MD5 hash value of the new file without reading the whole of file A again? File A may be very large, in which case rehashing it takes too much time.


Solution

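  • If, and only if, you can choose the new data block to consist of one 0x80 byte, a certain number of 0x00 bytes depending on the size of file A, and 8 bytes containing the bit length of file A (the padding that MD5 and SHA1 append internally), followed by any other data you like, YES. In that case the stored hash of file A is enough to compute the hash of the extended file.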

    This is called a Length Extension Attack and is a cryptographic weakness of all hashes using the Merkle-Damgard construction, which includes MD5, SHA1 and the SHA-2 family, but not the SHA-3 family. This is not really a programming question and is better suited to crypto.SX (Cryptography Stack Exchange), where there are already quite a few questions about it, such as https://crypto.stackexchange.com/questions/17733/sha1-multipart-calculation and https://crypto.stackexchange.com/questions/3978/understanding-the-length-extension-attack
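
The padding described above can be sketched in Java for the MD5 case. This is only an illustration: the class name `Md5GluePadding` and the method `paddingFor` are hypothetical, and the sketch assumes MD5's 64-byte blocks with a little-endian 64-bit length field (SHA1 and SHA-256 use the same layout with a big-endian length; SHA-384 and SHA-512 use 128-byte blocks and a 16-byte length). It shows which bytes the appended block would have to start with; it does not perform the extension itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: builds the bytes MD5 appends internally to a message. */
class Md5GluePadding {

    /** Returns the MD5 padding for a message of the given length in bytes. */
    static byte[] paddingFor(long messageLength) {
        int used = (int) (messageLength & 63);          // bytes already in the last partial block
        int zeros = (used < 56 ? 55 : 119) - used;      // 0x00 bytes that follow the 0x80 marker
        ByteBuffer pad = ByteBuffer.allocate(1 + zeros + 8).order(ByteOrder.LITTLE_ENDIAN);
        pad.put((byte) 0x80);
        pad.position(1 + zeros);                        // skipped bytes stay 0x00
        pad.putLong(messageLength * 8);                 // 64-bit bit length, little-endian for MD5
        return pad.array();
    }

    public static void main(String[] args) {
        for (long len : new long[] {0, 55, 56, 64, 1000}) {
            System.out.println(len + " bytes -> " + paddingFor(len).length + " padding bytes");
        }
    }
}
```

The message length plus the padding length is always a multiple of 64, which is why the hash of file A equals the internal state after the last padded block and can be used as a starting point.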

    However, you can compute the new hash for unrestricted new data if you save the hash's internal state (which is normally hidden) as of the last full block before the end of the data, along with any trailing bytes that did not fill a block, and later restore it and resume 'updating' from there with the new data. I believe this is more or less what the other answers intended. You can also save the new state if you want to repeat the process.

    Whether and how you can access this state, and exactly how it needs to be represented, depends on the implementation you use. You tagged Java although your actual Q doesn't mention it; doing this with the crypto Java provides (JCA) would be very difficult, because JCA intentionally hides the details of all supported algorithms behind a series of simplified, abstracted facade classes. OTOH if you (re)code these hashes yourself, accessing the internal state could be quite easy. With the BouncyCastle 'lightweight' implementation(s) it is probably not very hard, though there may be a risk of them changing the implementation; I'd have to look in detail. Storing and retrieving the state may or may not be an issue.
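
As a rough illustration of the "(re)code the hash yourself" route, here is a minimal sketch of a hand-written MD5 in Java whose state can be exported and restored. The class `ResumableMd5`, its methods, and the 24-byte-plus-tail state layout are all assumptions made for this example; they are not part of JCA or BouncyCastle. It is meant to show the idea, not to be a tuned or hardened implementation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

/** Hypothetical hand-written MD5 with exportable internal state. */
class ResumableMd5 {
    private static final int[] S = {7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21};
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            K[i] = (int) (long) (Math.abs(StrictMath.sin(i + 1)) * 4294967296.0);
        }
    }

    private final int[] h = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
    private long length;                       // total bytes fed so far
    private final byte[] tail = new byte[64];  // bytes of the current partial block

    void update(byte[] data) {
        for (byte b : data) {
            tail[(int) (length++ & 63)] = b;
            if ((length & 63) == 0) compress(h, tail);   // block is full: fold it into the state
        }
    }

    /** Pads and finishes a copy of the state, so this object can keep accepting data. */
    byte[] digest() {
        int[] hh = h.clone();
        byte[] block = new byte[64];
        int n = (int) (length & 63);
        System.arraycopy(tail, 0, block, 0, n);
        block[n++] = (byte) 0x80;
        if (n > 56) {                          // no room for the 8-byte length field
            compress(hh, block);
            block = new byte[64];
        }
        ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN).putLong(56, length * 8);
        compress(hh, block);
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        for (int v : hh) out.putInt(v);
        return out.array();
    }

    /** Serialises the chaining values, the byte count and the unprocessed tail. */
    byte[] saveState() {
        int n = (int) (length & 63);
        ByteBuffer out = ByteBuffer.allocate(24 + n);
        for (int v : h) out.putInt(v);
        return out.putLong(length).put(tail, 0, n).array();
    }

    static ResumableMd5 restore(byte[] state) {
        ByteBuffer in = ByteBuffer.wrap(state);
        ResumableMd5 md = new ResumableMd5();
        for (int i = 0; i < 4; i++) md.h[i] = in.getInt();
        md.length = in.getLong();
        in.get(md.tail, 0, in.remaining());
        return md;
    }

    /** Standard MD5 compression of one 64-byte block into the chaining values h. */
    private static void compress(int[] h, byte[] block) {
        ByteBuffer words = ByteBuffer.wrap(block).order(ByteOrder.LITTLE_ENDIAN);
        int a = h[0], b = h[1], c = h[2], d = h[3];
        for (int i = 0; i < 64; i++) {
            int f, g;
            switch (i >>> 4) {
                case 0:  f = (b & c) | (~b & d); g = i; break;
                case 1:  f = (d & b) | (~d & c); g = (5 * i + 1) & 15; break;
                case 2:  f = b ^ c ^ d;          g = (3 * i + 5) & 15; break;
                default: f = c ^ (b | ~d);       g = (7 * i) & 15; break;
            }
            int sum = a + f + K[i] + words.getInt(4 * g);
            int rotated = Integer.rotateLeft(sum, S[(i >>> 4) * 4 + (i & 3)]);
            a = d; d = c; c = b; b += rotated;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    }

    public static void main(String[] args) throws Exception {
        byte[] fileA = new byte[1000];                     // stand-in for the uploaded file A
        Arrays.fill(fileA, (byte) 'a');
        byte[] appended = "new data block".getBytes(StandardCharsets.UTF_8);

        ResumableMd5 first = new ResumableMd5();
        first.update(fileA);
        byte[] saved = first.saveState();                  // would be stored next to the MD5 file

        ResumableMd5 resumed = ResumableMd5.restore(saved);
        resumed.update(appended);                          // file A is not read again

        MessageDigest reference = MessageDigest.getInstance("MD5");
        reference.update(fileA);
        reference.update(appended);
        System.out.println(Arrays.equals(resumed.digest(), reference.digest()));
    }
}
```

The `main` method compares the resumed digest with the one JCA computes over the whole data, which is a convenient sanity check for any hand-coded hash. Note that the saved state has to be produced when file A is first hashed; the final MD5 value alone is not enough for this approach, because it already includes the padding.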