excel xml encryption hash salt-cryptography

Why is my salted SHA-512 hashing code not matching Excel's?


I am attempting to replicate, in Python, the hashing that Excel performs when a sheet is password-protected, but my output does not match Excel's even on dummy inputs. In the XML file, I see this:

sheetProtection algorithmName="SHA-512"
hashValue="Ua4h+FTPQI0+aCSbQ1Ya9fDYsddMzCfAypD1u1TBGmNONIy6sRfJBLDoMhOfbCv0i5Q2t1JOm4okjSvC1CsJYw==" 
saltValue="Furur6jnDIFaQBhHQBXzFA==" 
spinCount="100000"

To replicate this, I coded the following in Python:

import hashlib
import base64

hash_value = "Ua4h+FTPQI0+aCSbQ1Ya9fDYsddMzCfAypD1u1TBGmNONIy6sRfJBLDoMhOfbCv0i5Q2t1JOm4okjSvC1CsJYw=="
salt_value = "Furur6jnDIFaQBhHQBXzFA=="

password = "password"

pdata = password.encode('utf-8')
sdata = base64.b64decode(salt_value)
hash_iter = (sdata + pdata)

for i in range(100000):
    hash_iter = hashlib.sha512(hash_iter).digest()

print(base64.b64encode(hash_iter).decode())

which returns the following result:

9o4313eeh/ym8+GHSHW4iyh1usvNVD1DflzET5WgG9QKutn0loM24Op7/McAGr4D5H10W+DuQCD8Tj8Cn7uDOg==

What am I doing wrong here? I have tried both prepending and appending the salt, and neither gives me Excel's final hash. I have also tried different encodings when converting the plain-text password to bytes, but that doesn't seem to be the issue. I suspect it has something to do with how the hash is iterated, but I am not sure what I'm doing wrong.


Solution

  • From the link provided in Panagiotis Kanavos' comment:

    Let H() be an implementation of the hashing algorithm specified by AlgorithmName, iterator be an unsigned 32-bit integer, H_n be the hash data of the nth iteration, and a plus sign (+) represent concatenation. The initial password hash is generated as follows.

    H_0 = H(salt + password)

    The hash is then iterated using the following approach.

    H_n = H(H_{n-1} + iterator)

    where iterator is initially set to 0 and is incremented monotonically on each iteration until SpinCount iterations have been performed. The value of iterator on the last iteration MUST be one less than SpinCount. The final hash is then H_final = H_{SpinCount-1}.

    and a little trial and error, it appears that the Excel password hash algorithm can be reproduced in Python with the following:

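    import base64
    import hashlib

    def excel_password_hash_sha512(password: str, salt: bytes, iteration_count: int) -> str:
        # The password is hashed as UTF-16LE bytes, not UTF-8.
        password_bytes = password.encode('utf-16le')
        # Initial hash: the salt followed by the password bytes.
        h0 = hashlib.sha512(salt + password_bytes).digest()
        h_i = h0
        # Each round rehashes the previous digest followed by the iterator,
        # serialized as a 4-byte little-endian unsigned integer.
        for iterator in range(iteration_count):
            h_i = hashlib.sha512(h_i + iterator.to_bytes(4, 'little', signed=False)).digest()
        return base64.b64encode(h_i).decode()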
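
As a usage illustration, here is a minimal sketch of how a candidate password might be checked against the attributes from the question's sheetProtection element. The helper name check_sheet_password is hypothetical, the hash function is repeated (returning raw bytes rather than Base64) so that the snippet runs on its own, and the sketch assumes the algorithm is SHA-512 as in the question.

```python
import base64
import hashlib
import hmac


def excel_password_hash_sha512(password: str, salt: bytes, spin_count: int) -> bytes:
    # Initial hash: salt followed by the UTF-16LE bytes of the password.
    h = hashlib.sha512(salt + password.encode("utf-16le")).digest()
    # Each round: previous digest followed by the iterator as a
    # 4-byte little-endian unsigned integer.
    for iterator in range(spin_count):
        h = hashlib.sha512(h + iterator.to_bytes(4, "little")).digest()
    return h


def check_sheet_password(candidate: str, hash_value_b64: str,
                         salt_value_b64: str, spin_count: int) -> bool:
    # hashValue and saltValue are stored Base64-encoded in the XML.
    expected = base64.b64decode(hash_value_b64)
    salt = base64.b64decode(salt_value_b64)
    actual = excel_password_hash_sha512(candidate, salt, spin_count)
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(actual, expected)


# Attribute values copied from the question.
hash_value = "Ua4h+FTPQI0+aCSbQ1Ya9fDYsddMzCfAypD1u1TBGmNONIy6sRfJBLDoMhOfbCv0i5Q2t1JOm4okjSvC1CsJYw=="
salt_value = "Furur6jnDIFaQBhHQBXzFA=="

print(check_sheet_password("password", hash_value, salt_value, 100000))
```

The printed value is True only if "password" really is the password that produced the hashValue above; the outcome is not asserted here.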
    

    The non-obvious parts that had to be deduced were: