r hash uint16

Can't replicate hashing of string


I need to reproduce in R a JavaScript function that hashes a string with SHA-256.
The function is:

function hashPhrase (phrase) {
  const buf = new ArrayBuffer(phrase.length * 2)
  const bufView = new Uint16Array(buf)
  const strLen = phrase.length
  for (let i = 0; i < strLen; i++) {
    bufView[i] = phrase.charCodeAt(i)
  }
  return window.crypto.subtle.digest('SHA-256', buf)
    .then(hashArrayBuffer => {
      let binary = ''
      const bytes = new Uint8Array(hashArrayBuffer)
      const len = bytes.byteLength
      for (let i = 0; i < len; i++) {
        binary += String.fromCharCode(bytes[i])
      }
      return Promise.resolve(window.btoa(binary))
    })
}

Calling the function:

hashPhrase('test').then(x => { console.log(x) })

gives me:

/lIGdrGh2T2rqyMZ7qA2dPNjLq7rFj0eiCRPXrHeEOs=

as output.

In R, I load the openssl package and use its sha256 function to hash the same string:

library(openssl)

phrase = "test"
phraseraw = charToRaw(phrase)
base64_encode(sha256(phraseraw))

and the output is:

[1] "n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg="

I don't know whether the Uint16Array is the problem, because I assumed that in both cases the string is passed to the hash as raw bytes.

I would very much appreciate any help.


Solution

  • Because you created a Uint16Array, the JavaScript code uses two bytes per character, and those values are stored in the platform's byte order, which is little-endian on most machines. In R, since the string contains only ASCII characters, charToRaw uses one byte per character. So in JavaScript you are digesting the bytes

    [116, 0, 101, 0, 115, 0, 116, 0]
    

    But with the R code you are digesting the bytes

    [116, 101, 115, 116]
    

    If you really want to include those zero padding bytes in R as you do in JavaScript, you can convert the string to UTF-16LE before the digest:

    phrase = "test"
    phraseraw = iconv(phrase,from="ASCII", to="UTF-16LE",toRaw = TRUE)[[1]]
    base64_encode(sha256(phraseraw))
    # [1] "/lIGdrGh2T2rqyMZ7qA2dPNjLq7rFj0eiCRPXrHeEOs="
    

    So just make sure you encode your string into bytes in exactly the same way in both languages.
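
To see the difference from the JavaScript side, here is a minimal sketch that prints both byte sequences and hashes each one. It assumes Node.js and its built-in crypto module rather than the browser's window.crypto.subtle used in the question, and the helper names utf16Bytes, utf8Bytes and sha256Base64 are made up for illustration. It also assumes a little-endian machine for the byte order of the Uint16Array.

```javascript
const crypto = require('crypto')

// Bytes as the question's hashPhrase builds them:
// one 16-bit code unit per character, in the platform's byte order.
function utf16Bytes (phrase) {
  const units = new Uint16Array(phrase.length)
  for (let i = 0; i < phrase.length; i++) {
    units[i] = phrase.charCodeAt(i)
  }
  // View the same underlying buffer byte by byte
  return new Uint8Array(units.buffer)
}

// Bytes as charToRaw() yields for an ASCII string in R:
// one byte per character (UTF-8 encoding).
function utf8Bytes (phrase) {
  return new TextEncoder().encode(phrase)
}

// SHA-256 of a byte array, base64-encoded
function sha256Base64 (bytes) {
  return crypto.createHash('sha256').update(bytes).digest('base64')
}

console.log(Array.from(utf16Bytes('test'))) // two bytes per character
console.log(Array.from(utf8Bytes('test')))  // one byte per character
console.log(sha256Base64(utf16Bytes('test')))
console.log(sha256Base64(utf8Bytes('test')))
```

On a little-endian machine the third line printed should match the hash from the question's JavaScript function, and the fourth should match the hash from the original R code, which shows that only the byte encoding differs, not the hash function.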