mathhashcryptography

Do cryptographic hash functions reach each possible value, i.e., are they surjective?


Take a commonly used binary hash function - for example, SHA-256. As the name implies, it outputs a 256 bit value.

Let A be the set of all possible 256 bit binary values. A is extremely large, but finite.

Let B be the set of all possible binary values. B is infinite.

Let C be the set of values obtained by running SHA-256 on every member of B. Obviously this can't be done in practice, but I'm guessing we can still do mathematical analysis of it.

My Question: By necessity, CA. But does C = A?

EDIT: As was pointed out by some answers, this is wholly dependent on the has function in question. So, if you know the answer for any particular hash function, please say so!


Solution

  • First, let's point out that SHA-256 does not accept all possible binary strings as input. As defined by FIPS 180-3, SHA-256 accepts as input any sequence of bits of length lower than 2^64 bits (i.e. no more than 18446744073709551615 bits). This is very common; all hash functions are somehow limited in formal input length. One reason is that the notion of security is defined with regards to computational cost; there is a threshold about computational power that any attacker may muster. Inputs beyond a given length would require more than that maximum computational power to simply evaluate the function. In brief, cryptographers are very wary of infinites, because infinites tend to prevent security from being even defined, let alone quantified. So your input set C should be restricted to sequences up to 2^64-1 bits.

    That being said, let's see what is known about hash function surjectivity.

    Hash functions try to emulate a random oracle, a conceptual object which selects outputs at random under the only constraint that it "remembers" previous inputs and outputs, and, if given an already seen input, it returns the same output than previously. By definition, a random oracle can be proven surjective only by trying inputs and exhausting the output space. If the output has size n bits, then it is expected that about 2^(2n) distinct inputs will be needed to exhaust the output space of size 2^n. For n = 256, this means that hashing about 2^512 messages (e.g. all messages of 512 bits) ought to be enough (on average). SHA-256 accepts inputs very much longer than 512 bits (indeed, it accepts inputs up to 18446744073709551615 bits), so it seems highly plausible that SHA-256 is surjective.

    However, it has not been proven that SHA-256 is surjective, and that is expected. As shown above, a surjectivity proof for a random oracle requires an awful lot of computing power, substantially more than mere attacks such as preimages (2^n) and collisions (2^(n/2)). Consequently, a good hash function "should not" allow a property such as surjectivity to be actually proven. It would be very suspicious: security of hash function stems from the intractability of their internal structure, and such an intractability should firmly oppose to any attempt at mathematical analysis.

    As a consequence, surjectivity is not formally proven for any decent hash function, and not even for "broken" hash functions such as MD4. It is only "highly suspected" (a random oracle with inputs much longer than the output should be surjective).