algorithmcryptographycryptanalysis

SHA256 Find Partial Collision


I have two message:

messageA: "Frank is one of the "best" students topicId{} "

messageB: "Frank is one of the "top" students topicId{} "

I need to find SHA256 partially collision of these two messages(8 digits). Therefore, The first 8 digests of SHA256(messageA) == The first 8 digest of SHA256(messageB)

We can put any letters and numbers in {}, Both {} should have same string

I have tried brute force and birthday attack with hash table to solve this problem, but it costs too much time. I know the cycle detection algorithm like Floyd and Brent, however i have no idea how to construct the cycle for this problem. Are there any other methods to solve this problem? Thank you so much!


Solution

  • This is pretty trivial to solve with a birthday attack. Here's how I did it in Python (v2):

    def find_collision(ntries):
        from hashlib import sha256
        str1 = 'Frank is one of the "best" students topicId{%d} '
        str2 = 'Frank is one of the "top" students topicId{%d} '
        seen = {}
        for n in xrange(ntries):
            h = sha256(str1 % n).digest()[:4].encode('hex')
            seen[h] = n
        for n in xrange(ntries):
            h = sha256(str2 % n).digest()[:4].encode('hex')
            if h in seen:
                print str1 % seen[h]
                print str2 % n
    
    find_collision(100000)
    

    If your attempt took too long to find a solution, then either you simply made a mistake in your coding somewhere, or you were using the wrong data type.

    Python's dictionary data type is implemented using hash tables. That means you can search for dictionary elements in constant time. If you implemented seen using a list instead of a dict in the above code, then the search at line 11 would take an awful lot longer.


    Edit:

    If the two topicId tokens have to be identical, then — as pointed out in the comments — there is little option but to grind through somewhere in the order of 231 values. You will find a collision eventually, but it could take a long time.

    Just leave this running overnight and with a bit of luck you'll have an answer in the morning:

    def find_collision():
        from hashlib import sha256
        str1 = 'Frank is one of the "best" students topicId{%x} '
        str2 = 'Frank is one of the "top" students topicId{%x} '
        seen = {}
        n = 0
        while True:
            if sha256(str1 % n).digest()[:4] == sha256(str2 % n).digest()[:4]:
                print str1 % n
                print str2 % n
                break
            n += 1
    
    find_collision()
    

    If you're in a hurry, you could maybe look into using a GPU to speed up the hash calculations.