checksumcrccrc16

16 bit Checksum fuzzy analsysis - Leveraging "collisions", biases a thing?


If playing around with CRC RevEng fails, what next? That is the gist of my question. I am trying to learn more how to think for myself, not just looking for an answer 1 time to 1 problem.

Assuming the following:

1.) You have full control of white box algorithm and can create as many chosen sample messages as you want with valid 16 bit / 2 byte checksums

2.) You can verify as many messages as you want to see if they are valid or not

3.) Static or dynamic analysis of the white box code is off limits (say the MCU is of a lithography that would require electron microscope to analyze for example, not impossible but off limits for our purposes).

Can you use any of these methods or lines of thinking:

1.) Inspect "collisions", i.e. different messages with same checksum. Perhaps XOR these messages together and reveal something?

2.) Leverage strong biases towards certain checksums?

3.) Leverage "Rolling over" of the checksum "keyspace", i.e. every 65535 sequentially incremented messages you will see some type of sequential patterns?

4.) AI ?

Perhaps there are other strategies I am missing?

CRC RevEng tool was not able to find the answer using numerous settings configurations

EDIT: Here are some example messages

Sample 1: FFFFFFFFFFFF /

0100100A00000000000000000000FFFFFFFFFF72BE

Sample 2: 000000000000 /

0100100A00000000000000000000FFFFFFFFFF2A3D

Sample 3: 000000000001 /

0100100A00000000000000000000FFFFFFFFFF89F7

Sample 4: 000000000002 /

0100100A00000000000000000000FFFFFFFFFF0864

Sample 5: 000000000003 /

0100100A00000000000000000000FFFFFFFFFFABAE

Sample 6: 000000000004 /

0100100A00000000000000000000FFFFFFFFFF9396

Sample 7: 000000000005 /

0100100A00000000000000000000FFFFFFFFFF305C

Sample 8: 000000000006 /

0100100A00000000000000000000FFFFFFFFFFB1CF


Solution

  • Key properties and attacks:

    1. If you have two messages + CRCs of the same length and exclusive-or them together, the result is one message and one pure CRC on that message, where "pure" means a CRC definition with a zero initial value and zero final exclusive-or. This helps take those two parameters out of the equation, which can be solved for later. You do not need to know where the CRC is, how long it is, or which bits of the message are participating in the CRC calculation. This linear property holds.

    2. Working with purified examples from #1, if you take any two equal-length messages + pure CRCs and exclusive-or them together, you will get another valid message + pure CRC. Using this fact, you can use Gauss-Jordan elimination (over GF(2)) to see how each bit in the message affects other generated bits in the message. With this you can find out a) which bits in the message are particiapting, b) which bits are likely to be the CRC (though it is possible that other bits could be a different function of the input, which can be resolved by the next point), and c) verify that the check bits are each indeed a linear function of the input bits over GF(2). This can also tell you that you don't have a CRC to begin with, if there don't appear to be any bits that are linear functions of the input bits. If it is a CRC, this can give you a good indication of the length, assuming you have a contiguous set of linearly dependent bits.

    3. Assuming that you are dealing with a CRC, you can now take the input bits and the output bits for several examples and try to solve for the polynomial given different assumptions for the ordering of the input bits (reflected or not, perhaps by bytes or other units) and the direction of the CRC shifting (reflected or not). Since you're talking about an allegedly 16-bit CRC, this can be done most easily by brute force, trying all 32,768 possible polynomials for each set of bit ordering assumptions. You can even use brute force on a 32-bit CRC. For larger CRCs, e.g. 64 bits, you would need to use Berlekamp's algorithm for factoring polynomials over finite fields in order to solve the problem before the heat death of the universe. Then you would factor each message + pure CRC as a polynomial, and look for common factors over multiple examples.

    4. Now that you have the message bits, CRC bits, bit orderings, and the polynomial, you can go back to your original non-pure messages + CRCs, and solve for the initial value and final exclusive-or. All you need are two examples with different lengths. Then it's a simple two-equations-in-two-unknowns over GF(2).

    Enjoy!