Post-inversion of CRC32 result and trailing zeroes

For some very specific values, such as

FF FF FF FF 80 20 83 B8 ED

the CRC32 (using polynomial 0x04C11DB7 and pre and post-inversion) is 0xFFFFFFFF (crccalc.com).

Adding any number of trailing zeroes does not change the result (since that just multiplies the message polynomial).

My doubt is that, according to Wikipedia, post inversion was supposed to prevent just that:

A similar solution can be applied at the end of the message, inverting the CRC register before it is appended to the message. Again, any non-zero change will do; inverting all the bits (XORing with an all-ones pattern) is simply the most common.

That doesn't seem to be the case. Also, this answer by Mark Adler suggests that the post-inversion is just so the CRC of an empty message is 0x00000000.

Is the Wikipedia article incorrect or did I misunderstand something?

Solution

For any n-bit CRC and any current state of the CRC, there will exist a sequence of n bits in the message that will bring the internal CRC register to all zero bits. And many sequences of more than n bits that will do the same. From there on, any application of zero bits will leave the register all zeros.

That n-bit sequence is easily found. It is the internal CRC register bits themselves at that point. For example, the standard CRC-32 that you reference, when applied to the nine-byte message "123456789" is 0xcbf43926. Since the final exclusive-or is 0xffffffff, the internal CRC register at the end is the complement of that, 0x340bc6d9. This is a reflected CRC, so you need to feed that value starting from its least significant bit. Then you find that the CRC-32 of "123456789\xd9\xc6\x0b\x34" is 0xffffffff. Now I can follow that message with any number of zeros, and still get 0xffffffff. E.g. "123456789\xd9\xc6\x0b\x34\x00\x00\x00".

However that is the only such sequence of four bytes that will do that. In general, the probability of bringing the internal CRC register to all zeros given any appended sequence of n or more random bits will be 2^-n. So unless you're being deliberate about it, this insensitivity to a subsequent sequence of zero bits will happen very infrequently.

Initializing the internal CRC register to a non-zero value, as many CRC definitions do, avoids this behavior at the very start of the process. It may not be unusual for the start of a message to be a sequence of zeros, so you would like for the CRC to be sensitive to the length of that sequence.

The final exclusive-or doesn't change the behavior. All it does is change the final CRC value that you would be stuck at once you arrive at the state of the internal CRC register being zero.

As you noted, the final exclusive-or is often set equal to the initial CRC register value, so that you get what some might consider to be a "nice" behavior that the CRC of an empty message is zero.