I was helping a student with his assignment and ran into a very gnarly "bug" which I could not explain, and was wondering if I could get help in understanding what exactly is going on.
The question was to implement the following:
given a byte array (
buf
) of length N, calculate the 16-bit checksum given by the formula:checksum = ~(buf[0]) + ~(buf[1]) + ... + ~(buf[N-2]) + ~(buf[N-1])
The implementation the student made was pretty straightforward:
uint16_t calculate_checksum(uint8_t *msg_buf, size_t msg_length)
{
uint16_t checksum = 0;
for (size_t i = 0; i < msg_length; i++)
{
checksum += ~(msg_buf[i]);
}
return checksum;
}
However, to my surprise, this implementation did not give the expected result.
I've tried looking at the checksum
variable by printing its value in the loop, which yielded an unexpected pattern:
65471
65459
65449
65287
65276
65253
65166
65113
65094
...
The checksum starts at 0
, but after the first addition the value sits somewhere in the upper range of the uint16_t
and going down. I suspected the value underflowing due to msg_buf[i]
being implicitly converted into a uint16_t
before complementing it. But I don't understand why. I would expect that first msg_buf
gets indexed with i
, then the byte complement is calculated (which would yield a value within 0-255), only then is it (implicitly) promoted into a uint16_t
.
I've tried to look at the assembly output of the statement checksum += ~(msg_buf[i])
, which seems to support this theory (using ARM gcc 14.1.0). Note that [r7, #4]
is the msg_buf
pointer, [r7, #8]
is i
and [r7, #14]
is checksum
. The assembly does something weird with subtraction, which yields the same result if you would convert the msg_buf[i]
byte to a uint16_t
before complementing.
ldr r2, [r7, #4]
ldr r3, [r7, #8]
add r3, r3, r2
ldrb r3, [r3] @ zero_extendqisi2
mov r2, r3
ldrh r3, [r7, #14] @ movhi
subs r3, r3, r2
uxth r3, r3
subs r3, r3, #1
strh r3, [r7, #14] @ movhi
Knowing this, the fix we came up with is pretty straightforward. We opted to just AND
the results of the right-hand side with 0xFF
to get rid of the higher bits, which yielded the correct checksum.
So, the issue is essentially fixed, but I still don't understand why this is an issue. Maybe this behaviour is expected and I don't know the proper operation orders, or maybe something else is happening. I really don't know.
Can someone please explain why this is happening?
What happened here is a result of integer promotions.
In most cases where a type smaller than int
is used in an expression, it is promoted to type int
. This is spelled out in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an
int
orunsigned int
may be used:
- An object or expression with an integer type (other than
int
orunsigned int
) whose integer conversion rank is less than or equal to the rank ofint
andunsigned int
.- A bit-field of type
_Bool
,int
,signed int
, orunsigned int
.If an
int
can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to anint
; otherwise, it is converted to anunsigned int
. These are called the integer promotions.58) All other types are unchanged by the integer promotions
And section 6.5.3.3p4 regarding unary arithmetic operators states the following regarding the ~
operator:
The result of the
~
operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
So on this statement:
checksum += ~(msg_buf[i]);
The value of msg_buf[i]
is promoted to type int
before the ~
operator is applied. Assuming a 32-bit int
, this int
value will have all zeros for the upper 3 bytes. So when the operator is applied, those 3 bytes will have all bits set to 1. Then when that value is added to checksum
which has type uint16_t
, the lower 16 bits are kept where the upper 8 of those bits are all set to 1.
For example, if the value of msg_buf[i]
was 0x33, it would first get promoted to the int
value 0x00000033. Then after applying the ~
operator the result would be 0xffffff77. This value gets added to the current value of checksum
as an int
, then the result is truncated to a uint16_t
before being assigned.
After applying the ~
operator, the result needs to first get reduced back to an 8 bit value with either a cast:
checksum += (uint8_t)(~msg_buf[i]);
Or a bitmask:
checksum += ~msg_buf[i] & 0xff;