The following code
    #include <stdio.h>
    int main()
    {
        long long data = 0xFFFEABCD11112345;
        char *pData = (char *)&data;
        printf("Value at address %p is %x\n", pData, *pData);
        pData = pData + 5;
        printf("Value at address %p is %x\n", pData, *pData);
        return 0;
    }
produces an output similar to
Value at address 00000023515FFC00 is 45
Value at address 00000023515FFC05 is ffffffab
Given that pData is a char *, I was expecting the second value to be ab instead of ffffffab. I believe that the %x format specifier might be the culprit, but I do not fully understand it. Where do the leading f's come from?
char may be either signed or unsigned depending on the compiler (see "Is char signed or unsigned by default?"). In this case it appears to be signed.
On mainstream computers, signed char can only hold values from -128 to 127. On such a computer, the bit pattern 0xAB is the 2's complement representation of the negative decimal number -85.
C has various forms of implicit type promotion that happen when small types like char are used in most expressions or, as in this case, when they are passed to a variadic function such as printf. The special set of implicit promotion rules for variadic functions is called "the default argument promotions", and it states that small integer types such as char and short get promoted to int, whether the original type is signed or unsigned.
If we have a signed char with the value -85, the promotion to int preserves the value, sign included; at the bit level this is known as sign extension. The value is still -85, but the 2's complement representation of the promoted int is 0xFFFFFFAB (assuming a 32-bit int).
However, if we had an unsigned char with the value 0xAB (171), then the promotion to int simply keeps the value 171 and no sign extension occurs. So we could have avoided sign extension by casting: (unsigned char)*pData. The explicit conversion from signed to unsigned is well-defined.
The format string of printf holds no relevance for this promotion. %x expects an argument of type unsigned int, so we essentially lie to printf since we pass a char promoted to int, which is strictly speaking undefined behavior. However, printf in this case just reads the binary representation of the int and prints it in hex as ffffffab.
Takeaway: using char (or signed types in general) when dealing with raw binary data or in hardware-related programming is a bad idea. Use unsigned char or uint8_t instead.