Messing around a bit with C pointers, I came across a rather strange behavior.
Consider the following code:
#include <stdio.h>

int
main ()
{
  char charac = 'r';
  long long ptr = (long long) &charac; // Stores the address of charac into a long long variable
  printf ("[ptr] points to %p containing the char %c\n", ptr, *(char*)ptr);
}
Now when compiled for a 64-bit target architecture (compilation command: gcc -Wall -Wextra -std=c11 -pedantic test.c -o test), everything is fine and the execution gives
> ./test
[ptr] points to 0x7fff3090ee47 containing the char r
But if the compilation targets a 32-bit architecture (compilation command: gcc -Wall -Wextra -std=c11 -pedantic -ggdb -m32 test.c -o test), the execution gives this weird result:
> ./test
[ptr] points to 0xff82d4f7 containing the char �
The weirdest part is that if I change the printf call in the previous code to printf ("[ptr] contains the char %c\n", *(char*)ptr);, the execution gives a correct result:
> ./test
[ptr] contains the char r
The issue seems to arise only on 32-bit architectures, and I can't figure out why changing the printf call causes the execution to behave differently.
PS: It may be worth mentioning that the underlying machine is an x86 64-bit architecture, running the program in the 32-bit compatibility mode triggered by the -m32 option in gcc.
You are basically lying to your compiler.
You tell printf that you pass a pointer as the first parameter after the format string, but instead you pass an integer variable.
While this is always undefined behaviour, it may appear to work as long as the expected type and the passed type have the same size. That's the "undefined" in "undefined behaviour": it is not guaranteed to crash or to show bad results immediately. It may just pretend to work while waiting to hit you from behind.
If your long long has 64 bits while a pointer only has 32 bits, the argument layout on your stack no longer matches what printf expects, causing it to read from the wrong location.
Depending on your architecture and tools, you have good chances that your stack looks like this when you call a function with variadic parameter list:
+---------------+---------------+---------------+
| last fixed par| Par 1 type1   | Par 2 type2   |
| x bytes       | x bytes       | x bytes       |
+---------------+---------------+---------------+
The variadic parameters are pushed onto the stack first, and the last named parameter from the signature is pushed last. (Other named parameters are ignored here.)
Then the function can walk through the parameter list using va_arg and friends. For this purpose the function must know which types of parameters are passed. The printf function uses the format specifier to decide which parameter to consume from the stack.
Now it comes to the point where everything depends on you telling the truth.
What you tell printf through the format string:
+---------------+---------------+---------------+
| format char*  | Par 1 void*   | Par 2 int     |
| 4 bytes       | 4 bytes       | 4 bytes       |
+---------------+---------------+---------------+
For the first parameter (%p) printf takes 4 bytes, which is the size of a void*. Then it takes another 4 bytes (the size of an int) for parameter 2 (%c).
(Note: The last parameter is printed as a character, i.e. only 1 byte will be used in the end. Because of the default argument promotions that apply to variadic arguments, which have no declared parameter type, the char is stored as an int on the stack. Hence printf must also consume the bytes of an int in this case.)
Now let's look at your function call (what you really put into printf):
+---------------+-------------------------------+---------------+
| format char*  | Par 1 long long               | Par 2 int     |
| 4 bytes       | 8 bytes                       | 4 bytes       |
+---------------+-------------------------------+---------------+
You still claim to provide a pointer and an integer parameter of 4 bytes each.
But now the first parameter comes with an extra 4 bytes of length, which remain unknown to the printf function.
As you have told it, the function reads 4 bytes for the pointer. This may be in line with the first 4 bytes of the long long, but the remaining 4 bytes are not consumed.
Now the next 4 bytes, which are used for the %c format, are read, but we are still reading the second half of your long long. Whatever this may be, it is not what you want.
Finally, the pushed integer is still untouched when the function returns.
That's the reason why you should not mess with weird type casting and wrong types.
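And that's also the reason why you should look at your warnings when compiling. GCC's -Wall enables -Wformat, which reports this kind of mismatch between a format specifier and its argument.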