Tags: c, x86, printf, 32-bit, long-long

Using a long long integer to store a 32-bit pointer makes printf misbehave


While experimenting with C pointers, I came across some rather strange behavior.
Consider the following code:

#include <stdio.h>

int
main ()
{
   char charac = 'r';

   long long ptr = (long long) &charac;  // Stores the address of charac in a long long variable

   printf ("[ptr] points to %p containing the char %c\n", ptr, *(char*)ptr);

}

On 64-bit architectures

When compiled for a 64-bit target (compilation command: gcc -Wall -Wextra -std=c11 -pedantic test.c -o test), everything is fine and the program prints:

> ./test 
[ptr] points to 0x7fff3090ee47 containing the char r

On 32-bit architectures

But if the compilation targets a 32-bit architecture (compilation command: gcc -Wall -Wextra -std=c11 -pedantic -ggdb -m32 test.c -o test), the program prints this odd result:

> ./test     
[ptr] points to 0xff82d4f7 containing the char �

The strangest part is that if I change the printf call in the previous code to printf ("[ptr] contains the char %c\n", *(char*)ptr);, the output is correct:

> ./test     
[ptr] contains the char r

The issue seems to arise only on 32-bit targets, and I can't figure out why changing the printf call makes the program behave differently.

PS: It may be worth mentioning that the underlying machine is an x86 64-bit machine, and that the 32-bit build is produced with gcc's -m32 option and runs in 32-bit compatibility mode.


Solution

  • You are basically lying to printf.

    The format string tells printf that the first argument after it is a pointer. But you pass an integer variable instead.

    While this is always undefined behaviour, it may appear to work as long as the expected type and the passed type have the same size. That's the "undefined" in "undefined behaviour": it is not guaranteed to crash or to show bad results immediately. It may just pretend to work while waiting to hit you from behind.

    If your long long has 64 bits while a pointer only has 32 bits, the layout of your stack is broken causing printf to read from wrong location.

    Depending on your architecture and tools, there is a good chance that your stack looks like this when you call a function with a variadic parameter list:

    +---------------+---------------+---------------+
    | last fixed par| Par 1   type1 | Par 2   type2 |
    |    x bytes    |    x bytes    |    x bytes    | 
    +---------------+---------------+---------------+
    

    The variadic arguments are pushed onto the stack first, and the last fixed parameter from the signature is pushed after them. (Any other fixed parameters are ignored here.)

    Then the function can walk through the parameter list using va_arg and friends. For this purpose the function must know which types of parameters are passed. The printf function uses the format specifier to decide which parameter to consume from the stack.

    Now it comes to the point where everything depends on you telling the truth.

    What you tell printf through the format string:

    +---------------+---------------+---------------+
    | format  char* | Par 1   void* | Par 2     int |
    |    4 bytes    |    4 bytes    |    4 bytes    | 
    +---------------+---------------+---------------+
    

    For the first parameter (%p), printf takes 4 bytes, which is the size of a void*. Then it takes another 4 bytes (the size of an int) for parameter 2 (%c).

    (Note: The last parameter is printed as a character, i.e. only 1 byte is used in the end. Because of the default argument promotions, which apply to every argument passed through the variadic part of a parameter list, the char is stored as an int on the stack. Hence printf must consume the bytes of an int in this case.)

    Now let's look at your function call (What you really put into printf):

    +---------------+-------------------------------+---------------+
    | format  char* |   Par 1           long long   | Par 2     int |
    |    4 bytes    |            8 bytes            |    4 bytes    | 
    +---------------+-------------------------------+---------------+
    

    You still claim to provide a pointer and an integer parameter of 4 bytes each. But now the first parameter comes with an extra 4 bytes that the printf function knows nothing about. As you told it to, the function reads 4 bytes for the pointer. On little-endian x86 these are the low-order half of the long long, which is why the printed address still looks right. The remaining 4 bytes are not consumed by %p, so the next 4 bytes, which are read for the %c format, are the second half of your long long. Whatever that may be, it is not what you want. Finally, the int you actually pushed for %c is never read at all.

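    That's the reason why you should not mess with weird type casts and mismatched types.

    And that's also the reason why you should read the warnings you get when compiling. With -Wall, gcc enables -Wformat, which reports that %p expects a void * but the argument is a long long.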