Tags: c, pointers, char, integer-overflow, unsigned-char

char * vs unsigned char *


So I was playing around with char* and unsigned char* pointers, and I came across this issue:

Code:

#include <stdio.h>

void func(unsigned int max) {
  unsigned int* intptr = &max;
  
  printf("%u - %p\n", max, &max);
  printf("%u - %p\n\n", *intptr, intptr);
  
  printf("%u - %p\n", *((char*)intptr), ((char*)intptr));
  printf("%u - %p\n", *((signed char*)intptr), ((signed char*)intptr));
  printf("%u - %p\n", *((unsigned char*)intptr), ((unsigned char*)intptr));
}

int main(void)
{
  unsigned int max1 = 0b00000000000000000000000001111111;
  unsigned int max2 = 0b00000000000000000000000011111111;
  
  func(max1);
  printf("\n\n\n\n");
  func(max2);

  return 0;
}

Output:

127 - 0x7ffc9712c53c
127 - 0x7ffc9712c53c

127 - 0x7ffc9712c53c
127 - 0x7ffc9712c53c
127 - 0x7ffc9712c53c




255 - 0x7ffc9712c53c
255 - 0x7ffc9712c53c

4294967295 - 0x7ffc9712c53c
4294967295 - 0x7ffc9712c53c
255 - 0x7ffc9712c53c

So here little endian is being used, and clearly char is overflowing at 128. However, why does it overflow and print out UINT_MAX instead of -128?


Solution

  • There is no overflow here, at least as the C language defines it.

    When you read raw data through a char* or signed char* pointer and dereference it with the * operator, that raw data is interpreted using the representation of the character type used. C has a special rule allowing any object to be inspected byte by byte through character types like this. However, it doesn't really specify the outcome.

    The rules for the * operator say that if the result of the indirection is invalid for the specified type, the behavior is undefined. But since a single byte of an int etc. has no effective type of its own, we can't really say that the read byte has an effective type. The only sensible thing the compiler can do then is to interpret it as the type used for the access.

    So you end up with the byte 0xFF interpreted in 8-bit two's complement signed form, that is, -1.

    Since printf is a variadic function, it implicitly promotes all arguments according to the default argument promotion rules, which turn small integer types like signed char into int. Because this promotion preserves the value, the negative sign is kept, so-called sign extension. So rather than a signed char with value -1 (0xFF raw) you end up with an int with value -1 (0xFFFFFFFF raw).

    From there on you lie to printf with %u, telling it that the passed argument -1 of type int is actually an unsigned int. Strictly speaking that is undefined behavior, but in practice printf likely reinterprets the raw data 0xFFFFFFFF as the unsigned equivalent, 4294967295. The sketch below walks through the same chain step by step.
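
    To see the whole chain in one place, here is a minimal sketch (assuming a little-endian machine, two's complement signed char and 8-bit bytes, matching the question's output; the variable names are just illustrative) that reads the same 0xFF byte through unsigned char* and signed char*, shows what the default argument promotions do to each, and shows two ways to print the byte without lying to printf:

    #include <stdio.h>

    int main(void)
    {
      unsigned int max = 0xFFu; /* same value as max2 in the question: lowest byte is 0xFF */

      /* C allows any object to be inspected byte by byte through character types. */
      unsigned char *ucp = (unsigned char *)&max;
      signed char   *scp = (signed char *)&max;

      /* On a little-endian machine the first byte is the one holding 0xFF. */
      unsigned char ub = *ucp; /* 255 */
      signed char   sb = *scp; /* -1 on a two's complement machine */

      /* The default argument promotions turn both into int before printf sees them:
         the unsigned char is zero-extended, the signed char is sign-extended. */
      printf("promoted unsigned char: %d (raw 0x%X)\n", ub, (unsigned int)ub);
      printf("promoted signed char:   %d (raw 0x%X)\n", sb, (unsigned int)sb);

      /* To print the byte's value without the signed detour, either convert to
         unsigned char first or use the hh length modifier. */
      printf("byte as unsigned: %u\n", (unsigned int)*ucp);
      printf("byte via %%hhu:    %hhu\n", *scp);

      return 0;
    }

    On such a machine the unsigned path prints 255 with raw bits 0xFF, while the signed path prints -1 with raw bits 0xFFFFFFFF after sign extension, which is exactly the 4294967295 that the question's %u ends up displaying. This is also why unsigned char * is the usual choice for byte-level inspection.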