c++ casting memory-address

What happens when a memory address is cast like this?


I wanted to look at the byte-representation of different objects in memory and found this function to do it:

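#include <cstddef>
#include <iostream>

// Prints the object representation of `input`, one byte at a time, in hex.
template <typename T>
void print_bytes(const T& input, std::ostream& os = std::cout)
{
  // View the object as a sequence of raw bytes.
  const unsigned char* p = reinterpret_cast<const unsigned char*>(&input);
  os << std::hex << std::showbase;
  os << "[";
  for (std::size_t i = 0; i < sizeof(T); ++i)
    os << static_cast<int>(*(p++)) << " ";  // cast so the byte prints as a number, not a character
  os << "]" << std::endl;
}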

I found it in this Stack Overflow post.

  1. What I am curious about is why the address of input is cast in this line: const unsigned char* p = reinterpret_cast<const unsigned char*>(&input);

Initially I thought a cast like this would somehow alter the memory address, since different types have different sizes and therefore need a different number of bytes to be represented. So I wrote the following:

int x = 4328;
auto* p0 = &x;
auto* p1 = reinterpret_cast<const unsigned char*>(p0);
printf("%p\n", static_cast<const void*>(p0));  // %p expects a void pointer
printf("%p\n", static_cast<const void*>(p1));

Output:

0x7ffdd1bff604
0x7ffdd1bff604

So the address is not altered. What is really happening here?


Solution

  • Ignoring byte order ("endianness") for now, let's assume you have a 4-byte integer with the value 0x12345678 stored in memory like this:

    address 2200 -> 0x12
    address 2201 -> 0x34
    address 2202 -> 0x56
    address 2203 -> 0x78
    
    int32_t test=0x12345678;     // assume it's stored like above
    int32_t* p=&test;            // "p" now has the value "2200"
    assert(*p==0x12345678);      // read value through pointer
    

    Now, if you cast the pointer to unsigned char*, it will still hold the same address:

    unsigned char* q=reinterpret_cast<unsigned char*>(p);       // "q" also has "2200"
    assert((void*)p==(void*)q);
    

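    It's the same address; you're just telling the compiler that it points to something different. This is why reinterpret_cast should be used sparingly: the compiler no longer checks that the memory really holds the type you claim, and reading an object through a pointer to an unrelated type is usually undefined behavior. Inspecting an object's bytes through a char, unsigned char or std::byte pointer is one of the few cases the language explicitly allows.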
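
    To make the "same address, different pointee type" point concrete, here is a minimal sketch. The helper stride_in_bytes is a hypothetical name introduced only for this illustration; it measures how far a pointer moves when you add 1 to it. It assumes 8-bit bytes, so that std::int32_t occupies 4 bytes.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    // Hypothetical helper (not from the original post): how many bytes does
    // a pointer of type const T* advance when it is incremented by one?
    template <typename T>
    std::ptrdiff_t stride_in_bytes(const T* p)
    {
        const auto* first = reinterpret_cast<const unsigned char*>(p);
        const auto* next  = reinterpret_cast<const unsigned char*>(p + 1);
        return next - first;
    }

    int main()
    {
        std::int32_t value = 0x12345678;
        const std::int32_t* p = &value;
        const auto* q = reinterpret_cast<const unsigned char*>(p);

        // The cast does not change the address itself...
        assert(static_cast<const void*>(p) == static_cast<const void*>(q));

        // ...only how many bytes the compiler reads and steps over.
        std::cout << stride_in_bytes(p) << "\n";  // prints 4 (with 8-bit bytes)
        std::cout << stride_in_bytes(q) << "\n";  // prints 1
    }
    ```

    So the different sizes of the types show up in what *p reads and in how far p + 1 moves, not in the address stored in the pointer.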

    However, since an unsigned char is just one byte, dereferencing q reads only the first byte instead of the whole integer:

    assert(*q==0x12);       // just one byte at the pointer location
    assert(q[0]==0x12);
    assert(q[1]==0x34);     // we know it's 4 bytes, so we can safely read the others
    assert(q[2]==0x56);
    assert(q[3]==0x78);
    

    As a side note: the above is only meant as a basic overview. Most systems you'll encounter (x86-64, for example) store that integer with the least significant byte first:

    address 2200 -> 0x78
    address 2201 -> 0x56
    address 2202 -> 0x34
    address 2203 -> 0x12
    

    That layout looks backwards when written out, so I've used the "big endian" format for my explanation, even though most current hardware is "little endian".
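
    If you want to see which layout your own machine uses, the same unsigned char* technique works. The following is a rough sketch, not a definitive implementation: is_little_endian and object_bytes are hypothetical helper names made up for this example, and the check assumes the platform is either big or little endian (no mixed byte orders).

    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Hypothetical helper: true if the least significant byte is stored first.
    // Assumes the platform is either little or big endian.
    bool is_little_endian()
    {
        const std::uint32_t probe = 0x12345678;
        const auto* bytes = reinterpret_cast<const unsigned char*>(&probe);
        return bytes[0] == 0x78;
    }

    // Hypothetical helper: copy an object's bytes, in memory order, into a vector.
    template <typename T>
    std::vector<unsigned> object_bytes(const T& obj)
    {
        const auto* p = reinterpret_cast<const unsigned char*>(&obj);
        return std::vector<unsigned>(p, p + sizeof(T));
    }

    int main()
    {
        const std::uint32_t value = 0x12345678;
        std::cout << (is_little_endian() ? "little" : "big") << " endian:";
        for (unsigned b : object_bytes(value))
            std::cout << ' ' << std::hex << std::showbase << b;
        std::cout << '\n';
    }
    ```

    On a little-endian machine this should print "little endian: 0x78 0x56 0x34 0x12". If you can use C++20, std::endian from the <bit> header answers the same question at compile time.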