I wanted to look at the byte-representation of different objects in memory and found this function to do it:
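#include <cstddef>
#include <iostream>

// Prints every byte of `input` in hexadecimal, in memory order.
template <typename T>
void print_bytes(const T& input, std::ostream& os = std::cout)
{
    // View the object as a sequence of bytes; the address itself is unchanged.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&input);
    os << std::hex << std::showbase;
    os << "[";
    for (std::size_t i = 0; i < sizeof(T); ++i)
        os << static_cast<int>(*(p++)) << " ";
    os << "]" << std::endl;
}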
I found it in this Stack Overflow post. What actually happens when input is cast in const unsigned char* p = reinterpret_cast<const unsigned char*>(&input); ?
Initially I thought casting like this would somehow alter the memory address, since different types have different sizes and therefore need a different number of bytes to be represented. So I wrote the following:
int x = 4328;
auto* p0 = &x;
auto* p1 = reinterpret_cast<const unsigned char*>(p0);
printf("%p\n", static_cast<const void*>(p0)); // %p expects a pointer to void
printf("%p\n", static_cast<const void*>(p1));
Output:
0x7ffdd1bff604
0x7ffdd1bff604
So they are not altered. So what's really happening here?
Ignoring byte order ("endianness") for a moment, let's assume you have a 4-byte integer with value 0x12345678 stored in memory like this:
address 2200 -> 0x12
address 2201 -> 0x34
address 2202 -> 0x56
address 2203 -> 0x78
int32_t test=0x12345678; // assume it's stored like above
int32_t* p=&test; // "p" now has the value "2200"
assert(*p==0x12345678); // read value through pointer
Now, if you cast the pointer to unsigned char*, it will still have the same value:
unsigned char* q=reinterpret_cast<unsigned char*>(p); // "q" also has "2200"
assert((void*)p==(void*)q);
It's the same pointer; you're just telling the compiler that it points to something different. This is why reinterpret_cast should be used sparingly: it bypasses the type system, and reading an object's bytes through unsigned char*, as done here, is one of the few uses that is well defined.
However, since an unsigned char is just one byte, dereferencing q gives a different value:
assert(*q==0x12); // just one byte at the pointer location
assert(q[0]==0x12);
assert(q[1]==0x34); // we know it's 4 bytes, so we can safely read the others
assert(q[2]==0x56);
assert(q[3]==0x78);
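The assertions above only hold for the byte layout assumed in this example. As a rough sketch that makes no assumption about byte order, the bytes can be copied out in memory order and inspected afterwards. Note that object_bytes is a hypothetical helper name chosen for illustration; it does not come from the original post.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): returns a copy of the
// bytes that make up `value`, in the order they are stored in memory.
template <typename T>
std::vector<unsigned char> object_bytes(const T& value)
{
    // Same address as &value; only the pointed-to type changes.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    // Copy exactly sizeof(T) bytes, starting at the lowest address.
    return std::vector<unsigned char>(p, p + sizeof(T));
}
```

Calling object_bytes(test) on the integer from the example returns four bytes. Whether the first one is 0x12 or 0x78 depends on the machine the code runs on.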
As a side note, while the above is merely meant to be a basic overview, most computer systems that you'll encounter will store that integer differently:
address 2200 -> 0x78
address 2201 -> 0x56
address 2202 -> 0x34
address 2203 -> 0x12
This layout is called "little endian" (least significant byte first), and it is what most systems use these days. Because it looks backwards when written out, I've chosen the "big endian" format for my explanation.