c pointers undefined-behavior pointer-conversion

Is it possible to have a 32-bit pointer on x86-64 without undefined behavior?


Generally pointers on x86-64 are defined to be 8 bytes. However, if you are certain that you have data that will only ever be in the first 4GB of the address space, then a 32-bit value is technically sufficient. Sometimes people will create a <4GB buffer and store 32-bit offsets into that buffer, but I am asking about directly storing a memory address in 32 bits, NOT storing an offset into some other buffer.

You could do something like:

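#include <stddef.h>
#include <stdint.h>

struct small_pointer {
    uint32_t p;
};

char load_small_pointer(struct small_pointer p, size_t index)
{
    /* Widen the stored 32-bit address to uintptr_t, then convert to a pointer. */
    return *((char *)(uintptr_t)p.p + index);
}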

However, this involves converting from a uint32_t to a (char*), which is probably undefined behavior. The standard allows round trips through uintptr_t, but I don't know of any special allowance for other types, even though when our conditions are met the conversion from uintptr_t to uint32_t should be lossless. Is there any standards-compliant way to do this? If not, do GCC/Clang provide any implementation-specific guarantees?

In case you're wondering why you would want this: the 'pointers' take up half as much data cache, and it's not unusual for operating systems to store special data in either the high or low end of the address range. Needing a different pointer type for some objects is already natural in these situations. Expressing it as an offset into some buffer allocated at runtime would involve an extra load (loading the buffer's address from the pointer to the buffer, then the actual load from the offset location).


Solution

  • Is there any standards compliant way to do this?

    No, the C standard makes conversions from pointers to integers implementation-defined, per C 2024 6.3.3.3, so there is no way to guarantee based on the standard alone that pointers into the low 4 GiB of address space can be stored using 32-bit integers.1, 2

    If not, do GCC/Clang provide any implementation specific guarantees?

    For GCC, there is just barely a guarantee, and only when it is combined with additional platform documentation. For Clang, documentation seems lacking.

    The GCC 14.2 Manual documents in clause 4.7 that “A cast from pointer to integer discards most-significant bits if the pointer representation is larger than the integer type, sign-extends if the pointer representation is smaller than the integer type, otherwise the bits are unchanged” and “A cast from integer to pointer discards most-significant bits if the pointer representation is smaller than the integer type, extends according to the signedness of the integer type if the pointer representation is larger than the integer type, otherwise the bits are unchanged.”

    Thus, if a pointer representation contains only zero bits above the low 32 bits, you can convert it directly to a uint32_t in GCC without losing any information, and you can convert it back to the pointer type to restore the original pointer value.

    We still need to know how pointers are represented. They are not necessarily plain hardware addresses. Pointer representation is nominally covered in the GCC manual by 4.15, which says the bytes encoding an object, other than as specified by the C standard, are “Determined by ABI.”

    Thus, if you are using a platform that uses plain hardware addresses (per its ABI) and are using GCC 14.2 (or any other version documented as described above), then pointers to locations in the low 4 GiB of the address space can be converted to unsigned 32-bit integers, stored as such, and converted back to the original pointer type to restore the original value.
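A minimal sketch of that round trip, assuming GCC's documented conversion rules and an ABI in which pointers are plain addresses. The names small_pointer_fits, small_pointer_pack, and small_pointer_unpack are hypothetical helpers made up for illustration, not part of any library. Nothing here is guaranteed by the C standard alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper type: an address stored in 32 bits. */
struct small_pointer {
    uint32_t bits;
};

/* True if the address has no bits set above the low 32.
   Assumes the integer value of a pointer is its plain address. */
bool small_pointer_fits(const void *p)
{
    return (uintptr_t)p <= UINT32_MAX;
}

/* Convert through uintptr_t first so the pointer-to-integer
   step itself never narrows; only the integer-to-integer step does. */
struct small_pointer small_pointer_pack(const void *p)
{
    assert(small_pointer_fits(p));
    return (struct small_pointer){ (uint32_t)(uintptr_t)p };
}

/* uint32_t to uintptr_t zero-extends; the final integer-to-pointer
   cast is implementation-defined and relies on the compiler's rules. */
void *small_pointer_unpack(struct small_pointer sp)
{
    return (void *)(uintptr_t)sp.bits;
}

/* The load from the question, written with the helpers above. */
char load_small_pointer(struct small_pointer sp, size_t index)
{
    return ((const char *)small_pointer_unpack(sp))[index];
}
```

The assert in small_pointer_pack catches, in debug builds, any pointer that does not actually lie in the low 4 GiB; how to obtain memory there is platform-specific and outside this sketch.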

    Note that you should use an unsigned type. The conversion of an address in the second 2 GiB would set the sign bit of a signed 32-bit integer, and then the conversion back to the pointer type would sign-extend that, producing an address different from the original.
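The sign-extension hazard can be illustrated with integers alone. This sketch models a 64-bit address as a uint64_t, so no real pointers are involved; the function names are hypothetical. The narrowing conversion to int32_t is itself implementation-defined, and the sketch assumes the wrap-around behavior GCC documents.

```c
#include <stdint.h>

/* Store the low 32 bits in a signed integer, then widen again.
   Widening a negative int32_t replicates the sign bit. */
uint64_t restore_via_signed(uint64_t address)
{
    int32_t stored = (int32_t)(uint32_t)address; /* wraps on GCC */
    return (uint64_t)stored;
}

/* Store the low 32 bits in an unsigned integer, then widen again.
   Widening a uint32_t fills the high bits with zeros. */
uint64_t restore_via_unsigned(uint64_t address)
{
    uint32_t stored = (uint32_t)address;
    return (uint64_t)stored;
}
```

Under those assumptions, the address 0x80000000 comes back unchanged from restore_via_unsigned but comes back as 0xFFFFFFFF80000000 from restore_via_signed, while addresses below 0x80000000 survive both.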

    I do not see that the Clang documentation defines conversions between pointers and integers, although it could be buried in the documentation somewhere.

    Footnotes

    1 Actually, the question is improperly phrased. Any source code that is accepted by at least one C implementation is conforming to the C standard, even if its behavior varies between C implementations or is not defined at all by the C standard. The desired question is whether there is strictly conforming code that is guaranteed to store the desired pointers using 32-bit integers, that is, code that works in all C implementations, not just one.

    2 Some conversions of pointers to integers are undefined rather than implementation-defined, namely when the result of the conversion cannot be represented in the destination integer type. However, this is easily avoided by converting to uintptr_t first and then to uint32_t, provided these optional types are available.