My understanding is that RIP-relative addressing should work for offsets of up to ±2 GB, but for some reason GCC (14.2) and Clang (19.1.0) stop using it when grabbing values 16 MB or more away from the symbol.
Given this code:
const int size = 1 << 22;

int small_array[size];
// 6 byte mov
int load_small_arry() {
    return small_array[sizeof(small_array) / sizeof(int) - 1];
}

int big_array[size + 1];
// 5 byte mov + 6 byte mov in clang
// 9 byte mov in gcc
int load_big_arry() {
    return big_array[sizeof(big_array) / sizeof(int) - 1];
}
I get this assembly from GCC (see the Clang results in the Godbolt link; they differ, but Clang likewise switches away from RIP-relative):
load_small_arry():
        mov     eax,DWORD PTR [rip+0x0]        # 6 <load_small_arry()+0x6>
                        R_X86_64_PC32 small_array+0xfffff8
        ret
        nop     WORD PTR [rax+rax*1+0x0]
load_big_arry():
        movabs  eax,ds:0x0
                        R_X86_64_64 big_array+0x1000000
        ret
This is a larger encoding, so I'm not sure why it would be preferred.
The relevant code in GCC is here. It seems it's not really specific to RIP-relative addressing. The more general rule is that GCC assumes a value of the form static_label + constant_offset is encodable as a signed 32-bit immediate only when constant_offset < 16 MB. There's a comment:
For CM_SMALL assume that latest object is 16MB before end of 31bits boundary.
It looks like the idea is that they want to support the use of pointers like static_label + constant_offset even when the result exceeds the 2 GB limit. In the small code model, static_label is known to be within that limit, and they assume further that it's at least 16 MB from the end: every static symbol is taken to start no higher than 2^31 − 2^24 = 0x7F000000, so static_label + constant_offset stays below 0x80000000, and thus fits in a signed 32-bit immediate, for any constant_offset under 16 MB. But if constant_offset is 16 MB or larger, they no longer trust that the result will fit, and they fall back to code that doesn't need it to.
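That threshold is easy to demonstrate. Here's a minimal sketch of my own (hypothetical names; the sizes mirror the question's arrays and straddle the 16 MB boundary):

/* sizes chosen so the last element sits just below / exactly at 16 MB */
static int under_array[1 << 22];        /* last element at byte offset 0xFFFFFC */
static int over_array[(1 << 22) + 1];   /* last element at byte offset 0x1000000 */

int load_under(void) { return under_array[(1 << 22) - 1]; } /* offset < 16 MB: RIP-relative */
int load_over(void)  { return over_array[1 << 22]; }        /* offset == 16 MB: wider form */

With the compilers above, load_under should keep the 6-byte RIP-relative mov while load_over gets the wider encoding, exactly as in the question's output.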
I was originally thinking that this situation couldn't arise in well-defined ISO C or C++ code, because you're only allowed to do pointer arithmetic within a single array, and if the array is static, then all of it fits within 2 GB. So I thought maybe this code provided some sort of extension for compatibility, or for other language front-ends.
But actually, it can arise even in well-defined C, because it is fine to compile code that would access an array out of bounds, as long as that code is never actually executed. And the compiler may not be able to tell at compile time which is which.
Consider a program like:
file1.c
#include <stdbool.h>

#ifdef BE_HUGE
char arr[2000000000];
const bool is_huge = true;
#else
char other_stuff[2000000000];
char arr[3];
const bool is_huge = false;
#endif
file2.c
#include <stdbool.h>

extern char arr[];
extern const bool is_huge;

char foo(void) {
    return is_huge ? arr[1999999999] : -1;
}
There's nothing illegal about this code. But the compiler can't safely emit mov al, [rip+arr+1999999999] in foo(). That would be fine in the BE_HUGE case, because then arr+1999999999 doesn't exceed the 2 GB limit. But in the !BE_HUGE case, it might. The instruction would never actually be executed in that case, but it still has to link successfully.

When compiling file2.c, the compiler doesn't know which case we are in, so it must generate code that runs correctly in one case and still links in the other, and that rules out the narrower addressing mode.
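What the compiler does instead is, in effect, keep the symbol and the large offset apart. A hand-lowered C sketch of the idea (my own illustration, not GCC's literal output; the movabs form in the question's assembly achieves the same separation by carrying the full address in a 64-bit R_X86_64_64 relocation):

#include <stdbool.h>

extern char arr[];
extern const bool is_huge;

char foo_lowered(void) {
    if (!is_huge)
        return -1;
    /* &arr[0] carries no constant offset, so it always fits the narrow
     * addressing modes.  The 1999999999-byte displacement is then applied
     * through a 64-bit register at run time, so no relocation ever has to
     * squeeze label + offset into a signed 32-bit field. */
    const char *base = arr;
    return base[1999999999];
}

Either way, the large constant never lands in a 32-bit displacement against the symbol, which is exactly what the 16 MB rule guards against.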