c language-lawyer undefined-behavior pointer-arithmetic

Is array pointer arithmetic undefined behavior in C for separate translation units?

A program consists of several source files. File file_a.c defines an array at file scope and provides a function that returns a pointer to its first element:

static int buffer[10];
int *get_buf_addr(void) {
    return buffer;
}

Within file_a.c the array "buffer" is filled with data. To keep the abstraction levels of the program separate, another translation unit, file_b.c, calls get_buf_addr(), reads the data through the returned address, and sends it where it needs to go. Do I understand correctly that after the call:

int *buf = get_buf_addr();

I am formally not allowed to move "forward" or "backward" from that pointer, because the compiler does not know that these addresses belong to the same array? I turned to the standard, section 6.5.6 Additive operators:

"... If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined."

And in the same section:

"For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type."

That is, formally, at the point of the call int *buf = get_buf_addr(); the compiler does not know whether buf points to a single object of type int (not an array) or to an array of such objects (and, if to an array, how long that array is). I assume that a conforming compiler should treat such a pointer as a pointer to a single int object. So point 9 quoted above applies, and further arithmetic on such a pointer with subsequent access (e.g. UART->FIFO = buf[5];) is undefined behavior.

Is this true? If so, what is the formally correct way to access aggregates (arrays, structures) from separate translation units so that the program does not contain undefined behavior?
If these are char * pointers, does that change the situation?

Solution

C defines the behavior of pointer arithmetic in terms of the object that the pointer points to. It has nothing to do with the identifier that names the object.

In C, what people commonly call a variable is an identifier combined with an object. int c = 0; defines a variable with the name (identifier) c and memory for an int (an object). However, you can have objects with no identifiers (malloc provides memory in which you can create objects, using pointers), and you can have identifiers that do not refer to objects (a name might refer to a type or a function or something else).

The rules for pointer arithmetic, in C 2024 6.5.7, are entirely defined in terms of the pointed-to object (an array element and the array it is in). In this code:

int *p = malloc(10 * sizeof *p);
for (int i = 0; i < 10; ++i)
    p[i] = i*i;
int *q = p + 3;

the p + 3 is defined (provided malloc succeeded) because p points to an element in an array of 10 int, even though that array has no name: p points into the array, but there is no identifier for the array itself.

If one translation unit receives a pointer from another translation unit, all that matters for pointer arithmetic is whether the pointed-to object satisfies the requirements of pointer arithmetic. Whether names are known or even exist is irrelevant.
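As a rough illustration of that point, the sketch below passes two different pointers to the same helper. The names read_at, table, single, read_from_array and read_from_single are hypothetical and exist only for this example. Whether the arithmetic inside read_at is defined depends only on the object each pointer refers to, not on what read_at can see about it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads the int n elements after p.
   The behavior is defined only if p + n points to an element of the
   same array object that p points into. */
int read_at(const int *p, size_t n)
{
    assert(p != NULL);
    return p[n];                    /* equivalent to *(p + n) */
}

static int table[4] = { 10, 20, 30, 40 };
static int single = 7;

/* table + 1 points to an element of a 4-element array,
   so offsets 0..2 from it may be read. */
int read_from_array(void)
{
    return read_at(table + 1, 2);   /* reads table[3] */
}

/* A lone int behaves like an array of length one: only offset 0 may be
   read, and &single + 1 may be formed and compared but not dereferenced. */
int read_from_single(void)
{
    const int *end = &single + 1;   /* one past the end: valid to form */
    return read_at(&single, 0) + (int)(end - &single);
}
```

Calling read_at(&single, 1) would compile just as well, but it would read through a one-past-the-end pointer, which is undefined. The caller is responsible for staying inside the object; the function signature does not encode it.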

Further, the rules for pointer arithmetic say nothing about whether any information about the objects is present in the translation unit containing the arithmetic. If one translation unit creates an object, and another translation unit performs defined arithmetic on pointers related to that object, the C implementation must make the arithmetic work.
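For the situation in the question, a minimal sketch might look like the following. It is shown as one listing for brevity, with comments marking what would go into file_a.c and file_b.c. The functions get_buf_len, fill_buf, sum_buf and read_sixth are assumptions added for illustration and are not part of the original code. Exporting the length is one common way to let the consumer stay within bounds; the standard does not require it for the arithmetic itself to be defined.

```c
#include <assert.h>
#include <stddef.h>

/* ---- would live in file_a.c ---- */
static int buffer[10];

int *get_buf_addr(void)
{
    return buffer;                  /* pointer to buffer[0] */
}

/* Hypothetical accessor so the consumer knows how far it may go. */
size_t get_buf_len(void)
{
    return sizeof buffer / sizeof buffer[0];
}

/* Hypothetical producer: fills the array with squares. */
void fill_buf(void)
{
    for (size_t i = 0; i < get_buf_len(); ++i)
        buffer[i] = (int)(i * i);
}

/* ---- would live in file_b.c, which sees only the prototypes ---- */
long sum_buf(void)
{
    int *buf = get_buf_addr();
    size_t len = get_buf_len();
    long total = 0;

    assert(buf != NULL);
    /* buf + len is one past the last element: valid to form and compare. */
    for (int *p = buf; p != buf + len; ++p)
        total += *p;
    return total;
}

int read_sixth(void)
{
    int *buf = get_buf_addr();
    return buf[5];                  /* defined: buffer has 10 elements */
}
```

After fill_buf() has run, sum_buf() walks all ten elements and read_sixth() reads buffer[5]. Both are defined because the pointer returned by get_buf_addr() really does point into a 10-element array, regardless of which translation unit performs the arithmetic.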