c pointers language-lawyer c99 restrict-qualifier

Do C compilers follow the "formal definition of `restrict`"?

Consider this code:

extern int A[2];

/* Just returns `p` back. */
extern int *identity(int *p);

int f(int *restrict p)
{
    int *q = identity(p);  /* `q` becomes "based on" `p` */
    int *r = A + (p == q);
    *p = 1;
    *r = 2;
    return *p;
}

Is r "based on" p here? According to the standard, it seems like it must be. In particular:

[...] a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E.

Where P is:

a restrict-qualified pointer to type T.

Given the above definition, r is based on p, since modifying p would change the value of A + (p == q), and hence the value of r. Counter-intuitively, r must be based on p even though it always points into the array A. Perhaps this is where I'm wrong.

GCC and Clang do not think the above is true. They optimize away the final load of *p and simply return 1;. So f(&A[1]) would incorrectly return 1 instead of 2.

Are the implementors misinterpreting the standard? If not, then f(&A[1]) is undefined behavior, but why so, what am I missing?

Thanks!

Solution

The formal definition of restrict is problematic. The wording conveys the spirit of the idea moderately well, but if you take it as the formal definition it claims to be, it doesn't really serve what I take to be the intended purpose.

Is r "based on" p here?

Yes, according to the formal definition, provided that identity(p) returns the value of p, as its name suggests. You already quoted the relevant text. In that case, modifying p after the initializer for q is evaluated and before the initializer for r is evaluated would change the value of r.
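The thought experiment in the definition can be acted out in ordinary, restrict-free code. The following is only a sketch: the helpers eval_r and r_changes_when_p_is_redirected are hypothetical names introduced here, and q is set directly to the value that identity(p) is assumed to return.

```c
#include <string.h>

int A[2];

/* Hypothetical helper: re-evaluates the initializer of `r` for given p and q. */
int *eval_r(const int *p, const int *q)
{
    return A + (p == q);
}

/* Returns 1 if redirecting p to a copy of A changes the value computed for r. */
int r_changes_when_p_is_redirected(void)
{
    int copy[2];
    memcpy(copy, A, sizeof copy);        /* "a copy of the array object" */

    int *p = &A[1];
    int *q = p;                          /* assumed result of identity(p) */
    int *r_original = eval_r(p, q);      /* p == q, so this is A + 1 */

    p = &copy[1];                        /* modify p after q was initialized */
    int *r_redirected = eval_r(p, q);    /* p != q now, so this is A + 0 */

    return r_original != r_redirected;
}
```

The value computed for r differs between the two evaluations, which is exactly the condition the quoted text uses to define "based on".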

Remember, though, that restrict is about aliasing. "Based on" determines which expressions must be treated as possible aliases of a restrict-qualified pointer, so that the compiler need not account for aliasing between that pointer and expressions not based on it. The spec does not exclude "based on" status attaching via the equality comparison in your example, but that should be considered a flaw in the spec.
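For contrast, the expressions that "based on" is evidently meant to capture are those whose value is derived from the pointer itself, by copying or by arithmetic. A minimal sketch (the function sum_and_bump is made up for illustration and does not come from the question):

```c
#include <stddef.h>

/* Hypothetical example: every access to the array goes through a pointer
   derived from `p` by copying or arithmetic, so each one is "based on" `p`
   in the uncontroversial sense. */
int sum_and_bump(int *restrict p, size_t n)
{
    int *end = p + n;                      /* based on p (pointer arithmetic) */
    int total = 0;

    for (int *it = p; it != end; ++it) {   /* `it` starts as a copy of p */
        total += *it;                      /* read through a p-based lvalue */
        *it += 1;                          /* write through a p-based lvalue */
    }
    return total;
}
```

Here nothing is accessed through a pointer that merely depends on a comparison involving p, so the literal and the intended readings agree.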

GCC and Clang do not think the above is true. They optimize away the final load of *p and simply return 1;. So f(&A[1]) would incorrectly return 1 instead of 2.

Are the implementors misinterpreting the standard?

Yes and no. They are implementing what I take to be the intent of the standard, which is that restrict lets them assume that p does not alias any element of A.

But they are not implementing the letter of the spec:

let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: [...] Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause.

(C17 6.7.3.1/4)

If we accept, per the actual wording of the spec, that r is based on p, then the implementation is obliged to act as if the assignment to *r modifies p, which is to say that it cannot assume that *p evaluates to the same value after the assignment to *r that it did before.
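To make that obligation concrete, here is the same function with the restrict qualifier dropped, as a baseline sketch. The name f_plain and the trivial definition of identity are assumptions for illustration. Without restrict the compiler must allow for *r aliasing *p, and under the literal reading the restrict-qualified f would have to produce the same results.

```c
int A[2];

/* Assumed definition, matching the comment in the question. */
int *identity(int *p) { return p; }

/* Same body as f, but without restrict on the parameter. */
int f_plain(int *p)
{
    int *q = identity(p);
    int *r = A + (p == q);   /* always A + 1 here, because q == p */
    *p = 1;
    *r = 2;                  /* overwrites *p when p == &A[1] */
    return *p;               /* has to be reloaded after the store to *r */
}
```

With these definitions, f_plain(&A[1]) returns 2 and f_plain(&A[0]) returns 1, whereas the optimized f described in the question returns 1 in both cases.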

If not, then f(&A[1]) is undefined behavior, but why so, what am I missing?

I'm confident that f(&A[1]) is intended to have undefined behavior, because it is intended that r should not be based on p in your example. If indeed it were not, then the assignment to *r would violate "Every other lvalue used to access the value of X shall also have its address based on P."