c, terminology, offsetof

C - Reference after dereference terminology


This question is about terminology.

int main()
{
    unsigned char array[10] = {0};

    void *ptr = array;

    void *middle = &ptr[5]; // <== dereferencing ‘void *’ pointer
}

GCC emits the warning: dereferencing ‘void *’ pointer.

I understand the warning: the compiler needs to compute the actual offset, and it can't, because void is an incomplete type with no size.

But I disagree with the error message. This is not a dereference. I can't find an explanation of "dereference" in which it means anything other than taking the value of something.

Same thing for offsetof:

#define offsetof(a,b) ((int)(&(((a*)(0))->b)))

There are a lot of threads about whether this is UB because of a null pointer dereference. But this is not a null pointer dereference! Is it?

There is no storage access in the assembly code:

mov rax, QWORD PTR [rbp-48]
add rax, 5
mov QWORD PTR [rbp-40], rax

What is the difference between dereference and storage access?


Solution

  • But I disagree with the error message. This is not a dereference. I can't find an explanation of "dereference" in which it means anything other than taking the value of something.

    The standard does not provide a formal definition of the term "dereference". The only place it uses it at all is in a (non-normative) footnote, number 102 in C11:

    [...] Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime.

    Note, however, that this footnote characterizes dereferencing as the behavior of the unary * operator itself, not the effect of performing some other operation on the result. You can think of the operation as converting a pointer into the object to which it points, which presents an issue if the pointer does not, in fact, point to an object of the pointed-to type, or if the pointed-to type is an incomplete one such as void. Such an issue exists formally even if the resulting object goes unused.

    Now I acknowledge that there is room for confusion here on account of the fact that it is useless to perform a dereference without using the resulting object, but that's beside the point. Consider the following complete C statement:

    1 + 2;
    

    Would you deny that it performs an addition just because the result is unused?

    Now, your (sub-)expression ptr[5] is defined to have meaning identical to that of (*((ptr)+(5))). The type of a pointer addition expression is the same as the type of the pointer involved, so that expression does indeed involve dereferencing a void *, in the sense of applying the unary * operator to an expression of that type.
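
    To illustrate that equivalence with a pointer type for which it is valid, here is a minimal sketch. The function names are hypothetical, and unsigned char is used only because it is a complete object type:

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; the names are not from the question. */

/* Reads element i using subscript notation. */
unsigned char read_subscript(const unsigned char *p, size_t i)
{
    return p[i];
}

/* Reads element i using the form that the subscript is defined in terms of. */
unsigned char read_indirect(const unsigned char *p, size_t i)
{
    return *(p + i);
}
```

    Both functions apply the unary * operator to the result of a pointer addition; the first merely spells it differently.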

    Nevertheless, although I think the error message is correct, I do agree that it is a poor choice. A more fundamental problem here, and one that is reached first in evaluation order, is a violation of the language constraint that in pointer addition, the pointer must point to a complete object type, which void is not. Indeed, it's hard to construe the message that is emitted as satisfying the requirement that constraint violations result in a diagnostic. It seems to be about a different problem -- one that produces undefined behavior, but does not involve a constraint violation.
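
    One way to satisfy that constraint, assuming a byte-granular offset is what was intended, is to convert to a pointer to a character type before doing the arithmetic. The helper name byte_offset below is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper (not from the question): returns the address
   `offset` bytes past `base`. Converting to unsigned char * gives the
   pointer a complete pointed-to type, so the addition is well-formed. */
void *byte_offset(void *base, size_t offset)
{
    unsigned char *bytes = base;   /* implicit conversion from void * is allowed in C */
    return bytes + offset;         /* pointer arithmetic only; no object is read */
}
```

    With the question's declarations, byte_offset(ptr, 5) yields the same address as &array[5], provided the offset stays within the array or one past its end.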

    You also remark:

    Same thing for offsetof:

    #define offsetof(a,b) ((int)(&(((a*)(0))->b)))
    

    [...] But this is not a null pointer dereference! Is it?

    Be careful, there. The C language does not define the specific form of the replacement text of the offsetof() macro; what you've presented is an implementation detail.

    We could easily divert into semantics here, since "dereference" is not a defined term in the standard, so I'll address instead a similar question: when the macro arguments meet the requirements of the offsetof() macro, does the definition presented expand to an expression with well-defined behavior?

    The standard does not define behavior for the indirect member selection operator (->) when its left-hand operand has an acceptable type but does not point to any object (such as when it is null). The behavior is therefore undefined. Or if we take a->b to be wholly equivalent to ((*a).b), then the behavior is explicitly undefined when a does not point to any object. Either way, the C language does not define behavior for the expression.
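
    For contrast, here is a sketch of the case where the behavior is defined: when the left operand points to an actual object, the two spellings select the same member. The type and function names are made up for illustration:

```c
/* Hypothetical type, for illustration only. */
struct pair {
    int first;
    int second;
};

/* Selects the member through the indirect member selection operator. */
int second_arrow(const struct pair *p)
{
    return p->second;
}

/* Selects the same member by explicit indirection, then direct selection. */
int second_star(const struct pair *p)
{
    return (*p).second;
}
```

    Neither function has defined behavior if it is passed a null pointer, which is the situation the macro's replacement text creates.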

    But this is where it becomes important that your particular macro definition is an implementation detail. The implementation from which it is drawn is free to provide whatever behavior it wishes, and in particular, it can provide behavior that reliably satisfies C's specifications for the offsetof() macro. You should not rely on such code yourself. Even on an implementation that provides an offsetof() definition of that form, you cannot be certain that it does not also employ some special internal magic -- not available directly to your own code -- to make it work.
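
    In your own code, the portable route is the offsetof macro from <stddef.h>, whatever its expansion happens to be on a given implementation. A minimal sketch, with a hypothetical struct record and helper function:

```c
#include <stddef.h>   /* provides the standard offsetof macro */

/* Hypothetical type, for illustration only. */
struct record {
    char   tag;
    int    value;
    double weight;
};

/* Locates the `value` member from a pointer to the enclosing struct,
   using the implementation's own offsetof rather than a hand-written one. */
int *value_from_base(struct record *r)
{
    unsigned char *base = (unsigned char *)r;
    return (int *)(base + offsetof(struct record, value));
}
```

    Here the arithmetic is done on a pointer to an existing object, so no null pointer is involved at any point in the caller's code.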