c struct language-lawyer offsetof

Using offsetof to access struct member


I have the following code:

#include <stddef.h>

int main() {
  struct X {
    int a;
    int b;
  } x = {0, 0};

  void *ptr = (char*)&x + offsetof(struct X, b);

  *(int*)ptr = 42;

  return 0;
}

The assignment through ptr performs an indirect access to x.b.

Is the behavior of this code defined according to any of the C standards?

I know that:

I guess that accessing the data pointed to by ptr via int * does not violate the strict aliasing rule, but I'm not fully sure that the standard guarantees that.


Solution

  • Yes, this is perfectly well defined, and is exactly how offsetof is intended to be used. You do the pointer arithmetic on a pointer to character type, so that it is done in bytes, and then cast back to the actual type of the member.

    You can see for instance 6.3.2.3 p7 (all references are to C17 draft N2176):

    When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

    So (char *)&x is a pointer to x converted to a pointer to char, therefore it points to the lowest addressed byte of x. When we add offsetof(struct X, b) (say it's 4) then we have a pointer to byte 4 of x. Now offsetof(struct X, b) is defined to return

    the offset in bytes, to the structure member, from the beginning of its structure [7.19p3]

    so 4 is in fact the offset from the beginning of x to x.b. Hence byte 4 of x is the lowest byte of x.b, and that's what ptr points to; in other words, ptr is a pointer to x.b, but of type char *. When we cast it back to int *, we have a pointer to x.b which is of the type int *, exactly the same as we would get from the expression &x.b. So dereferencing this pointer accesses x.b.
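
    The three steps above can be written out one at a time. This is only a sketch: the helper name set_b_via_offset is made up for illustration, and struct X is copied from the question.

```c
#include <assert.h>
#include <stddef.h>

struct X {
    int a;
    int b;
};

/* Hypothetical helper: stores `value` into x->b using only byte arithmetic. */
static void set_b_via_offset(struct X *x, int value)
{
    assert(x != NULL);

    /* Step 1: pointer to the lowest addressed byte of *x. */
    char *base = (char *)x;

    /* Step 2: advance by the offset, in bytes, of member b. */
    char *member = base + offsetof(struct X, b);

    /* Step 3: convert back to the member's own type and store through it. */
    *(int *)member = value;
}
```

    Calling set_b_via_offset(&x, 42) on a zero-initialized struct X should leave x.b equal to 42 and x.a untouched.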


    A question arose in the comments about this last step: when ptr is cast back to int *, how do we know we indeed have a pointer to the int x.b? This is a bit less explicit in the standard but I think it is the obvious intent.

    However, I think we can also derive it indirectly. Hopefully we agree that ptr above is a pointer to the lowest addressed byte of x.b. Now by the same passage of 6.3.2.3 p7 quoted above, taking a pointer to x.b and converting it to char *, as in (char *)&x.b, would also yield a pointer to the lowest addressed byte of x.b. As they are pointers of the same type which point to the same byte, they are the same pointer: ptr == (char *)&x.b.

    Then we look at the preceding sentences of 6.3.2.3 p7:

    A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.

    There are no problems with alignment here, because char has the weakest alignment requirement (6.2.8 p6). So converting (char *)&x.b back to int * must recover a pointer to x.b, i.e. (int *)(char *)&x.b == &x.b.

    But ptr is the same pointer as (char *)&x.b, so we may substitute them in this equality: (int *)ptr == &x.b.

    Obviously *&x.b produces an lvalue designating x.b (6.5.3.2 p4), hence so does *(int *)ptr.
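
    The chain of equalities in this argument can also be checked at run time. The sketch below assumes the struct X from the question; the function name member_b_from_offset is hypothetical. Passing assertions do not prove what the standard guarantees, they only show that an implementation behaves as the argument predicts.

```c
#include <assert.h>
#include <stddef.h>

struct X {
    int a;
    int b;
};

/* Hypothetical helper: returns a pointer to x->b computed via offsetof,
 * checking each equality from the argument along the way. */
static int *member_b_from_offset(struct X *x)
{
    char *ptr = (char *)x + offsetof(struct X, b);

    /* Both sides point to the lowest addressed byte of x->b. */
    assert(ptr == (char *)&x->b);

    /* Round trip through char * compares equal to the original pointer. */
    assert((int *)(char *)&x->b == &x->b);

    /* Substituting ptr for (char *)&x->b in the equality above. */
    assert((int *)ptr == &x->b);

    return (int *)ptr;
}
```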


    There is no problem with strict aliasing (6.5p7). First, determine the effective type of x.b using 6.5p6:

    The effective type of an object for an access to its stored value is the declared type of the object, if any. [Then explanations on what to do if it doesn't have a declared type.]

    Well, x.b does have a declared type, which is int. So its effective type is int.

    Now to see if the access is legal under strict aliasing, see 6.5p7:

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

    - a type compatible with the effective type of the object,

    [more options not relevant here]

    We are accessing x.b through the lvalue expression *(int *)ptr, which has type int. And int is compatible with int per 6.2.7p1:

    Two types have compatible type if their types are the same. [Then other conditions under which they may also be compatible].


    A perhaps more familiar example of the same technique is indexing into an array by bytes. If we have

    int arr[100];
    *(int *)((char *)arr + (17 * sizeof(int))) = 42;
    

    then this is equivalent to arr[17] = 42;.
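
    As a small sketch of that equivalence, the byte-indexed store can be wrapped in a function and compared against ordinary indexing. The name store_int_by_bytes is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores `value` in arr[index] using byte offsets. */
static void store_int_by_bytes(int *arr, size_t index, int value)
{
    /* Do the arithmetic in bytes, scaling the index by hand. */
    char *bytes = (char *)arr;
    int *slot = (int *)(bytes + index * sizeof(int));

    /* The byte-computed pointer is expected to equal &arr[index]. */
    assert(slot == &arr[index]);

    *slot = value;
}
```

    With int arr[100] = {0};, the call store_int_by_bytes(arr, 17, 42) should have the same effect as arr[17] = 42;.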

    This is how generic routines like qsort and bsearch are implemented. If we try to qsort an array of int, then within qsort all the pointer arithmetic is done in bytes, on pointers to character type, with the offsets manually scaled by the object size passed as an argument (which here would be sizeof(int)). When qsort needs to compare two objects, it passes pointers to them, converted to const void *, as arguments to the comparator function, which converts them back to const int * to do the comparison.
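
    To make the pattern concrete, here is a minimal sketch of a generic sort with a qsort-style interface. It is not how any particular C library implements qsort: the algorithm is a plain selection sort chosen for brevity, and the names generic_selection_sort, swap_bytes, and compare_ints are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Comparator for int elements: converts the const void * arguments
 * back to const int * before comparing. */
static int compare_ints(const void *lhs, const void *rhs)
{
    const int *a = lhs;
    const int *b = rhs;
    return (*a > *b) - (*a < *b);
}

/* Exchanges two non-overlapping objects of `size` bytes, byte by byte. */
static void swap_bytes(unsigned char *p, unsigned char *q, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = p[i];
        p[i] = q[i];
        q[i] = tmp;
    }
}

/* Hypothetical generic sort: all element addresses are computed in bytes
 * from `base`, scaled by the element size supplied by the caller. */
static void generic_selection_sort(void *base, size_t nmemb, size_t size,
                                   int (*compar)(const void *, const void *))
{
    assert(size > 0);
    unsigned char *bytes = base;

    for (size_t i = 0; i + 1 < nmemb; i++) {
        unsigned char *current = bytes + i * size;
        unsigned char *smallest = current;

        /* Find the smallest element in the unsorted tail. */
        for (size_t j = i + 1; j < nmemb; j++) {
            unsigned char *candidate = bytes + j * size;
            if (compar(candidate, smallest) < 0)
                smallest = candidate;
        }

        if (smallest != current)
            swap_bytes(current, smallest, size);
    }
}
```

    For an array int v[] = {5, 3, 9, 1, 7};, the call generic_selection_sort(v, 5, sizeof v[0], compare_ints) should leave v in ascending order. The sort itself never needs to know that the elements are int; only the comparator does.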

    This all works fine and is clearly an intended feature of the language. So I think we needn't doubt that the use of offsetof in the current question is similarly an intended feature.