c, language-lawyer, undefined-behavior, char-pointer, offsetof

Is accessing members through offsetof well defined?


When doing pointer arithmetic with offsetof, is it well-defined behavior to take the address of a struct, add the offset of a member to it, and then dereference the resulting address to get at the underlying member?

Consider the following example:

#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char* a;
    const char* b;
} A;

int main() {
    A test[3] = {
        {.a = "Hello", .b = "there."},
        {.a = "How are", .b = "you?"},
        {.a = "I\'m", .b = "fine."}};

    for (size_t i = 0; i < 3; ++i) {
        char* ptr = (char*) &test[i];
        ptr += offsetof(A, b);
        printf("%s\n", *(char**)ptr);
    }
}

This should print "there.", "you?" and "fine." on three consecutive lines, which it currently does with both clang and gcc, as you can verify yourself on wandbox. However, I am unsure whether any of these pointer casts and arithmetic violate some rule which would cause the behavior to become undefined.


Solution

  • As far as I can tell, the pointer arithmetic is well-defined, but only because you step through the struct with a character type pointer. Had you used some other pointer type to access the struct, it would have been a "strict aliasing violation".

    Strictly speaking, it is not well-defined to access an array out-of-bounds, but it is well-defined to use a character type pointer to grab any byte out of a struct. By using offsetof you guarantee that this byte is not a padding byte (which could have meant that you would get an indeterminate value).

    Note, however, that casting away the const qualifier does result in poorly-defined behavior: the members are declared as const char*, but the expression *(char**)ptr reads them as char*.

    EDIT

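    Similarly, dereferencing (char**)ptr is a problem in its own right. The conversion to char** is permitted as long as the result is suitably aligned, but what matters for strict aliasing is the type of the object being accessed, not how ptr was declared. The object stored at that address has type const char*, and char* is not a compatible type, so reading it through an lvalue of type char* violates strict aliasing. Copying the bytes out instead avoids the question entirely.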

    I believe that the correct code with no poorly-defined behavior would be this:

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    
    typedef struct {
        const char* a;
        const char* b;
    } A;
    
    int main() {
        A test[3] = {
            {.a = "Hello", .b = "there."},
            {.a = "How are", .b = "you?"},
            {.a = "I\'m", .b = "fine."}};
    
        for (size_t i = 0; i < 3; ++i) {
            const char* ptr = (const char*) &test[i];
            ptr += offsetof(A, b);
    
            /* Extract the const char* from the address that ptr points at,
               and store it inside ptr itself: */
            memmove(&ptr, ptr, sizeof(const char*)); 
            printf("%s\n", ptr);
        }
    }