I'm currently limited to coding in C and I want to do object-oriented programming in it.
One thing that comes to mind is how to correctly downcast a type in C without violating strict aliasing.
Imagine I have an animal struct with a vtable (meaning a struct of function pointers) and a dog like this:
typedef void (*sound_func)(const animal_t *animal);
struct animal_vtable {
sound_func sound;
};
typedef struct animal_vtable animal_vtable_t;
typedef struct animal {
animal_vtable_t * vtable;
int size;
} animal_t;
typedef struct dog {
animal_t animal;
} dog_t;
There will be cases when I want to know whether my animal is a dog. This is how I currently plan to turn an animal instance into a dog, but I'm unsure whether it triggers undefined behavior.
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) (animal_offset - offset);
}
The key part here is that both the dog_t * and the animal_t * point to the same memory location for obvious reasons, but will this be a problem for optimizers? Currently I have -fno-strict-aliasing enabled and thus I know it works, but is it safe to turn that off?
Below is the full working example, which does not trigger errors when compiled with the address and undefined behavior sanitizers.
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/*
* Animal section
*/
struct animal_vtable;
typedef struct animal_vtable animal_vtable_t;
typedef struct animal {
animal_vtable_t * vtable;
int size;
} animal_t;
typedef void (*sound_func)(const animal_t *animal);
struct animal_vtable {
sound_func sound;
};
void animal_sound(const animal_t* animal) {
animal->vtable->sound(animal);
}
int animal_size(const animal_t* animal) {
return animal->size;
}
/*
* dog section
*/
void dog_bark(const animal_t *animal);
static animal_vtable_t dog_table = {
.sound = dog_bark
};
typedef struct dog {
animal_t animal;
} dog_t;
dog_t* make_dog(int size) {
dog_t* dog = malloc(sizeof(dog_t));
if (dog == NULL) {
return dog;
}
dog->animal = (animal_t) { .vtable = &dog_table, .size = size };
return dog;
}
void dog_bark(const animal_t *animal) {
printf("wuff!\n");
}
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) animal_offset - offset;
}
/*
* main tests
*/
int main(int argc, char** argv) {
dog_t *dog = make_dog(10);
if (dog == NULL) {
exit(-1);
}
animal_t *animal = &(dog->animal);
animal_sound(animal);
dog_t *d2 = to_dog(animal);
printf("dog addr: %p, d2 addr: %p\n", (void *) dog, (void *) d2);
printf("dog size: %d\n", animal_size(&d2->animal));
printf("dog size: %d\n", animal_size(&dog->animal));
free(dog);
}
I'm unsure if this will trigger undefined behavior or not.
dog_t *to_dog(animal_t *a) {
if (a->vtable != &dog_table) {
return NULL;
}
size_t offset = offsetof(dog_t, animal);
uintptr_t animal_offset = (uintptr_t) a;
return (dog_t *) animal_offset - offset;
}
The expression (dog_t *) animal_offset - offset does not mean what you think it means. It is equivalent to ((dog_t *) animal_offset) - offset, whereas what you appear to want is (dog_t *) (animal_offset - offset) (and these are different).
But more generally, you are making it harder than it needs to be. Supposing that you implement inheritance as you seem inclined to do, by making the first member of the child type an instance of the parent type, you can perform the kind of pointer conversion you demonstrate via a simple cast: (dog_t *) a. The language specification guarantees that this is valid under the conditions described, supposing that a is in fact a pointer to the animal member of a dog_t. This is specified in C17, paragraph 6.7.2.1/15 (emphasis added):
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
Substantially the same wording appears in earlier versions of the standard, too.
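As a minimal sketch of that simplification, to_dog could look like the following. The declarations are condensed from the question; the NULL check on a is an addition of mine rather than something the question requires.

```c
#include <stddef.h>

typedef struct animal_vtable animal_vtable_t;
typedef struct animal {
    animal_vtable_t *vtable;
    int size;
} animal_t;
struct animal_vtable {
    void (*sound)(const animal_t *animal);
};

/* animal_t must stay the FIRST member for the cast below to be valid. */
typedef struct dog {
    animal_t animal;
} dog_t;

static void dog_bark(const animal_t *animal) { (void) animal; }
static animal_vtable_t dog_table = { .sound = dog_bark };

/* Returns the enclosing dog_t, or NULL if `a` is not a dog. */
dog_t *to_dog(animal_t *a) {
    if (a == NULL || a->vtable != &dog_table) {
        return NULL;
    }
    /* A pointer to the initial member converts to a pointer to the struct. */
    return (dog_t *) a;
}
```

No offsetof or uintptr_t round trip is needed here. If animal ever stopped being the first member of dog_t, this cast would no longer be correct.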
As for
will this be a problem for optimizers? Currently I have -fno-strict-aliasing enabled and thus I know it works, but is it safe to turn that off?
It should not be a problem for optimizers, provided that the definition of dog_t is visible in the translation unit. In that case, optimizers that are not deeply broken will know that pointers to dog_t and pointers to animal_t can alias each other.
However, the definition of dog_t being visible is a requirement for use of offsetof, but not a requirement for the pointer cast, so that may be something to watch out for. Also, it's not just this code where you need to watch out for aliasing issues. For safety relative to strict aliasing, every function that accesses pointers to both types will need to have the definition of dog_t visible.
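The following sketch shows the kind of function that point is about: one that receives both pointer types and writes through each. The types are trimmed down from the question, and the tricks member and the set_both name are hypothetical, added only for illustration.

```c
/* Trimmed-down types; `tricks` is a hypothetical extra member. */
typedef struct animal {
    int size;
} animal_t;

typedef struct dog {
    animal_t animal;
    int tricks;
} dog_t;

/* Writes through both pointer types, then reads back through the first.
 * With dog_t's definition visible, the compiler can see that d->animal
 * is an animal_t, so it has to allow for `a` pointing at it and reload
 * a->size after the second store. */
int set_both(animal_t *a, dog_t *d) {
    a->size = 1;
    d->animal.size = 2;
    return a->size;
}
```

When a points at the animal member of the same dog, the function returns 2; when the two objects are unrelated, it returns 1.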