Assume that we want to store different kinds of integers using a tagged union:
#include <stdint.h>

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
} NumberType;
typedef union {
    int8_t int8;
    int16_t int16;
} IntValue;
typedef struct {
    NumberType number_type;
    IntValue value;
} Int;
But later we would like to define Number for certain cases, which also includes float. Is it ok to "extend" the Int type by embedding Int in a new union, so that the new struct can be fed into functions that accept Int*?
#include <stdint.h>

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT, // <------ Newly added
} NumberType;
typedef union {
    int8_t int8;
    int16_t int16;
} IntValue;
typedef struct {
    NumberType number_type;
    IntValue value;
} Int;
typedef union {
    IntValue i; // <------ Embedded union
    float f;
} NumberValue;
typedef struct {
    NumberType number_type;
    NumberValue value;
} Number;

int8_t masked_int8_value(Int *i);

Number n;
masked_int8_value((Int*)&n);
If not, how can we extend the old type so that we don't waste another header for the tag in the new type?
Is it ok to "extend" the Int type by embedding Int in a new union, so that the new struct can be fed into functions that accept Int*?
What you want to do looks more like introducing a new type for Int to be a specialization of. Code working with your original Int type is entitled to assume that it has a value of type IntValue, but that's not necessarily the case for an object of your proposed Number type, so Number is not an extension of Int in the Liskov sense.
In any case, it depends on what you mean by "ok", and on exactly how you do what you describe. The particular approach in your example is problematic.
Changing enum NumberType

If you define enum NumberType differently in different translation units then the resulting types are not compatible with each other, in the C language's sense of that term, notwithstanding that they have the same tag. Nor does defining typedef aliases for those types make any difference in this regard.
The above is viral in the sense that structure and union types that have members of type enum NumberType, including when that is designated via a typedef alias, cannot be compatible if the enum NumberType types they rely upon are not compatible. Again, notwithstanding that they may have the same tag or both be without tags.
And that extends to different declarations of the same function that are written in terms of the resulting incompatible enum, union, or struct types, or types derived from them. Calling a function at a point where its in-scope declaration is not compatible with its definition produces undefined behavior.
Thus, you can define your enum NumberType differently in different translation units only as long as those definitions, and all other definitions that rely on them, are siloed into disjoint sets of translation units, such that no function defined in terms of any of them is called from a translation unit in a different silo. In practice, your compiler is likely to reject mixtures of the two sets of incompatible type definitions. It might not recognize function calls through incompatible declarations, but that's worse because such calls have UB.
To the extent that all these particular issues spring from different translation units having incompatible definitions of same-named types, a reliable way to avoid them would be to ensure that there are no such variations among the sources contributing to any particular program. If your "changing" means not doing so then it will produce a coding and maintenance burden that I would not expect to be acceptable.
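A minimal sketch of the single-definition approach, assuming the types from the question live in one shared header. The file name number_types.h and the include-guard name are hypothetical. Every translation unit that needs these types includes this header, so all of them see the same, mutually compatible definitions and the same function declaration.

```c
/* number_types.h (hypothetical file name): the only place these types are defined */
#ifndef NUMBER_TYPES_H
#define NUMBER_TYPES_H

#include <stdint.h>

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT
} NumberType;

typedef union {
    int8_t int8;
    int16_t int16;
} IntValue;

typedef struct {
    NumberType number_type;
    IntValue value;
} Int;

/* Declared once here, so every caller's declaration matches the definition. */
int8_t masked_int8_value(Int *i);

#endif /* NUMBER_TYPES_H */
```

Adding TYPE_FLOAT then means editing this one header and rebuilding everything that includes it, rather than maintaining two diverging definitions.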
Embedding the IntValue union in another union

Embedding one union in another is not itself problematic.
There is nothing inherently wrong with a union containing a member whose type is also a union type.
Given a pointer, of any pointer type, that points to a union object, it is valid to convert that pointer to a pointer to a type compatible with that of one or more of the union members, and to read the value of the pointed-to object. If the value last stored in the union was that of one of the compatibly-typed members then the read will observe that value. Including when the type in question is a union type itself.
Thus union types can be extended in this sense. Subject to somewhat more restrictions, structure types can be extended in a similar way.
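As a small illustration of the pointer conversion described above, here is a sketch that reuses the IntValue and NumberValue unions from the question. The functions read_int8 and store_then_read are hypothetical names introduced only for this example.

```c
#include <stdint.h>

typedef union {
    int8_t int8;
    int16_t int16;
} IntValue;

typedef union {
    IntValue i;   /* embedded union */
    float f;
} NumberValue;

/* Expects a pointer to a plain IntValue. */
int8_t read_int8(const IntValue *iv) {
    return iv->int8;
}

/* Stores an int8 through the embedded member, then reads it back through a
   pointer to the enclosing union converted to the member's type. */
int8_t store_then_read(int8_t x) {
    NumberValue nv;
    nv.i.int8 = x;
    return read_int8((const IntValue *)&nv);
}
```

This works because the value last stored in the NumberValue was that of its IntValue member. Had the float member been stored last, reading int8 through the converted pointer would not yield a meaningful integer.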
You may freely convert pointer values among pointer-to-structure types. However, accessing an object of type T via a pointer to a type incompatible with T violates the strict aliasing rule in most cases, including all cases where the two types are incompatible structure types.* Strict-aliasing violations produce UB.
Even if you ensure that all translation units have compatible definitions of enum NumberType, the Int and Number types in your example are not compatible with each other. Inducing a function to access a Number via a pointer to Int therefore produces undefined behavior. Even if it seems to produce the expected result, it is not safe to rely on the program to do so every time, or when built by a different compiler, or when built with different compilation options, or when built for a different machine architecture. Nor is it safe to assume that there are no unwanted side effects. This is a technical deficit that I would not be willing to accept.
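If the types must stay as declared in the question, one way to remain within defined behavior is to copy instead of reinterpreting the pointer. The following is only a sketch of that idea, not something the question's code provides: number_to_int is a hypothetical helper that builds a real Int from a Number, which can then be passed to any function expecting Int*.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum NumberType { TYPE_INT_8, TYPE_INT_16, TYPE_FLOAT } NumberType;

typedef union { int8_t int8; int16_t int16; } IntValue;
typedef struct { NumberType number_type; IntValue value; } Int;

typedef union { IntValue i; float f; } NumberValue;
typedef struct { NumberType number_type; NumberValue value; } Number;

/* Hypothetical helper: copies the integer payload of *n into *out.
   Returns false, leaving *out untouched, when *n holds a float. */
bool number_to_int(const Number *n, Int *out) {
    if (n->number_type == TYPE_FLOAT) {
        return false;
    }
    out->number_type = n->number_type;
    out->value = n->value.i;   /* ordinary union copy, no pointer reinterpretation */
    return true;
}
```

The cost is one small copy per call, and changes made by the callee to the Int are not reflected in the original Number unless they are copied back.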
how can we extend the old type so that we don't waste another header for the tag in the new type?
Headers are cheap. I guess it's possible to go overboard with isolating declarations in different headers, but generally speaking, headers are not a resource worth conserving.
Additionally, C is not an object-oriented language. It is still possible to apply a variety of object-oriented programming concepts and practices in C, but type extension in a sense that provides for a degree of polymorphism is tricky. It is a lot easier to accomplish if designed from the beginning. The main alternatives are these:
Sometimes it makes sense to use a union of the types you want to handle polymorphically as a pseudo-supertype. In such cases, it is often convenient to choose a discriminated union, so that you can maintain type information within, but that does depend on the types involved being amenable. For example:
#include <stdint.h>

enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT
};
union IntValue {
    int8_t int8;
    int16_t int16;
};
struct Int {
    enum NumberType number_type;
    union IntValue value;
};
struct Float {
    enum NumberType number_type;
    float value;
};
union Number {
    struct {    // anonymous struct (C11): exposes the tag directly as num.number_type
        enum NumberType number_type;
    };
    struct Int as_int;
    struct Float as_float;
};

void use_an_int(struct Int *i);

void foo(void) {
    // Initialize a single union member; designating two different members
    // would let the later initializer override the earlier one.
    union Number num = { .as_int = { .number_type = TYPE_INT_8, .value.int8 = 42 } };
    use_an_int((struct Int *) &num);
}
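To show how code on the receiving side might use such a union, here is a hedged sketch that dispatches on the tag. The helper number_to_double is a hypothetical name, and the type definitions are repeated only to keep the example self-contained. It relies on the anonymous struct, struct Int, and struct Float all starting with the same enum NumberType member, so the tag can be inspected regardless of which alternative was stored.

```c
#include <stdint.h>

enum NumberType { TYPE_INT_8, TYPE_INT_16, TYPE_FLOAT };

union IntValue { int8_t int8; int16_t int16; };

struct Int   { enum NumberType number_type; union IntValue value; };
struct Float { enum NumberType number_type; float value; };

union Number {
    struct { enum NumberType number_type; };  /* anonymous struct, C11 */
    struct Int as_int;
    struct Float as_float;
};

/* Hypothetical helper: widens whichever alternative is active to double. */
double number_to_double(const union Number *num) {
    switch (num->number_type) {  /* tag read via the common initial sequence */
    case TYPE_INT_8:  return num->as_int.value.int8;
    case TYPE_INT_16: return num->as_int.value.int16;
    case TYPE_FLOAT:  return num->as_float.value;
    }
    return 0.0;  /* not reached for valid tags */
}
```

Functions that only understand integers can still take a struct Int *, obtained as &num->as_int, once the tag has been checked.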
You can emulate single-inheritance supertype relationships with structures by embedding the supertype structure as the initial member of each of its direct subtypes. For example:
#include <stdint.h>
#include <stdlib.h>   // abort

enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT
};
struct Number {
    enum NumberType number_type;
};
union IntValue {
    int8_t int8;
    int16_t int16;
};
struct Int {
    struct Number super;    // must be the first member
    union IntValue value;
};
struct Float {
    struct Number super;    // must be the first member
    float value;
};

void use_an_int(struct Int *i);
void use_a_float(struct Float *f);

void use_a_Number(struct Number *num) {
    switch (num->number_type) {
        case TYPE_FLOAT:
            use_a_float((struct Float *)num);
            break;
        case TYPE_INT_8:
        case TYPE_INT_16:
            use_an_int((struct Int *)num);
            break;
        default:
            abort();
    }
}
Obviously, this variation would be more invasive with respect to existing types to which you want to grant a notional supertype.
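For the embedding approach, a caller passes a pointer to the embedded supertype, and the callee converts it back to the subtype after checking the tag. The sketch below assumes the same type layout as above; number_as_double is a hypothetical helper added only for illustration. The conversion back is valid because a pointer to a structure's first member, suitably converted, points to the structure itself.

```c
#include <stdint.h>

enum NumberType { TYPE_INT_8, TYPE_INT_16, TYPE_FLOAT };

struct Number { enum NumberType number_type; };

union IntValue { int8_t int8; int16_t int16; };

struct Int   { struct Number super; union IntValue value; };
struct Float { struct Number super; float value; };

/* Hypothetical helper: takes the notional supertype and downcasts by tag. */
double number_as_double(const struct Number *num) {
    switch (num->number_type) {
    case TYPE_INT_8:  return ((const struct Int *)num)->value.int8;
    case TYPE_INT_16: return ((const struct Int *)num)->value.int16;
    case TYPE_FLOAT:  return ((const struct Float *)num)->value;
    }
    return 0.0;  /* not reached for valid tags */
}
```

A caller holding a struct Int i would write number_as_double(&i.super), so the "upcast" needs no cast at all. The downcast is only safe when num really points to the super member of an object of the subtype that the tag names.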
*There is a nuance here involving pointers to structures and pointers to their first members, but that does not apply to your situation.