
Reuse Union to Define an Extended Union in C


Assume that we want to store different kinds of integers using a tagged union:

#include <stdint.h>

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
} NumberType;

typedef union {
  int8_t int8;
  int16_t int16;
} IntValue;

typedef struct {
  NumberType number_type;
  IntValue value;
} Int;

But later, in certain cases, we would like to define a Number type that also includes float. Is it OK to "extend" the Int type by embedding Int in a new union, so that the new struct can be fed into functions that accept Int*?

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT,    // <------ Newly added
} NumberType;

typedef union {
  int8_t int8;
  int16_t int16;
} IntValue;

typedef struct {
  NumberType number_type;
  IntValue value;
} Int;

typedef union {
  IntValue i;      // <------ Embeded union
  float f;
} NumberValue;

typedef struct {
  NumberType number_type;
  NumberValue value;
} Number;

int8_t masked_int8_value(Int *i);

Number n;

masked_int8_value((Int*)&n);

If not, how can we extend the old type so that we don't waste another header for the tag in the new type?


Solution

  • Is it OK to "extend" the Int type by embedding Int in a new union, so that the new struct can be fed into functions that accept Int*?

    What you want to do looks more like introducing a new type for Int to be a specialization of. Code working with your original Int type is entitled to assume that it has a value of type IntValue, but that's not necessarily the case for an object of your proposed Number type, so Number is not an extension of Int in the Liskov sense.

    In any case, it depends on what you mean by "ok", and on exactly how you do what you describe. The particular approach in your example is problematic.

    Changing enum NumberType

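    C allows the same tag to be declared in different translation units, but two such declarations of an enumerated, structure, or union type are compatible only if their members correspond exactly (C17 6.2.7), and all declarations that refer to the same object or function must have compatible types, or the behavior is undefined. Thus, you can define your enum NumberType differently in different translation units only as long as those definitions, and all other definitions that rely on them, are siloed into disjoint sets of translation units, and no function defined in terms of any of them is called from any translation unit in a different silo. In practice, your compiler will reject a translation unit that sees both definitions, but it generally cannot recognize a call made through an incompatible declaration in another translation unit. That is worse, because such calls have UB.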

    To the extent that all these particular issues spring from different translation units having incompatible definitions of same-named types, a reliable way to avoid them is to ensure that there are no such variations among the sources contributing to any particular program. If by "changing" you mean keeping both variants in use, then you will take on a coding and maintenance burden that I would not expect to be acceptable.

    Embedding your IntValue union in another union

    Embedding one union in another is not itself problematic.

    Accessing a struct of one type via a pointer to a different structure type

    You may freely convert pointer values among pointer-to-structure types. However, accessing an object of type T via a pointer to a type incompatible with T violates the strict aliasing rule in most cases, including all cases where the two types are incompatible structure types.* Strict-aliasing violations produce UB.

    Even if you ensure that all translation units have compatible definitions of enum NumberType, the Int and Number types in your example are not compatible with each other. Inducing a function to access a Number via a pointer to Int therefore produces undefined behavior. Even if it seems to produce the expected result, it is not safe to rely on the program to do so every time, or when built by a different compiler, or when built with different compilation options, or when built for a different machine architecture. Nor is it safe to assume that there are no unwanted side effects. This is a technical deficit that I would not be willing to accept.
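    If an existing function that takes an Int * has to be reused with a Number, one well-defined alternative to the cast is to copy the tag and the IntValue into a genuine Int object and pass a pointer to that. The following is a minimal sketch under stated assumptions. It uses the question's type definitions with TYPE_FLOAT added. The body of masked_int8_value and the helper number_to_int are hypothetical and written only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum NumberType {
    TYPE_INT_8,
    TYPE_INT_16,
    TYPE_FLOAT,
} NumberType;

typedef union {
  int8_t int8;
  int16_t int16;
} IntValue;

typedef struct {
  NumberType number_type;
  IntValue value;
} Int;

typedef union {
  IntValue i;
  float f;
} NumberValue;

typedef struct {
  NumberType number_type;
  NumberValue value;
} Number;

/* Hypothetical body: the question only declares this function. It returns
   an int8 unchanged and keeps only the low 7 bits of an int16. */
int8_t masked_int8_value(Int *i) {
    if (i->number_type == TYPE_INT_8)
        return i->value.int8;
    return (int8_t)(i->value.int16 & 0x7F);
}

/* Hypothetical helper: copies the tag and the IntValue of an integer-valued
   Number into a real Int object. Returns false, leaving *out untouched,
   if the Number holds a float. */
bool number_to_int(const Number *n, Int *out) {
    if (n->number_type == TYPE_FLOAT)
        return false;
    out->number_type = n->number_type;
    out->value = n->value.i;  /* copies the whole IntValue union */
    return true;
}
```

    To use it with a Number n that holds an integer, declare Int tmp;. If number_to_int(&n, &tmp) returns true, call masked_int8_value(&tmp). Every object is then accessed only through its own type, so the result does not depend on the compiler's aliasing assumptions. The cost is one small copy per call.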


    how can we extend the old type so that we don't waste another header for the tag in the new type?

    Headers are cheap. I guess it's possible to go overboard with isolating declarations in different headers, but generally speaking, headers are not a resource worth conserving.

    Additionally, C is not an object-oriented language. It is still possible to apply a variety of object-oriented programming concepts and practices in C, but type extension in a sense that provides for a degree of polymorphism is tricky. It is a lot easier to accomplish if designed from the beginning. The main alternatives are these:

    Unions of structures

    Sometimes it makes sense to use a union of the types you want to handle polymorphically as a pseudo-supertype. In such cases it is often convenient to make it a discriminated union, so that you can maintain type information within it, but that depends on the types involved being amenable. For example:

    #include <stdint.h>
    
    enum NumberType {
        TYPE_INT_8,
        TYPE_INT_16,
        TYPE_FLOAT
    };
    
    union IntValue {
      int8_t int8;
      int16_t int16;
    };
    
    struct Int {
      enum NumberType number_type;
      union IntValue value;
    };
    
    struct Float {
      enum NumberType number_type;   // no typedef here, so the "enum" keyword is required
      float value;
    };
    
    union Number {
      struct {
        enum NumberType number_type;
      };
      struct Int as_int;
      struct Float as_float;
    };
    
    void use_an_int(struct Int *i);
    
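    void foo(void) {
        // Initialize through the as_int member only: a union initializer sets
        // one member, and a later designator for another member overrides an earlier one.
        union Number num = { .as_int = { .number_type = TYPE_INT_8, .value.int8 = 42 } };
        use_an_int((struct Int *) &num);   // a suitably converted pointer to a union points to each member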
    }
    
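    Reading the shared tag through the anonymous member is valid whichever structure was last stored in the union. This is because the members share it as a common initial sequence (C17 6.5.2.3p6), provided the complete union type is visible. The following is a minimal sketch of dispatching on that tag. It assumes C11 anonymous structures, and number_as_double is a hypothetical function that is not part of the original design.

```c
#include <stdint.h>

enum NumberType { TYPE_INT_8, TYPE_INT_16, TYPE_FLOAT };

union IntValue { int8_t int8; int16_t int16; };

struct Int { enum NumberType number_type; union IntValue value; };

struct Float { enum NumberType number_type; float value; };

union Number {
  struct { enum NumberType number_type; };  /* C11 anonymous struct: the shared tag */
  struct Int as_int;
  struct Float as_float;
};

/* Hypothetical: converts any Number to double by dispatching on the tag.
   num->number_type may be read whichever member was last stored, because
   every member begins with the same common initial sequence. */
double number_as_double(const union Number *num) {
    switch (num->number_type) {
    case TYPE_INT_8:  return num->as_int.value.int8;
    case TYPE_INT_16: return num->as_int.value.int16;
    case TYPE_FLOAT:  return num->as_float.value;
    }
    return 0.0;  /* reached only if the tag holds an unexpected value */
}
```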

    Nested structures

    You can emulate single-inheritance supertype relationships with structures by embedding the supertype structure as the initial member of each of its direct subtypes. For example:

    #include <stdint.h>
    #include <stdlib.h>   // abort()
    
    enum NumberType {
        TYPE_INT_8,
        TYPE_INT_16,
        TYPE_FLOAT
    };
    
    struct Number {
      enum NumberType number_type;
    };
    
    union IntValue {
      int8_t int8;
      int16_t int16;
    };
    
    struct Int {
      struct Number super;
      union IntValue value;
    };
    
    struct Float {
      struct Number super;
      float value;
    };
    
    
    void use_an_int(struct Int *i);
    void use_a_float(struct Float *f);
    
    void use_a_Number(struct Number *num) {
        switch (num->number_type) {
            case TYPE_FLOAT:
                use_a_float((struct Float *)num);
                break;
            case TYPE_INT_8:
            case TYPE_INT_16:
                use_an_int((struct Int *)num);
                break;
            default:
                abort();
        }
    }
    

    Obviously, this variation would be more invasive with respect to existing types to which you want to grant a notional supertype.
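    With nested structures, callers upcast by taking the address of the super member rather than by casting. The downcast in the dispatcher is valid only because num really points at the initial member of a struct Int or struct Float (C17 6.7.2.1p15). The following is a minimal self-contained sketch. The functions int_as_double, float_as_double and number_as_double are hypothetical stand-ins for use_an_int, use_a_float and use_a_Number.

```c
#include <stdint.h>
#include <stdlib.h>

enum NumberType { TYPE_INT_8, TYPE_INT_16, TYPE_FLOAT };

struct Number { enum NumberType number_type; };

union IntValue { int8_t int8; int16_t int16; };

struct Int { struct Number super; union IntValue value; };

struct Float { struct Number super; float value; };

/* Hypothetical "subtype" operations: each just returns its stored value. */
double int_as_double(const struct Int *i) {
    return i->super.number_type == TYPE_INT_8 ? i->value.int8 : i->value.int16;
}

double float_as_double(const struct Float *f) { return f->value; }

/* Downcast from the supertype: valid because num points at the initial
   member (super) of an actual struct Int or struct Float. */
double number_as_double(const struct Number *num) {
    switch (num->number_type) {
    case TYPE_FLOAT:
        return float_as_double((const struct Float *)num);
    case TYPE_INT_8:
    case TYPE_INT_16:
        return int_as_double((const struct Int *)num);
    default:
        abort();
    }
}
```

    For example, given struct Float f = { { TYPE_FLOAT }, 2.5f };, the call number_as_double(&f.super) dispatches to float_as_double and yields 2.5.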


    *There is a nuance here involving pointers to structures and pointers to their first members, but that does not apply to your situation.