c, language-lawyer, sizeof, variable-length-array, compiler-bug

What are the exact conditions under which type_name in sizeof(type_name) is evaluated? GCC evaluates f() in sizeof(int [(f(), 100)])


Context

The standard says (C17, 6.5.3.4 ¶2):

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Confusingly, the wording doesn't distinguish between sizeof's two syntactic contexts:

    sizeof unary-expression
    sizeof ( type-name )

I believe that type-name technically doesn't "have" a type, but rather "denotes" (names, specifies) one. Also, in the case of a type name as the argument, I would intuitively have phrased this in terms of evaluating all assignment-expressions within it – if not for accuracy, then for clarity.

In any event, I understand this wording to mean that in the case of sizeof(type), all expressions within type are evaluated if and only if type denotes a VLA (variable-length array) type.

Problem

With the above in mind, the following code's last printf statement has surprising results:

#include <stdio.h>

void f(int i) {
    printf("side effect %d; ", i);
}

int main(void) {
    int n = 9;
    int a[n];
    printf("%zu\n", sizeof a);                   // 36
    printf("%zu\n", sizeof a[n++]);              // 4
    printf("%d\n", n);                           // 9
    printf("%zu\n", sizeof(int [n++]));          // 36
    printf("%d\n", n);                           // 10
    printf("%zu\n", sizeof(int [(f(0), 100)]));
      // side effect 0; 400 (type of operand: int, a non-VLA type)

    return 0;
}

(The argument int i of f – omitted in the question title – is intended for debugging purposes and for playing around, if one wants to distinguish different calls to that function.)

Incidentally, sizeof(int [f(0), 100]) (without parentheses surrounding the comma expression denoting the array size) leads to an error, which I discuss in the following question: Why must a comma expression used as an array size be enclosed in parentheses if part of an array declarator?

Other references

Relevant places in the standard (C17 draft) for where the syntax of array declarators is discussed include: 6.7.7 ¶1, 6.7.6.2 ¶3.

This answer by Keith Thompson to a question about VLAs is relevant. But note that my question is not about VLAs per se (even though it uses them in the code above for contrastive purposes).


Solution

  • According to the C17 Standard (6.5.3.4 The sizeof and _Alignof operators)

    2 The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

    and (6.7.6.2 Array declarators)

    4 If the size is not present, the array type is an incomplete type. If the size is * instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used in declarations or type names with function prototype scope; such arrays are nonetheless complete types. If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type. (Variable length arrays are a conditional feature that implementations need not support; see 6.10.8.3.)

    And finally (6.6 Constant expressions):

    2 A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.

    and

    3 Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.

    In the expression

    sizeof(int [(f(0), 100)])

    the size expression (f(0), 100) contains a function call and a comma operator, so it is not an integer constant expression. The type name therefore denotes a variable length array type, and its size expression is evaluated at run time.

    Thus in each of these calls of printf

    printf("%zu\n", sizeof a);                   // 36
    printf("%zu\n", sizeof(int [n++]));          // 36
    printf("%zu\n", sizeof(int [(f(0), 100)]));

    the operand of sizeof has a variable length array type, so the operand is evaluated and the size is determined at run time.
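The difference can be observed with a call counter. The following is a minimal sketch, not part of the original answer: calls, bump, size_with_comma, and size_with_constant are hypothetical names introduced for illustration, and the sketch assumes a compiler with VLA support (such as GCC).

```c
#include <stddef.h>

/* Hypothetical counter and helper, introduced only for this sketch. */
int calls = 0;

int bump(void) {
    ++calls;            /* visible side effect */
    return 0;
}

/* The size expression contains a function call and a comma operator, so it
   is not an integer constant expression: int [(bump(), 100)] names a VLA
   type, and sizeof evaluates the size expression. */
size_t size_with_comma(void) {
    return sizeof(int [(bump(), 100)]);
}

/* 100 is an integer constant expression: int [100] is an ordinary array
   type, and sizeof evaluates nothing. */
size_t size_with_constant(void) {
    return sizeof(int [100]);
}
```

Both functions return 100 * sizeof(int). With GCC, as reported in the question, only size_with_comma() increments calls.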

    On the other hand, if you write, for example,

    printf("%zu\n", sizeof( (f(0), 100)));

    then f() is not called. Here the operand is the expression (f(0), 100), whose type is int rather than a variable length array type, so the operand is not evaluated and the result is an integer constant.
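A small sketch of this case, under the same assumptions as the question's code: expr_calls, note_call, and size_of_comma_expression are hypothetical names used only for illustration.

```c
#include <stddef.h>

int expr_calls = 0;     /* hypothetical counter for this sketch */

int note_call(void) {   /* hypothetical stand-in for f() */
    ++expr_calls;
    return 0;
}

/* The operand is the expression (note_call(), 100), whose type is int.
   int is not a VLA type, so the operand is not evaluated and the result
   is sizeof(int). */
size_t size_of_comma_expression(void) {
    return sizeof((note_call(), 100));
}
```

After calling size_of_comma_expression(), expr_calls is unchanged, because the comma expression is only inspected for its type.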

    In short, if the size in an array declarator is not an integer constant expression (and, according to the quotation above, an expression containing a comma operator is not one), then the array type is a variable length array type.
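One way to see the "integer constant" half of the rule is _Static_assert (C11), which accepts only integer constant expressions. This is a hedged sketch rather than anything from the original answer: FIXED_LEN and vla_bytes are hypothetical names, and n is assumed to be positive.

```c
#include <stddef.h>

enum { FIXED_LEN = 100 };   /* hypothetical name; an enumeration constant
                               is an integer constant expression */

/* Non-VLA type name: sizeof yields an integer constant, so it may appear
   in a _Static_assert. */
_Static_assert(sizeof(int [FIXED_LEN]) == FIXED_LEN * sizeof(int),
               "sizeof of a fixed-size array type is a constant");

/* n is not a constant expression, so int [n] names a VLA type and the
   result is computed at run time (n is assumed to be greater than zero). */
size_t vla_bytes(int n) {
    return sizeof(int [n]);
}
```

Replacing FIXED_LEN in the static assertion with a size such as (f(0), 100) would be rejected, because sizeof applied to a VLA type name is no longer an integer constant expression.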