
Why don't PHP attributes allow functions?


I'm pretty new to PHP, but I've been programming in similar languages for years. I was flummoxed by the following:

class Foo {
    public $path = array(
        realpath(".")
    );
}

It produced a syntax error: Parse error: syntax error, unexpected '(', expecting ')' in test.php on line 5, which is the line with the realpath call.

But this works fine:

$path = array(
    realpath(".")
);

After banging my head against this for a while, I was told you can't call functions in an attribute default; you have to do it in __construct. My question is: why?! Is this a "feature" or sloppy implementation? What's the rationale?


Solution

  • The compiler code suggests that this is by design, though I don't know what the official reasoning behind that is. I'm also not sure how much effort it would take to reliably implement this functionality, but there are definitely some limitations in the way that things are currently done.

    Though my knowledge of the PHP compiler isn't extensive, I'm going to try to illustrate what I believe goes on so that you can see where the issue is. Your code sample makes a good candidate for this process, so we'll be using that:

    class Foo {
        public $path = array(
            realpath(".")
        );
    }
    

    As you're well aware, this causes a syntax error. This is a result of the PHP grammar, which makes the following relevant definition:

    class_variable_declaration: 
          //...
          | T_VARIABLE '=' static_scalar //...
    ;
    

    So, when defining the values of variables such as $path, the expected value must match the definition of a static scalar. Unsurprisingly, this is somewhat of a misnomer given that the definition of a static scalar also includes array types whose values are also static scalars:

    static_scalar: /* compile-time evaluated scalars */
          //...
          | T_ARRAY '(' static_array_pair_list ')' // ...
          //...
    ;
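
    To make the rule concrete, here is a small sketch of property defaults that fit the static_scalar definition: literals, constants, and arrays built only from those. The class name StaticDefaults and its members are made up for illustration; they are not part of the question's code.

    ```php
    <?php
    // Hypothetical class: every default below is a compile-time value,
    // so it matches the static_scalar rule and parses without error.
    class StaticDefaults {
        const BASE = '/var/www';        // class constant

        public $name  = 'example';      // string literal
        public $depth = 2;              // integer literal
        public $file  = __FILE__;       // magic constant, resolved at compile time
        public $paths = array(          // array whose elements are static scalars too
            'base' => self::BASE,
            'tmp'  => '/tmp',
        );
    }

    $d = new StaticDefaults();
    ```

    Replacing any of these values with a function call such as realpath(".") takes the declaration outside the static_scalar rule, which is what triggers the parse error from the question.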
    

    Let's assume for a second that the grammar was different, and the noted line in the class variable declaration rule looked more like the following, which would match your code sample (despite breaking otherwise valid assignments):

    class_variable_declaration: 
          //...
          | T_VARIABLE '=' T_ARRAY '(' array_pair_list ')' // ...
    ;
    

    After recompiling PHP, the sample script would no longer fail with that syntax error. Instead, it would fail with the compile time error "Invalid binding type". Since the code is now valid based on the grammar, this indicates that there actually is something specific in the design of the compiler that's causing trouble. To figure out what that is, let's revert to the original grammar for a moment and imagine that the code sample had a valid assignment of $path = array( 2 );.

    Using the grammar as a guide, it's possible to walk through the actions invoked in the compiler code when parsing this code sample. I've left some less important parts out, but the process looks something like this:

    // ...
    // Begins the class declaration
    zend_do_begin_class_declaration(znode, "Foo", znode);
        // Set some modifiers on the current znode...
        // ...
        // Create the array
        array_init(znode);
        // Add the value we specified
        zend_do_add_static_array_element(znode, NULL, 2);
        // Declare the property as a member of the class
        zend_do_declare_property('$path', znode);
    // End the class declaration
    zend_do_end_class_declaration(znode, "Foo");
    // ...
    zend_do_early_binding();
    // ...
    zend_do_end_compilation();
    

    While the compiler does a lot in these various functions, it's important to note a few things.

    1. A call to zend_do_begin_class_declaration() results in a call to get_next_op(). This means that it adds a new opcode to the current opcode array.
    2. array_init() and zend_do_add_static_array_element() do not generate new opcodes. Instead, the array is immediately created and added to the current class' properties table. Method declarations work in a similar way, via a special case in zend_do_begin_function_declaration().
    3. zend_do_early_binding() consumes the last opcode on the current opcode array, checking for one of the following types before setting it to a NOP:
      • ZEND_DECLARE_FUNCTION
      • ZEND_DECLARE_CLASS
      • ZEND_DECLARE_INHERITED_CLASS
      • ZEND_VERIFY_ABSTRACT_CLASS
      • ZEND_ADD_INTERFACE

    Note that in the last case, if the opcode type is not one of the expected types, an error is raised: the "Invalid binding type" error. From this, we can tell that allowing non-static values to be assigned somehow causes the last opcode to be something other than what is expected. So, what happens when we use a non-static array with the modified grammar?

    Instead of calling array_init(), the compiler prepares the arguments and calls zend_do_init_array(). This in turn calls get_next_op() and adds a new INIT_ARRAY opcode, producing something like the following:

    DECLARE_CLASS   'Foo'
    SEND_VAL        '.'
    DO_FCALL        'realpath'
    INIT_ARRAY
    

    Herein lies the root of the problem. Because these extra opcodes have been added, the last opcode is no longer one of the declaration types, so zend_do_early_binding() gets an unexpected input and raises the "Invalid binding type" error. As the process of early binding class and function definitions seems fairly integral to the PHP compilation process, it can't just be ignored (though the DECLARE_CLASS production/consumption is kind of messy). Likewise, it's not practical to try to evaluate these additional opcodes inline (you can't be sure that a given function or class has been resolved yet), so there's no way to avoid generating the opcodes.
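
    The check can be pictured with a toy model. This is only a sketch in userland PHP, not the Zend Engine's code: the function toy_early_binding() and the idea of representing the opcode array as a list of strings are my own simplifications, and only the opcode names are taken from the lists above.

    ```php
    <?php
    // Toy model (assumption): the opcode array is a plain list of opcode
    // names, and early binding only inspects the last entry.
    function toy_early_binding(array $opcodes) {
        $bindable = array(
            'ZEND_DECLARE_FUNCTION',
            'ZEND_DECLARE_CLASS',
            'ZEND_DECLARE_INHERITED_CLASS',
            'ZEND_VERIFY_ABSTRACT_CLASS',
            'ZEND_ADD_INTERFACE',
        );
        $last = end($opcodes);          // last opcode emitted so far
        if (!in_array($last, $bindable, true)) {
            return 'Invalid binding type';
        }
        return 'bound';
    }

    // Static default: the array is built at compile time, no extra opcodes.
    $static = array('ZEND_DECLARE_CLASS');

    // Non-static default: the function call and array construction emit
    // opcodes after the class declaration.
    $dynamic = array(
        'ZEND_DECLARE_CLASS',
        'ZEND_SEND_VAL',
        'ZEND_DO_FCALL',
        'ZEND_INIT_ARRAY',
    );

    $staticResult  = toy_early_binding($static);
    $dynamicResult = toy_early_binding($dynamic);
    ```

    In this model the static case binds, while the dynamic case fails because the last opcode is the array initialisation rather than a declaration.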

    A potential solution would be to build a new opcode array that was scoped to the class variable declaration, similar to how method definitions are handled. The problem with doing that is deciding when to evaluate such a run-once sequence. Would it be done when the file containing the class is loaded, when the property is first accessed, or when an object of that type is constructed?
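
    In the meantime, the same timing choices can be made explicitly in userland code. The sketch below shows two of them: initialising in __construct (the workaround mentioned in the question) and initialising on first access. The class LazyFoo and the method getPath() are hypothetical names used only for illustration.

    ```php
    <?php
    // Option 1: evaluate when an object is constructed.
    class Foo {
        public $path;

        public function __construct() {
            $this->path = array(realpath("."));
        }
    }

    // Option 2 (hypothetical class): evaluate when the value is first read.
    class LazyFoo {
        private $path = null;

        public function getPath() {
            if ($this->path === null) {
                // Runs at most once per object, on first access.
                $this->path = array(realpath("."));
            }
            return $this->path;
        }
    }

    $foo  = new Foo();
    $lazy = new LazyFoo();
    ```

    Both versions give the same value here; they differ only in when realpath(".") runs, which is exactly the decision the language itself would have to make.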

    As you've pointed out, other dynamic languages have found a way to handle this scenario, so it's not impossible to make that decision and get it to work. From what I can tell though, doing so in the case of PHP wouldn't be a one-line fix, and the language designers seem to have decided that it wasn't something worth including at this point.