Tags: php, arrays, return-by-reference

PHP: return array elements by reference, but also allow null


I am a huge fan of JavaScript's Array.prototype.find(). I like being able to just take an array, throw it a predicate, and get the first element in that array that matches the predicate. I want that functionality in PHP in the form of a helper function.

Here is what I came up with:

function find(array &$arr, callable $callable)
{
    foreach ($arr as &$x) {
        if ($callable($x)) {
            return $x;
        }
    }
    return null;
}

This works great. Mostly. I have trouble getting the exact behavior I want regarding references. Namely, if the returned element is a complex structure (i.e., it is an object or an array), and I modify the contents of that structure by calling instance methods on the returned value or assigning to its keys, the changes should immediately be reflected in the array I pulled it from. That's JavaScript's native behavior, and I would prefer that my PHP port fully emulate it.

If the array being iterated is full of objects, this works flawlessly. PHP passes references to objects by default, so this is not a problem.

My problem arises when the iterated array is full of nested sub-arrays. If I get a sub-array value returned from this function, it's passed by value and copied. If the caller modifies that copy of the found sub-array, changes are not reflected in the parent array.

Okay, so, simple fix, right? Just put the ampersand on the function definition. &find(...). Easy.

But the very nature of this function is that sometimes it does not find an element that matches the predicate. So it necessarily has to return NULL sometimes. When combined with the return-by-reference function definition, this raises an E_NOTICE:

PHP Notice: Only variable references should be returned by reference in [stacktrace]

I can stopgap the notice by defining a null variable in the function and returning the reference to that, but that feels like a code smell. So does simply suppressing the notice. I'd prefer to not do either of those, unless I can be sufficiently convinced that it's the preferred idiom.

Evidently I am doing something incorrect, but I'm not sure what the solution should be. Am I missing a better pattern here?

EDIT: An example desired usage of this hypothetical function:

$main_array = [
    [ 'key' =>  10 ],
    [ /* stuff */ ],
    [ /* stuff */ ],
    // ...
];

// Raises E_NOTICE if null
$sub_array =& find(
    $main_array,
    fn($x) => $x['key'] === 10
);

if (!$sub_array) {
    // handle null case as appropriate
}

// Modify substructure of returned value
$sub_array['key'] = 20;

// Changed value should reflect in main array:
echo $main_array[0]['key'] === 20 ? 'True' : 'False';
// True

Solution

    "Evidently I am doing something incorrect"

    I don't think that's evident at all: in PHP, references are links between variables; so if you want to use references, you have to use variables. As you've already found, that means that to return a null by reference, you need to set a variable to null first. In other words, you have to use a value assignment ($foo = null) first before you can use a reference assignment ($bar =& $foo).
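As a minimal sketch of the "assign null to a variable first" approach described above: the name find_ref and the local variable $none are hypothetical and chosen only for illustration. The function is declared with & so that a match is returned as a reference into the caller's array, and a miss returns a reference to a local variable holding null, which avoids the notice.

```php
<?php
// Hypothetical helper: returns a reference to the first element matching
// the predicate, or a reference to a local null variable if none matches.
function &find_ref(array &$arr, callable $predicate)
{
    foreach ($arr as &$item) {
        if ($predicate($item)) {
            return $item; // reference to the actual array element
        }
    }
    $none = null;         // value assignment first...
    return $none;         // ...so that a variable is returned by reference
}

$main_array = [['key' => 10], ['key' => 30]];

// Found: $hit is now another name for $main_array[0].
$hit =& find_ref($main_array, fn($x) => $x['key'] === 10);
$hit['key'] = 20;
// $main_array[0]['key'] is now 20

// Not found: $miss is null and no notice is raised.
$miss =& find_ref($main_array, fn($x) => $x['key'] === 99);
```

In the not-found case, $miss is bound to the function's local variable rather than to anything in the array, so the caller should check it for null before writing to it.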

    "PHP passes references to objects by default"

    It's important to note that this is not the case in the same sense as =& reference assignment: what gets copied is a handle to the object, not an alias for the variable. Consider this code:

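    $arr = [new stdClass];                // stdClass permits dynamic properties
    $foo = find($arr, fn($item) => true); // find() as declared in the question (no &)
    $foo->someProp = 'hello';             // mutates the object held in $arr[0]
    $foo = 42;                            // rebinds $foo only
    var_dump($arr);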

    The array $arr still contains the original object, but the object will now have 'hello' in its property $someProp. The variable is not a reference, but the object is a mutable structure.

    Declare the function as &find(...) and change line 2 to $foo =& find(..., and the array will instead contain the value 42, and the object will be destructed. The variable itself is now a reference, another name for $arr[0], so $foo = 42; has the same effect as $arr[0] = 42; would.
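The difference between the two assignments can be seen without any helper function. The following is only an illustrative sketch; the variable names $copy and $alias are hypothetical.

```php
<?php
$arr = [new stdClass];

// Value assignment: $copy receives a copy of the object handle.
$copy = $arr[0];
$copy->someProp = 'hello'; // mutates the one shared object
$copy = 42;                // rebinds $copy only; $arr is untouched
$stillObject = $arr[0] instanceof stdClass;
// $stillObject is true, and $arr[0]->someProp is 'hello'

// Reference assignment: $alias is another name for $arr[0].
$alias =& $arr[0];
$alias = 42;               // has the same effect as $arr[0] = 42;
// $arr[0] is now 42
```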

    "Am I missing a better pattern here?"

    Arguably the "better" pattern is simply not to use references, which are generally quite confusing to work with. If you want mutable values, then use objects, rather than arrays or other types.
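One way this could look, as a sketch rather than a recommendation of a specific class: the sub-arrays are wrapped in PHP's built-in ArrayObject so they keep array-style access but behave as mutable objects. The function name find_first is hypothetical. No references are involved, so returning null needs no workaround.

```php
<?php
// Hypothetical helper: plain by-value return, nullable, no references.
function find_first(array $items, callable $predicate): ?object
{
    foreach ($items as $item) {
        if ($predicate($item)) {
            return $item; // returns the object handle
        }
    }
    return null;
}

// Each "sub-array" is an ArrayObject, which supports $x['key'] access.
$main = [
    new ArrayObject(['key' => 10]),
    new ArrayObject(['key' => 30]),
];

$found = find_first($main, fn($x) => $x['key'] === 10);
if ($found !== null) {
    $found['key'] = 20; // mutates the object that $main[0] also holds
}
// $main[0]['key'] is now 20

$missing = find_first($main, fn($x) => $x['key'] === 99);
// $missing is null
```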

    Note that JavaScript's behaviour differs not because it uses anything similar to PHP's references by default, but because in JavaScript arrays are themselves mutable objects.

    Translating the example above to JavaScript:

    const arr = [new Object()];
    let foo = arr.find((item) => true);
    foo.someProp = 'hello';
    foo = 42;
    console.log(arr); // arr[0] is still the object, now with someProp: 'hello'

    As with PHP, arr still contains the object, which has been mutated; foo was not treated as a direct reference to the array element.