php reference clone

In PHP can someone explain cloning vs pointer reference?


To begin with, I understand programming and objects, but the following doesn't make much sense to me in PHP.

In PHP we use the & operator to retrieve a reference to a variable. I understand a reference as being a way to refer to the same 'thing' with a different variable. For example:

$b = 1;
$a =& $b;
$a = 3;
echo $b;

This outputs 3, because changes made to $a are the same as changes made to $b. Conversely:

$b = 1;
$a = $b;
$a = 3;
echo $b;

should output 1.

If this is the case, why is the clone keyword necessary? It seems to me that if I set

$obj_a = $obj_b then changes made to $obj_a should not affect $obj_b. Conversely, $obj_a =& $obj_b should point to the same object, so changes made to $obj_a affect $obj_b.

However it seems in PHP that certain operations on $obj_a DO affect $obj_b even if assigned without the reference operator ($obj_a = $obj_b). This caused a frustrating problem for me today while working with DateTime objects that I eventually fixed by doing basically:

$obj_a = clone $obj_b;

But most of the PHP code I write doesn't seem to require explicit cloning like in this case and works just fine without it. What's going on here?


Solution

  • Basically, there are two ways variables work in PHP...

    For everything except objects:

    1. Assignment is by value (meaning a copy occurs if you do $a = $b).
    2. Reference can be achieved by doing $a = &$b (Note the reference operator operates upon the variable, not the assignment operator, since you can use it in other places)...
    3. Copies use a copy-on-write technique. So if you do $a = $b, there is no memory copy of the variable yet. But if you then do $a = 5;, the memory is copied at that point and overwritten.
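
    The three points above can be sketched as follows. This is only an illustration: the variable names and the addOne() helper are made up for this example, and copy-on-write itself is an engine detail that you cannot observe directly from the output.

```php
<?php
// Arrays (like scalars) are assigned by value.
$b = [1, 2, 3];
$a = $b;                 // logically a copy; the engine defers the real copy until a write
$a[] = 4;                // this write separates $a from $b
echo count($b), "\n";    // 3: $b is untouched

// The & operator binds two variable names to the same value.
$d = 1;
$c = &$d;
$c = 3;
echo $d, "\n";           // 3: a write through $c is a write to $d

// & is not tied to assignment; it also works on a function parameter.
function addOne(int &$n): void   // hypothetical helper for this sketch
{
    $n++;
}

addOne($d);
echo $d, "\n";           // 4
```

    Note that $a and $b never share writes, while $c and $d always do.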

    For objects:

    1. Assignment is by object reference. It's not really the same as normal variable by reference (I'll explain why later).
    2. Copy by value can be achieved by doing $a = clone $b.
    3. Reference can be achieved by doing $a = &$b, but beware that this has nothing to do with the object. You're binding the $a variable to the $b variable. It doesn't matter if it's an object or not.
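
    Here is a rough sketch of those three cases side by side. The variable names and property values are arbitrary, and stdClass is used only because it needs no class definition.

```php
<?php
$b = new stdClass();
$b->foo = 'bar';

$a = $b;             // copies the object handle: both names reach one object
$a->foo = 'baz';
echo $b->foo, "\n";  // baz

$c = clone $b;       // a new object with the same property values
$c->foo = 'qux';
echo $b->foo, "\n";  // baz: the clone is independent

$d = &$b;            // binds the variables themselves, object or not
$d = 4;              // so reassigning $d reassigns $b as well
var_dump($b);                      // int(4)
var_dump($a instanceof stdClass);  // bool(true): $a still holds the object
```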

    So, why is assignment for objects not really by reference? What happens if you do:

    $a = new stdClass();
    $b = $a;
    $a = 4;
    

    What's $b? Well, it's still the stdClass object... That's because the assignment didn't store a reference to the variable $a, but to the object itself...

    $a = new stdClass();
    $a->foo = 'bar';
    $b = $a;
    $b->foo = 'baz';
    

    What's $a->foo? It's baz. That's because when you did $b = $a, you told PHP to use the same object instance (hence the object reference). Note that $a and $b are not the same variable, but they do both reference the same object.

    One way of thinking about it is to treat every variable that stores an object as storing a pointer to that object. So the object lives somewhere else. When you assign $a = $b where $b is an object, all you're doing is copying that pointer. The actual variables are still disjoint. But when you do $a = &$b, you're storing a pointer to $b inside of $a. Now, when you manipulate $a, it follows the pointer chain to the base object. When you use the clone operator, you're telling PHP to copy the existing object and create a new one with the same state... So clone really just does a by-value copy of the object (a shallow one: properties that themselves hold objects still point to the same instances)...
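
    To tie this back to the DateTime problem in the question, here is a small sketch. The dates, the UTC timezone and the variable names are assumptions chosen for illustration; the last part shows that clone copies only the top-level object.

```php
<?php
$utc = new DateTimeZone('UTC');
$start = new DateTime('2020-01-01', $utc);

$alias = $start;                     // same object, second handle
$alias->modify('+1 day');
echo $start->format('Y-m-d'), "\n";  // 2020-01-02: $start changed too

$copy = clone $start;                // independent object
$copy->modify('+1 day');
echo $start->format('Y-m-d'), "\n";  // 2020-01-02: unaffected by the clone
echo $copy->format('Y-m-d'), "\n";   // 2020-01-03

// clone is shallow: a property holding an object is copied as a handle.
$outer = new stdClass();
$outer->when = new DateTime('2020-01-01', $utc);
$shallow = clone $outer;
var_dump($shallow->when === $outer->when);  // bool(true): same inner object
```

    If the inner objects need copying as well, a class can define __clone() and clone its object properties there.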

    So if you noticed, I said the object is not stored in an actual variable. It's stored somewhere else and nothing but a pointer is stored in the variable. So this means that you can have (and often do have) multiple variables pointing to the same instance. For this reason, the internal object representation contains a refcount (simply a count of the number of variables pointing to it). When an object's refcount drops to 0 (meaning that all the variables pointing to it either go out of scope, or are changed to something else) it is garbage collected (as it is no longer accessible)...
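
    One way to watch this happen is with a destructor. The sketch below uses a hypothetical Tracked class with a static counter, invented only for this demonstration, to show that the object is destroyed when its last handle goes away and not before.

```php
<?php
final class Tracked              // hypothetical class, only for this demo
{
    public static $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;      // runs when no variable points here any more
    }
}

$a = new Tracked();
$b = $a;                         // two variables, one object
unset($a);                       // one handle left, object stays alive
$afterUnset = Tracked::$destroyed;
echo $afterUnset, "\n";          // 0

$b = null;                       // last handle gone, destructor runs now
echo Tracked::$destroyed, "\n";  // 1
```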

    You can read more on references and PHP in the docs...

    Disclaimer: Some of this may be oversimplification or blurring of certain concepts. I intended this only to be a guide to how they work, and not an exact breakdown of what goes on internally...

    Edit: Oh, and as for this being "clunky", I don't think it is. I think it is really useful. Otherwise you'd have variable references being passed around all over the place. And that can yield some really interesting bugs when a variable in one part of an application affects another variable in another part of the app. And not because it's passed, but because a reference was made somewhere along the line.

    In general, I don't use variable references that much. It's rare that I find an honest need for them. But I do use object references all the time. I use them so much, that I'm happy that they are the default. Otherwise I'd need to write some operator (since & denotes a variable reference, there'd need to be another to denote an object reference). And considering that I rarely use clone, I'd say that 99.9% of use cases should use object references (so make the operator be used for the lower frequency cases)...

    JMHO

    I've also created a video explaining these differences. Check it out on YouTube.