When reading Antony Hatchkins' answer to "How to override the copy/deepcopy operations for a Python object?", I am confused about why his implementation of __deepcopy__() does not first check memo to see whether the current object has already been copied before copying it. This is also pointed out in the comment by Antonín Hoskovec. Jonathan H's comment also addresses this issue and mentions that copy.deepcopy() appears to abort the call to __deepcopy__() if an object has already been copied. However, he does not point out clearly where this is done in the code of the copy module.
To illustrate the issue with not checking memo, suppose object a references b and c, and both b and c reference object d. During a deepcopy of a, object d should be copied only once, during the copy of b or c, whichever comes first.
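The sharing scenario above can be written down concretely. This is only a minimal sketch: the Node class and its attribute names are hypothetical and exist purely for illustration, and it relies on the default behaviour of copy.deepcopy rather than on any custom __deepcopy__.

```python
from copy import deepcopy


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self, name, **refs):
        self.name = name
        # Store each referenced object as an attribute (e.g. self.b, self.d).
        self.__dict__.update(refs)


d = Node("d")
b = Node("b", d=d)
c = Node("c", d=d)
a = Node("a", b=b, c=c)

a_copy = deepcopy(a)

# d is really copied, not shared with the original graph...
assert a_copy.b.d is not d
# ...and it is copied once: both paths lead to the same new object.
assert a_copy.b.d is a_copy.c.d
```

If d were copied twice, the last assertion would fail, because the copies of b and c would each hold their own copy of d.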
Essentially, I am asking why Antony Hatchkins' answer does not do the following:
    from copy import deepcopy

    class A:
        def __deepcopy__(self, memo):
            # Why not add the following two lines?
            if id(self) in memo:
                return memo[id(self)]
            cls = self.__class__
            result = cls.__new__(cls)
            memo[id(self)] = result
            for k, v in self.__dict__.items():
                setattr(result, k, deepcopy(v, memo))
            return result
Therefore, it would be great if someone could explain the internal implementation of deepcopy() in the copy module, both to demonstrate the best practice for overriding __deepcopy__ and to let me know what is happening under the hood.
I took a brief look at the source code for copy.deepcopy() but was confused by things like copier, reductor, and _reconstruct(). I read answers like "deepcopy override clarification" and "In Python, how can I call copy.deepcopy in my implementation of deepcopy()?", but none of them gave a comprehensive answer and rationale.
The (reference) implementation for copy.deepcopy is here.
As you can see, the first thing that function does is check for the instance in the memo, so there is no need to check in your own implementation.
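One way to convince yourself of this is to count how often a custom __deepcopy__ actually runs when the same object is reachable twice. The sketch below assumes a hypothetical Shared class with a call counter; neither name comes from the copy module.

```python
from copy import deepcopy


class Shared:
    """Hypothetical class whose __deepcopy__ never looks at memo first."""

    calls = 0  # counts how many times __deepcopy__ is executed

    def __deepcopy__(self, memo):
        Shared.calls += 1
        result = type(self).__new__(type(self))
        memo[id(self)] = result
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


d = Shared()
pair = [d, d]  # two references to the same object
pair_copy = deepcopy(pair)

# The second reference was answered from memo, so __deepcopy__ ran once.
assert Shared.calls == 1
assert pair_copy[0] is pair_copy[1]
```

The second lookup never reaches Shared.__deepcopy__, which is why a memo check inside the method would be redundant.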
Here is a breakdown of how that function works:
deepcopy(x, memo=None):
1. checks if x is in the memo. If it is, returns the value associated with it.
2. tries to work out the copying method by, in that order:
   - looking it up in the _deepcopy_dispatch dictionary
   - checking whether x has a __deepcopy__ method, and using that
   - otherwise falling back on the pickling machinery (__reduce_ex__ / __reduce__)
3. runs the found method to create a copy
4. registers that copy in the memo.
(I am glossing over some details; read the code if you are interested in them.)
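The breakdown can be paraphrased in a few lines of Python. This is a deliberately simplified sketch, not the real source: simplified_deepcopy and Tracked are hypothetical names, the dispatch table and the pickling fallback are replaced by a call to the real copy.deepcopy, and some bookkeeping done by the actual function is left out.

```python
import copy

_MISSING = object()  # sentinel, so that a memoized None is still a hit


def simplified_deepcopy(x, memo=None):
    """Illustrative paraphrase of the four steps; not the real source."""
    if memo is None:
        memo = {}

    # Step 1: if x was already copied, return that copy.
    key = id(x)
    found = memo.get(key, _MISSING)
    if found is not _MISSING:
        return found

    # Steps 2 and 3: choose a copying method and run it.
    # Classes themselves are skipped so that an unbound method is never called.
    copier = None if isinstance(x, type) else getattr(x, "__deepcopy__", None)
    if copier is not None:
        y = copier(memo)
    else:
        # Stand-in for the dispatch table and the pickling fallback:
        # delegate to the real function, sharing the same memo.
        y = copy.deepcopy(x, memo)

    # Step 4: register the copy so later lookups hit step 1.
    if y is not x:
        memo[key] = y
    return y


class Tracked:
    """Hypothetical class with a trivial __deepcopy__."""

    def __deepcopy__(self, memo):
        return Tracked()


t = Tracked()
memo = {}
first = simplified_deepcopy(t, memo)
second = simplified_deepcopy(t, memo)
assert first is second  # the second call stopped at step 1
```

Note that the registration in step 4 happens only after the copier has returned, which matters for objects that reference themselves.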
So to answer your questions (and others you may have):
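- Your __deepcopy__ method is the one used to copy your object (unless the type is registered in the _deepcopy_dispatch dictionary, but that dictionary should only contain methods for basic types).
- The memo is checked before your __deepcopy__ function is called. This one should recursively call deepcopy with the same memo dictionary.
- You do not strictly need to register the copy in memo yourself, because the deepcopy function also does it (step 4).
- It is still a good idea to do so: deepcopy registers the object in memo at the very end, whereas to avoid infinite recursion, you need to register it before doing recursive calls.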
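The point about registering early shows up as soon as an object refers to itself. The following sketch compares two hypothetical classes, Early and Late, that differ only in whether __deepcopy__ writes to memo before recursing.

```python
from copy import deepcopy


class Early:
    """Hypothetical: registers the copy in memo before recursing."""

    def __deepcopy__(self, memo):
        result = type(self).__new__(type(self))
        memo[id(self)] = result  # registered before any recursive call
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


class Late:
    """Hypothetical: leaves registration to deepcopy, after returning."""

    def __deepcopy__(self, memo):
        result = type(self).__new__(type(self))
        for k, v in self.__dict__.items():
            setattr(result, k, deepcopy(v, memo))
        return result


ok = Early()
ok.me = ok  # self-reference
ok_copy = deepcopy(ok)
assert ok_copy.me is ok_copy  # the cycle is preserved in the copy

bad = Late()
bad.me = bad
try:
    deepcopy(bad)
    recursed = False
except RecursionError:
    recursed = True
assert recursed  # memo was never filled in time, so the recursion never ends
```

For acyclic object graphs both variants behave the same, since deepcopy registers the result once __deepcopy__ returns.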
For a simpler way to allow your custom classes to be copied, you can also implement the __getstate__ and __setstate__ methods and rely on the fact that deepcopy falls back on pickling methods.
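As a rough illustration of that alternative, the sketch below uses a hypothetical Connection class whose cache attribute should not be carried over to copies. The class and attribute names are assumptions made for the example only.

```python
from copy import deepcopy


class Connection:
    """Hypothetical class with one attribute that should not be copied."""

    def __init__(self, host):
        self.host = host
        self.cache = {"warm": True}  # pretend this is expensive or uncopyable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["cache"]  # leave the cache out of the copied state
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.cache = {}  # give the copy a fresh, empty cache


original = Connection("example.org")
clone = deepcopy(original)

assert clone.host == "example.org"
assert clone.cache == {}
assert original.cache == {"warm": True}
```

With this approach there is no memo handling to write at all, and the same two methods also control how instances are pickled.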