Consider the code below:
a = [1, 2]
a.extend(x*2 for x in a)
print(a)
At first glance, it would seem (to the uninitiated) that the code will generate the following output:
[1, 2, 2, 4]
But it does NOT generate that output. In fact, the program never ends - it is an infinite program. You need to convert that generator expression to a list first. Like so:
a = [1, 2]
a.extend([x*2 for x in a])
print(a)
Question is, why is it an infinite program when using the first approach? I understand that in order to execute extend
, it must first evaluate the generator expression, which in turn references the original list. But here's how I would assume it works (or should work):
extend
has a generator expression as the argument, so it goes ahead and evaluates the expression first[2, 4]
extend
the original list a
with [2, 4]
But looks like that's not what actually happens.
Can anyone explain what is going on, and describe why it behaves the way it does?
- generator expression is evaluated to [2, 4]
Is wrong. The generator expression is evaluated lazily, an item at a time, no such list
is produced, and the items are appended one at a time, in a way that means the generator expression eventually starts reading its own output from later in the list
as it goes.
Generator expressions are lazy. list.extend
doesn't need to completely evaluate the argument, it can just iterate it as it goes, appending each item as it goes. The internal implementation can be written entirely in terms of append
, e.g.:
def extend(self, it):
for x in it:
self.append(x)
In practice, it's a C version of the loop with some minor optimizations tossed in to reduce reallocations when possible and some optimizations/protections that special-case list
, tuple
and direct self-extend
s.
This is typically a good thing; the whole point of generator expressions is to avoid a wasteful temporary list
, and forcibly converting from genexpr to list
would make it slower and add additional unnecessary memory expenses. If it just converted to list
then extended every time an operation like this occurred, genexprs would be mostly pointless. So they don't.
The code does include protection against the simple case of:
mylist.extend(mylist)
but it doesn't attempt to defend against cases like this where there's indirection in the way. There's no way to defend against it in general (it can't trace through all the layers of indirection possible to determine it's really self-assignment); when they come up, you're responsible for doing it safely.