Why do list comprehensions write to the loop variable, but generators don't?

If I do something with list comprehensions, it writes to a local variable:

i = 0
test = any([i == 2 for i in xrange(10)])
print i

This prints "9". However, if I use a generator, it doesn't write to a local variable:

i = 0
test = any(i == 2 for i in xrange(10))
print i

This prints "0".

Is there any good reason for this difference? Is this a design decision, or just a random byproduct of the way that generators and list comprehensions are implemented? Personally, it would seem better to me if list comprehensions didn't write to local variables.


Personally, it would seem better to me if list comprehensions didn't write to local variables.

You are correct. This is fixed in Python 3.x. The behavior is unchanged in 2.x so that it doesn't impact existing code that (ab)uses this hole.

Because... because.

No, really, that's it. Quirk of the implementation. And arguably a bug, since it's fixed in Python 3.

As PEP 289 (Generator Expressions) explains:

The loop variable (if it is a simple variable or a tuple of simple variables) is not exposed to the surrounding function. This facilitates the implementation and makes typical use cases more reliable.

It appears to have been done for implementation reasons.

Personally, it would seem better to me if list comprehensions didn't write to local variables.

PEP 289 clarifies this as well:

List comprehensions also "leak" their loop variable into the surrounding scope. This will also change in Python 3.0, so that the semantic definition of a list comprehension in Python 3.0 will be equivalent to list(<generator expression>).

In other words, the behaviour you describe indeed differs in Python 2 but it has been fixed in Python 3.

Python’s creator, Guido van Rossum, mentioned this when he wrote about generator expressions, which were uniformly built into Python 3:

We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension "leaks" the loop control variable into the surrounding scope:

x = 'before'
a = [x for x in 1, 2, 3]
print x # this prints '3', not 'before'

This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.

However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.

So in Python 3 you won’t see this happen anymore.
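A minimal sketch of how this could be checked on Python 3 (the names doubled and found are only illustrative and do not come from the quoted post):

```python
# Python 3: neither form rebinds the outer x.
x = "before"

doubled = [x * 2 for x in (1, 2, 3)]     # list comprehension
found = any(x == 2 for x in range(10))   # generator expression

print(doubled)  # [2, 4, 6]
print(found)    # True
print(x)        # before
```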

Interestingly, dict comprehensions in Python 2 don’t do this either; this is mostly because dict comprehensions were backported from Python 3 and as such already had that fix in them.
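As a rough illustration, the snippet below should behave the same on Python 2.7 and Python 3, since dict and set comprehensions keep their loop variable private on both (the names squares and evens are made up for this example):

```python
k = "before"

squares = {k: k * k for k in range(3)}        # dict comprehension
evens = {k for k in range(6) if k % 2 == 0}   # set comprehension

print(k)  # before
```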

There are some other questions that cover this topic too, but I’m sure you have already seen those when you searched for the topic, right? ;)

One of the subtle consequences of the dirty secret described by poke above is that list(...) and [...] do not have the same side effects in Python 2:

In [1]: a = 'Before'
In [2]: list(a for a in range(5))
In [3]: a
Out[3]: 'Before'

So there is no side effect for a generator expression inside the list constructor, but the side effect is there in a direct list comprehension:

In [4]: [a for a in range(5)]
In [5]: a
Out[5]: 4

As a by-product of wondering how list comprehensions are actually implemented, I found a good answer to your question.

In Python 2, take a look at the byte-code generated for a simple list comprehension:

>>> from dis import dis
>>> s = compile('[i for i in [1, 2, 3]]', '', 'exec')
>>> dis(s)
  1           0 BUILD_LIST               0
              3 LOAD_CONST               0 (1)
              6 LOAD_CONST               1 (2)
              9 LOAD_CONST               2 (3)
             12 BUILD_LIST               3
             15 GET_ITER
        >>   16 FOR_ITER                12 (to 31)
             19 STORE_NAME               0 (i)
             22 LOAD_NAME                0 (i)
             25 LIST_APPEND              2
             28 JUMP_ABSOLUTE           16
        >>   31 POP_TOP
             32 LOAD_CONST               3 (None)
             35 RETURN_VALUE

It essentially translates to a simple for-loop; the list comprehension is just syntactic sugar for it. As a result, the same semantics as for for-loops apply:

a = []
for i in [1, 2, 3]:
    a.append(i)
print(i)  # prints 3: i has leaked

In the list-comprehension case, (C)Python uses a "hidden list name" and a special instruction LIST_APPEND to handle creation but really does nothing more than that.

So your question should generalize to why Python writes to the for loop variable in for-loops; that is nicely answered by a blog post from Eli Bendersky.
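To make the for-loop behaviour concrete, here is a small Python 3 sketch (last_loop_value is a hypothetical helper written for this answer, not something from the blog post). The loop target is an ordinary variable of the enclosing function, so it survives the loop, and it is simply never bound if the loop body never runs:

```python
def last_loop_value(items):
    """Return the loop target after the loop, or "unbound" if it was never assigned."""
    for i in items:
        pass
    try:
        return i  # i is a normal local of this function, still bound here
    except UnboundLocalError:
        # The loop ran zero times, so i was never assigned.
        return "unbound"


print(last_loop_value([1, 2, 3]))  # 3
print(last_loop_value([]))         # unbound
```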

Python 3, as mentioned by others, has changed the list-comprehension semantics to better match those of generators (by creating a separate code object for the comprehension), so a list comprehension is essentially syntactic sugar for the following:

a = [i for i in [1, 2, 3]]

# equivalent to
def __f(it):
    _ = []
    for i in it:
        _.append(i)
    return _
a = __f([1, 2, 3])

This won't leak because the loop doesn't run in the uppermost scope, as the Python 2 equivalent does. The i is bound only inside __f, as a local variable of that function, and is destroyed when the function returns.

If you want, take a look at the byte-code generated for Python 3 by running dis('a = [i for i in [1, 2, 3]]') (after from dis import dis). On CPython versions before 3.12, you'll see how a "hidden" code-object is loaded and then a function call is made in the end.
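If reading raw dis output is inconvenient, one possible way to look for the hidden code object is to scan the constants of the compiled module, as sketched below. What it finds depends on the interpreter: CPython 3.12 and later inline comprehensions (PEP 709), so the list of nested code objects may be empty there, while the loop variable still does not leak. The names source, module_code, nested and namespace are illustrative only.

```python
import types

source = "a = [i for i in [1, 2, 3]]"
module_code = compile(source, "<example>", "exec")

# Collect the names of any code objects stored as constants of the module code.
# Older CPython 3 releases typically show a '<listcomp>' entry here;
# 3.12+ may show nothing because the comprehension is inlined.
nested = [
    const.co_name
    for const in module_code.co_consts
    if isinstance(const, types.CodeType)
]
print(nested)

# Run the code in a fresh namespace and check what it defined.
namespace = {}
exec(module_code, namespace)
print(namespace["a"])    # [1, 2, 3]
print("i" in namespace)  # False
```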