
Problems creating a generator factory in Python


I'd like to create a generator factory, i.e. a generator that yields generators, in Python using a "generator expression" (the generator equivalent of a list comprehension). Here's an example:

import itertools as it
gen_factory=((pow(b,a) for a in it.count(1)) for b in it.count(10,10))

In my mind this should give the following output:

((10,100,1000,...), (20,400,8000,...), (30,900,27000,...), ...)

However, the following shows that the internal generators are getting reset:

g0 = next(gen_factory)
next(g0) # 10
next(g0) # 100
g1 = next(gen_factory)
next(g1) # 20
next(g0) # 8000

So the result of the last statement is equal to pow(20,3) whereas I expected it to be pow(10,3). It seems that calling next(gen_factory) alters the b value in g0 (but not the internal state a). Ideally, previous generators shouldn't change as we split off new generators from the generator factory.
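
One way to picture what is going on is to write the nested generator expression as roughly equivalent generator functions. This is only a sketch: the names outer and inner are made up for illustration, and the real expansion differs in detail. It shows that b lives in the outer generator's scope and is looked up each time the inner generator runs:

```python
import itertools as it

def outer():
    # Roughly what the outer generator expression does (illustrative name).
    for b in it.count(10, 10):
        def inner():
            # b is not local to inner: it is looked up in outer's scope
            # every time pow(b, a) is evaluated.
            for a in it.count(1):
                yield pow(b, a)
        yield inner()

gen_factory = outer()
g0 = next(gen_factory)
first = next(g0)        # b is 10 at this point
second = next(g0)       # b is still 10
g1 = next(gen_factory)  # advances outer's loop, so b becomes 20
third = next(g0)        # g0 keeps its own a (now 3) but sees b == 20
print(first, second, third)
```

The inner generator keeps its own counter a, which matches the observation that only b appears to change.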

Interestingly, I can get the correct behavior by converting these to lists. Here's a finite example:

finite_gen_factory = ((pow(b,a) for a in (1,2,3)) for b in (10,20,30))
[list(x) for x in finite_gen_factory]

which gives [[10, 100, 1000], [20, 400, 8000], [30, 900, 27000]], but trying to maintain separate generators fails as before:

finite_gen_factory = ((pow(b,a) for a in (1,2,3)) for b in (10,20,30))
g0 = next(finite_gen_factory)
g1 = next(finite_gen_factory)
next(g0) # 20, should be 10.
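
The difference between the two finite cases can be sketched as follows, assuming that b is only looked up when an inner generator is actually advanced. The names eager, gens and deferred are illustrative. Draining each inner generator before the factory moves on reads b while it still has the intended value; collecting the generators first and reading them later does not:

```python
# Each inner generator is drained before the factory advances,
# so it reads b while b still holds "its" value.
factory = ((pow(b, a) for a in (1, 2, 3)) for b in (10, 20, 30))
eager = [list(g) for g in factory]

# Here the factory is exhausted first, which leaves b at its last value (30),
# and only then are the inner generators read.
factory = ((pow(b, a) for a in (1, 2, 3)) for b in (10, 20, 30))
gens = list(factory)
deferred = [list(g) for g in gens]

print(eager)
print(deferred)
```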

The closest explanation, I think, is in this answer, but I'm not sure what the correct way of resolving my problem is. I thought of copying (cloning) the internal generators, but I'm not sure this is possible. Also, it.tee probably doesn't work here. A workaround might be defining the inner generator as a class, but I really wanted a compact generator expression for this. Some Stack Overflow answers also recommended using functools.partial for this kind of thing, but I can't see how I could use that here.


Solution

  • You can avoid the late lookup of the outer b by capturing its current value inside the inner generator with for b in [b]:

    gen_factory=((pow(b,a) for b in [b] for a in it.count(1)) for b in it.count(10,10))
    

    As documented:

    the iterable expression in the leftmost for clause is immediately evaluated, so that an error produced by it will be emitted at the point where the generator expression is defined, rather than at the point where the first value is retrieved.

    So when one of the (inner) generators is created, the list [b] is built right away and given to that generator. Its for b in [b] clause then puts the value in the generator's own local variable b, which it keeps using instead of the outer generator's b.

    Btw I'd put such nested generators on multiple lines for readability:

    gen_factory = (
        (pow(b,a) for b in [b] for a in it.count(1))
        for b in it.count(10,10)
    )
    

    You could also use a different name if you worry that using the same name is confusing:

    gen_factory = (
        (pow(my_b,a) for my_b in [b] for a in it.count(1))
        for b in it.count(10,10)
    )
    

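    Btw, in this case you could also use map or other tools instead of an inner generator expression. They don't have the issue in the first place, because b is evaluated and passed as an argument at the moment each inner iterator is created. Some possibilities: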

    gen_factory = (
        map(pow, it.repeat(b), it.count(1))
        for b in it.count(10,10)
    )
    
    gen_factory = (
        map(b.__pow__, it.count(1))
        for b in it.count(10,10)
    )
    
    import functools as ft
    
    gen_factory = (
        map(ft.partial(pow, b), it.count(1))
        for b in it.count(10,10)
    )
    
    import operator as op
    
    gen_factory = (
        it.accumulate(it.repeat(b), op.mul)
        for b in it.count(10,10)
    )