Resetting a generator object in Python

I have a generator object returned by multiple yields. Preparation for calling this generator is a rather time-consuming operation. That is why I want to reuse the generator several times.

y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)

Of course, I'm considering copying the contents into a simple list. Is there a way to reset my generator?


If GrzegorzOledzki's answer won't suffice, you could probably use send() to reach your goal. See PEP 342 for more details on enhanced generators and yield expressions.

Update: Also see itertools.tee(). It involves some of the memory vs. processing trade-offs mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.
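A minimal sketch of what a send()-based reset could look like; the 'reset' protocol below is an assumption baked into this toy generator, not something FunctionWithYield supports on its own:

```python
def resettable():
    # Toy generator that restarts its sequence when sent 'reset'.
    while True:
        for i in range(3):
            cmd = yield i
            if cmd == 'reset':
                break          # restart the inner loop from 0
        else:
            return             # normal exhaustion -> StopIteration

g = resettable()
print(next(g))          # 0
print(next(g))          # 1
print(g.send('reset'))  # back to 0
```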

Generators can't be rewound. You have the following options:

  1. Run the generator function again, restarting the generation:

    y = FunctionWithYield()
    for x in y: print(x)
    y = FunctionWithYield()
    for x in y: print(x)
    
  2. Store the generator results in a data structure in memory or on disk which you can iterate over again:

    y = list(FunctionWithYield())
    for x in y: print(x)
    # can iterate again:
    for x in y: print(x)
    

The downside of option 1 is that it computes the values again. If that's CPU-intensive, you end up calculating twice. On the other hand, the downside of option 2 is the storage. The entire list of values will be stored in memory. If there are too many values, that can be impractical.

So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.

Probably the simplest solution is to wrap the expensive part in an object and pass that to the generator:

data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass

This way, you can cache the expensive computations.

If holding all results in RAM at the same time is an option, then materialize the results of the generator into a plain list with list() and use that.

Another option is to use the itertools.tee() function to create a second copy of your generator:

import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)

This could be beneficial from a memory-usage point of view if the original iteration might not process all the items.

I don't know what you mean by expensive preparation, but I guess you actually have

data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)

If that's the case, why not reuse data?

It can be done with a code object. Here is an example.

code_str = "y = (a for a in [1, 2, 3, 4])"
code1 = compile(code_str, '<string>', 'single')
exec(code1)
for i in y: print(i)   # prints 1 2 3 4
for i in y: print(i)   # prints nothing; the generator is exhausted
exec(code1)            # re-creates 'y'
for i in y: print(i)   # prints 1 2 3 4 again
>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val == 'restart':
...             i = init()
...         else:
...             i += 1

>>> g = gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> g.send('restart')
0
>>> next(g)
1
>>> next(g)
2

You can define a function that returns your generator

def f():
    def FunctionWithYield(generator_args):
        # code here...
    return FunctionWithYield

Now you can just do this as many times as you like:

for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

From the official documentation of tee:

In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().

So it's best to use list(iterable) instead in your case.

I want to offer a different solution to an old problem

class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)

The benefit of this compared to something like list(iterator) is that it is O(1) space complexity, while list(iterator) is O(n). The drawback is that if you only have access to the iterator, but not the function that produced the iterator, then you can't use this method. For example, it might seem reasonable to do the following, but it will not work.

g = (x * x for x in range(5))

squares = IterableAdapter(lambda: g)

for x in squares: print(x)
for x in squares: print(x)

Okay, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?

class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in range(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print(x)

for x in y():
    print(x)

Alternatively, you could just make your own class which follows the iterator protocol and defines some kind of 'reset' function.

class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()

for x in my_iterator:
    print(x)

print('resetting...')
my_iterator.reset()

for x in my_iterator:
    print(x)

https://docs.python.org/2/library/stdtypes.html#iterator-types
http://anandology.com/python-practice-book/iterators.html

If your generator is pure in the sense that its output only depends on the arguments passed and the step number, and you want the resulting generator to be restartable, here's a snippet that might be handy:

import copy


def generator(i):
    yield from range(i)


g = generator(10)
print(list(g))
print(list(g))


class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)


def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)
    return tmp


@restartable
def generator2(i):
    yield from range(i)


g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))

Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

There is no option to reset an iterator. An iterator normally pops an item each time you go through its next() function. The only way is to take a backup before iterating over the iterator object. Check below.

Creating an iterator object with items 0 to 9

i=iter(range(10))

Iterating through the next() function, which pops an item

print(next(i))

Converting the iterator object to a list

L = list(i)
print(L)
# output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

So item 0 has already popped out. Also, all the remaining items pop out as we convert the iterator to a list.

next(i)

Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration

So you need to convert the iterator to a list for backup before you start iterating. A list can be converted back to an iterator with iter(<list-object>).
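A minimal sketch of that backup pattern, with illustrative data standing in for the expensive iterator:

```python
i = iter(range(10))     # stands in for the expensive iterator
backup = list(i)        # materialize once, before consuming it elsewhere

it1 = iter(backup)      # each iter(backup) is an independent pass
it2 = iter(backup)
print(list(it1))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(it2))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```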

You can now use more_itertools.seekable (a third-party tool) for resetting iterators.

Install via > pip install more_itertools

import more_itertools as mit

y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)                                              # reset iterator
for x in y:
    print(x)

Note: memory consumption grows while advancing the iterator, so be wary of large iterables.

Use a wrapper function to handle StopIteration

You could write a simple wrapper function for your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches the end of the iteration.

import types


def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func

As you can spot above, when the wrapper function catches the StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).

Then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:

@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item

You can use itertools.cycle(). You create an iterator with this method and then execute a for loop over the iterator, which will loop over its values.

For example:

from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))

will generate 20 numbers, 0 to 4 repeatedly.

From the docs:

Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate. But we need to consume the generator multiple times in multiple functions. In order to call the generator and each generated object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.

This approach did a good job in the following case: a deep learning model processes a lot of images. The result is a lot of masks for a lot of objects on each image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they all take the images at once. All the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.

import threading
from typing import List


class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised.
    Each call to each of the sub-generators will cause only one call to the input
    generator. This way multiple methods on threads can iterate the input
    generator, and the generator will be cycled only once.
    '''

    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()

                self.value = d

                for cons in self.consumers:
                    cons.readyToRead.set()

            for cons in self.consumers:
                cons.consumed.wait()

            self.finished = True

            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val

Usage:

from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics = {}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean, genSplitter.GetConsumer())
f2 = executor.submit(max, genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric, genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

This works for me.

csv_rows = my_generator()
for _ in range(10):
    for row in csv_rows:
        print(row)
    csv_rows = my_generator()

If you want to reuse this generator multiple times, you can use functools.partial:

from functools import partial
func_with_yield = partial(FunctionWithYield)

for i in range(100):
    for x in func_with_yield():
        print(x)

This will wrap the generator function in another function, so each time you call func_with_yield() it creates the same generator function again.

Note: it can also accept function arguments, as partial(FunctionWithYield, args), if you have arguments.
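A minimal sketch of that note; numbers_up_to is a made-up stand-in for FunctionWithYield:

```python
from functools import partial

def numbers_up_to(n):
    # stands in for FunctionWithYield(args)
    yield from range(n)

func_with_yield = partial(numbers_up_to, 3)  # bind the argument once

print(list(func_with_yield()))  # [0, 1, 2]
print(list(func_with_yield()))  # [0, 1, 2] -- a fresh generator each call
```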