multiprocessing.Pool: AttributeError

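I have a method inside a class that needs to do a lot of work in a loop, and I would like to spread the work over all of my cores.

I wrote the following code, which works if I use the normal map(), but with pool.map() it returns an error.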

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)


class OtherClass:
    def run(sentence, graph):
        return False


class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        other = OtherClass()

        def single(params):
            sentences, graph = params
            return [other.run(sentence, graph) for sentence in sentences]

        return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

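Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

Why can't it pickle single()? I even tried moving single() to the global module scope (not inside the class, which makes it independent of the context):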

import multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)


class OtherClass:
    def run(sentence, graph):
        return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]


class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()

and I get the following...

Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>


Error 1:

AttributeError: Can't pickle local object 'SomeClass.some_method.<locals>.single'

You solved this error yourself by moving the nested target-function single() out to the top-level.

Background:

Pool needs to pickle (serialize) everything it sends to its worker processes (IPC). Pickling a function only saves its qualified name, and unpickling re-imports the function by that name. For this to work, the function must be defined at the top level of a module. Nested functions are not importable by the child, and even trying to pickle them raises an exception.
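To see the "pickle by name" behaviour without involving a pool at all, here is a minimal sketch. json.dumps is used only as a stand-in for any importable module-level function, and make_nested() is a hypothetical helper that mimics the nested single() from the question. The exact exception type for the nested case seems to differ between Python versions, so the sketch catches both AttributeError and pickle.PicklingError.

```python
import json
import pickle


def make_nested():
    # Hypothetical helper: returns a function defined inside another function,
    # like single() inside some_method() in the question.
    def nested(x):
        return x
    return nested


# A module-level function is stored as a reference (module name + function name),
# not as code. Unpickling looks that name up again.
payload = pickle.dumps(json.dumps)
restored = pickle.loads(payload)
print(restored is json.dumps)  # the very same function object comes back

# A nested function has no importable name, so pickling it fails up front.
try:
    pickle.dumps(make_nested())
    nested_pickled = True
except (AttributeError, pickle.PicklingError) as exc:
    nested_pickled = False
    print(type(exc).__name__, exc)
```

The same lookup is what the worker process has to perform for the function passed to pool.map(), which is why the name must already exist in the worker's copy of the module.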


Error 2:

AttributeError: Can't get attribute 'single' on <module '__main__' from '.../test.py'>

You are starting the pool before you define your function and classes, so the child processes cannot inherit that code. Move the pool creation down to the bottom of the script and protect it with if __name__ == '__main__':

import multiprocessing


class OtherClass:
    def run(self, sentence, graph):
        return False


def single(params):
    other = OtherClass()
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]


class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        return list(pool.map(single, zip(self.sentences, self.graphs)))


if __name__ == '__main__':  # <- prevent RuntimeError for 'spawn'
    # and 'forkserver' start_methods
    with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
        print(SomeClass().some_method())
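If you would rather not depend on a module-level pool name, or if the worker needs an object that the nested version took from the enclosing scope, one possible variation is to pass both in explicitly. This is only a sketch of that idea, not part of the fix above: the extra `other` parameter on single(), the `pool` parameter on some_method() and the max(1, ...) guard are my own assumptions. functools.partial is used because partial objects can be pickled as long as the wrapped function and the bound arguments can.

```python
import multiprocessing
from functools import partial


class OtherClass:
    def run(self, sentence, graph):
        return False


def single(other, params):
    # Top-level worker: everything it needs arrives as arguments,
    # nothing is captured from an enclosing scope.
    sentences, graph = params
    return [other.run(sentence, graph) for sentence in sentences]


class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self, pool):
        # Bind the shared object once; the partial is pickled and sent to the workers.
        worker = partial(single, OtherClass())
        return pool.map(worker, zip(self.sentences, self.graphs))


if __name__ == '__main__':
    # max(1, ...) avoids asking for zero workers on a single-core machine.
    n_workers = max(1, multiprocessing.cpu_count() - 1)
    with multiprocessing.Pool(n_workers) as pool:
        print(SomeClass().some_method(pool))
```

The bound OtherClass instance is pickled along with each task, so this only makes sense when that object is cheap to serialize.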

Appendix

...I would like to spread the work over all of my cores.

Potentially helpful background on how multiprocessing.Pool is chunking work:

Python multiprocessing: understanding logic behind chunksize
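As a rough illustration of what that chunking means in numbers: as far as I can tell from the CPython source, Pool.map() with chunksize=None aims for about four chunks per worker and rounds up. The helper below is a hypothetical re-implementation of that rule for experimenting. It is not a public API, and the rule is an implementation detail that may change between versions.

```python
def default_chunksize(n_items, n_workers):
    """Hypothetical mirror of the default chunksize rule in CPython's Pool.map()."""
    if n_items == 0:
        return 0
    # Aim for roughly 4 chunks per worker, rounding up so no item is left over.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


for n_items in (1, 16, 100):
    print(n_items, "items on 4 workers ->", default_chunksize(n_items, 4))
```

With the single-element input from the question, only one task exists, so only one worker will ever be busy regardless of the pool size.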

I accidentally discovered a very nasty solution. It works as long as you use a def statement. If you declare the function that you want to use in Pool.map with the global keyword at the beginning of the enclosing function, that solves it. But I would not rely on this in serious applications 😉

import multiprocessing


class OtherClass:
    def run(self, sentence, graph):  # 'self' added so other.run(...) can be called
        return False


class SomeClass:
    def __init__(self):
        self.sentences = [["Some string"]]
        self.graphs = ["string"]

    def some_method(self):
        global single  # This is ugly, but does the trick XD

        other = OtherClass()

        def single(params):
            sentences, graph = params
            return [other.run(sentence, graph) for sentence in sentences]

        # Create the pool only after single() exists, so forked workers inherit it.
        # This relies on the 'fork' start method; it will not work with 'spawn'.
        with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
            return list(pool.map(single, zip(self.sentences, self.graphs)))


SomeClass().some_method()