Multiprocessing causes Python to crash with the error "may have been in progress in another thread when fork() was called"

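I am relatively new to Python and am trying to use the multiprocessing module to parallelize my for loop.

I have an array of image URLs stored in img_urls. I need to download each image and apply some Google Vision processing to it.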

if __name__ == '__main__':
    img_urls = [ALL_MY_Image_URLS]
    runAll(img_urls)
    print("--- %s seconds ---" % (time.time() - start_time))

This is my runAll() method:

def runAll(img_urls):
    num_cores = multiprocessing.cpu_count()

    print("Image URLS  {}",len(img_urls))
    if len(img_urls) > 2:
        numberOfImages = 0
    else:
        numberOfImages = 1

    start_timeProcess = time.time()

    pool = multiprocessing.Pool()
    pool.map(annotate,img_urls)
    end_timeProcess = time.time()
    print('\n Time to complete ', end_timeProcess-start_timeProcess)

    print(full_matching_pages)


def annotate(img_path):
    file =  requests.get(img_path).content
    print("file is",file)
    """Returns web annotations given the path to an image."""
    print('Process Working under ',os.getpid())
    image = types.Image(content=file)
    web_detection = vision_client.web_detection(image=image).web_detection
    report(web_detection)

When I run it, Python crashes and I get this as a warning:

objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67570]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67567]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67568]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67569]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67571]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[67572]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

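This error occurs because of security restrictions on multithreading that were added in macOS High Sierra and later versions of macOS. I know this answer is a bit late, but I solved the problem as follows.

Set an environment variable in .bash_profile (or .zshrc on recent versions of macOS) to allow multithreaded applications or scripts under the new macOS High Sierra security rules.

Open a terminal: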

$ nano .bash_profile

Add the following line to the end of the file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

Save, exit, close the terminal, and open it again. Then check that the environment variable is set:

$ env

You will see output similar to:

TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
TERM=xterm-256color
TMPDIR=/var/folders/pn/vasdlj3ojO#OOas4dasdffJq/T/
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.E7qLFJDSo/Render
TERM_PROGRAM_VERSION=404
TERM_SESSION_ID=NONE
OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

You should now be able to run your Python script with multithreading.
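If changing the shell profile feels too broad, the same variable can be scoped to a single run. The sketch below is only an illustration: it starts a child Python process with the flag added to that child's environment. The helper name run_with_fork_safety_disabled is hypothetical, and the child command is a stand-in for your real script. It assumes the flag has to be present when the process starts, which is also what the profile approach achieves.

```python
import os
import subprocess
import sys


def run_with_fork_safety_disabled(argv):
    # Copy the current environment and add the flag for this child only;
    # the parent shell and other programs are left untouched.
    env = dict(os.environ, OBJC_DISABLE_INITIALIZE_FORK_SAFETY="YES")
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for `python my_script.py`: the child just echoes the variable.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('OBJC_DISABLE_INITIALIZE_FORK_SAFETY'))",
    ]
    print(run_with_fork_safety_disabled(child).stdout.strip())
```

Replace the child command with the path to your own script to run it with the flag set for that one invocation.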

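A solution that works in environments without the OBJC_DISABLE_INITIALIZE_FORK_SAFETY flag is to initialize the multiprocessing.Pool class right after the main() program starts.

This is most likely not the fastest solution, and I am not sure it works in all situations. However, warming up the worker processes early enough, before the rest of my program starts, does not produce any "... may have been in progress in another thread when fork() was called" errors, and I do get a significant performance boost compared with the non-parallelized code.

I have created a convenience class, Parallelizer, which I start very early and then use throughout the lifetime of my program. The full version can be found here.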

# entry point to my program
def main():
    parallelizer = Parallelizer()
    ...

When you want to parallelize:

# this function is parallelized. it is run by each child process.
def processing_function(input):
    ...
    return output


...
inputs = [...]
results = parallelizer.map(
    inputs,
    processing_function
)

The Parallelizer class:

import multiprocessing


class Parallelizer:
    def __init__(self):
        self.input_queue = multiprocessing.Queue()
        self.output_queue = multiprocessing.Queue()
        # Each worker runs _run as its initializer and loops there forever,
        # so the pool only serves to start and own the worker processes.
        self.pool = multiprocessing.Pool(multiprocessing.cpu_count(),
                                         Parallelizer._run,
                                         (self.input_queue, self.output_queue,))

    def map(self, contents, processing_func):
        size = 0
        for content in contents:
            self.input_queue.put((content, processing_func))
            size += 1
        # Results are collected in completion order, not input order.
        results = []
        while size > 0:
            result = self.output_queue.get(block=True)
            results.append(result)
            size -= 1
        return results

    @staticmethod
    def _run(input_queue, output_queue):
        while True:
            content, processing_func = input_queue.get(block=True)
            result = processing_func(content)
            output_queue.put(result)

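One caveat: parallelized code can be difficult to debug, so I have also prepared a non-parallel version of the class, which I switch to when something goes wrong in the child processes: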

class NullParallelizer:
    @staticmethod
    def map(contents, processing_func):
        results = []
        for content in contents:
            results.append(processing_func(content))
        return results
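Because both classes expose the same map(contents, processing_func) interface, switching between them can be reduced to one decision at startup. The following is a rough sketch of one way to do that, not part of the original class. The environment variable name MYAPP_SERIAL and the helper make_parallelizer are made up for illustration, and the serial class is repeated in condensed form only so the snippet runs on its own.

```python
import os


class NullParallelizer:
    """Serial stand-in with the same map(contents, processing_func) interface."""

    @staticmethod
    def map(contents, processing_func):
        return [processing_func(content) for content in contents]


def make_parallelizer(parallel_cls, env=os.environ):
    # MYAPP_SERIAL is a hypothetical switch: set it to "1" while debugging
    # so that everything runs in the parent process.
    if env.get("MYAPP_SERIAL") == "1":
        return NullParallelizer()
    # Otherwise build the real parallel implementation (e.g. Parallelizer).
    return parallel_cls()


if __name__ == "__main__":
    debug_mapper = make_parallelizer(object, env={"MYAPP_SERIAL": "1"})
    print(debug_mapper.map([1, 2, 3], lambda x: x * 2))
```

In the real program you would pass Parallelizer as parallel_cls and call make_parallelizer at the very start of main(), so the worker processes are still created early.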

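Other answers tell you to set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES, but don't do this! You are just putting tape over the warning light. You might need it on a case-by-case basis for some legacy software, but certainly do not set it in your .bash_profile!

This is fixed in https://bugs.python.org/issue33725 (Python 3.8+, where "spawn" became the default start method on macOS), but it is still best practice to use: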

with multiprocessing.get_context("spawn").Pool() as pool:
    pool.map(annotate,img_urls)
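For reference, here is a self-contained sketch of the same idea that runs without network access. The annotate function below is only a placeholder for the download and Vision call from the question, and run_all and the example URLs are invented for illustration.

```python
import multiprocessing


def annotate(img_url):
    # Placeholder for the real download + Vision call from the question;
    # it only returns the length of the URL so the sketch runs offline.
    return len(img_url)


def run_all(img_urls):
    # "spawn" starts each worker as a fresh interpreter instead of fork()ing
    # the parent, so workers do not inherit half-initialized runtime state.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(annotate, img_urls)


if __name__ == "__main__":
    # The guard matters with "spawn": each worker re-imports this module,
    # and without the guard it would try to start a pool of its own.
    urls = ["https://example.com/a.jpg", "https://example.com/bb.jpg"]
    print(run_all(urls))
```

With "spawn", the worker function and its arguments must be picklable, so annotate has to be defined at module level rather than inside another function.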

Running macOS with zsh, I had to add this to my .zshrc file:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

and then, on the command line:

source ~/.zshrc

That did the trick.

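The OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES solution did not work for me. Another possible solution is to set no_proxy=* in your script's environment, as described here.

Among other causes, this error message can also be network related. My script has a TCP server. I don't even use a pool, just os.fork and multiprocessing.Queue for message passing. The forks worked fine until I added the queue.

In my case, setting no_proxy by itself fixed the problem. If your script has a network component, try this fix, perhaps in combination with OBJC_DISABLE_INITIALIZE_FORK_SAFETY.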
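One way to apply this from inside the script is to set the variable at the very top, before any networking library is imported. The sketch below only demonstrates that urllib then treats every host as "bypass the proxy". The helper proxies_bypassed is hypothetical. My understanding, which the answers above do not spell out, is that this keeps urllib from asking the macOS system configuration for proxy settings, so treat that explanation as an assumption.

```python
import os

# Set this before importing anything that resolves proxies (urllib, requests),
# and before any child process is forked.
os.environ["no_proxy"] = "*"

import urllib.request


def proxies_bypassed(host):
    # True when the environment tells urllib to skip proxies for this host.
    return bool(urllib.request.proxy_bypass_environment(host))


if __name__ == "__main__":
    print(urllib.request.getproxies_environment())
    print(proxies_bypassed("example.com"))
```

Note that no_proxy=* disables proxy use for every host, so this is not suitable if your script has to go through a proxy to reach the network.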