Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers.

l1=[bitarray 1, bitarray 2, ... ,bitarray n]
l2=[array 1, array 2, ... , array n]
l3=[array 1, array 2, ... , array n]

These data structures take up quite a lot of RAM (around 16 GB in total).

If I start 12 sub-processes using:

multiprocessing.Process(target=someFunction, args=(l1,l2,l3))

Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to put it more directly, will I use 16 GB or 192 GB of RAM?

someFunction will read some values from these lists and then perform some computations based on the values read. The results will be returned to the parent process. The lists l1, l2 and l3 will not be modified by someFunction.

Therefore I would assume that the sub-processes do not need, and would not, copy these huge lists, but would instead just share them with the parent process. That is, thanks to Linux's copy-on-write approach, the program would take 16 GB of RAM (no matter how many sub-processes I start)? Am I right, or am I missing something that would cause the lists to be copied?

EDIT: I am still confused after reading a bit more on the subject. On the one hand, Linux uses copy-on-write, which should mean that no data is copied. On the other hand, accessing an object will change its reference count (I am still unsure why, or what exactly that means). Even so, will the entire object be copied?

For example, if I define someFunction as follows:

import random

def someFunction(list1, list2, list3):
    i = random.randint(0, 99999)
    print(list1[i], list2[i], list3[i])

Does using this function mean that l1, l2 and l3 will be copied entirely for each sub-process?

Is there a way to check this?

EDIT2: After reading a bit more and monitoring the total memory usage of the system while the sub-processes are running, it seems that the entire objects are indeed copied for each sub-process, apparently because of reference counting.
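
One way to check this, assuming the psutil package is installed (the helper below is just an illustration), is to compare each child's unique set size (USS) with its resident set size (RSS): pages that are still shared copy-on-write with the parent count towards RSS but not towards USS.

import psutil

def report_memory(pids):
    # USS counts only pages private to the process; memory still shared
    # copy-on-write with the parent appears in RSS but not in USS.
    for pid in pids:
        info = psutil.Process(pid).memory_full_info()
        print(pid, 'rss=%.0f MB' % (info.rss / 1e6), 'uss=%.0f MB' % (info.uss / 1e6))

If USS grows towards RSS as the children touch the lists, the pages really are being copied.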

The reference counting for l1, l2 and l3 is actually not needed in my program. This is because l1, l2 and l3 will be kept in memory (unchanged) until the parent process exits; there is no need to free the memory used by these lists before then. In fact, I know for sure that the reference count will remain above 0 (for these lists and for every object in them) until the program exits.

So now the question becomes: how can I make sure the objects are not copied into each sub-process? Could I perhaps disable reference counting for these lists and for every object in them?

Just an additional note: the sub-processes do not need to modify l1, l2, l3 or any of the objects in these lists. They only need to be able to reference some of those objects without causing the memory to be copied for each sub-process.


Generally speaking, there are two ways to share the same data:

  • Multithreading
  • Shared memory

Python's multithreading is not suitable for CPU-bound tasks (because of the GIL), so the usual solution in that case is to go with multiprocessing. However, with this solution you need to explicitly share the data, using multiprocessing.Value and multiprocessing.Array.

Note that sharing data between processes is usually not the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation:

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

In your case, you need to wrap l1, l2 and l3 in some way understandable by multiprocessing (e.g. by using a multiprocessing.Array), and then pass them as parameters.
Note also that, since you said you do not need write access, you should pass lock=False when creating the objects, otherwise all access will still be serialized.
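
As a rough illustration of that approach (a sketch, not a drop-in solution; the worker function and sizes here are made up), one of the integer lists could be wrapped like this:

import random
from multiprocessing import Process, Array

def worker(shared_l2):
    # reads go straight to the shared ctypes buffer; no per-process copy of the data
    i = random.randint(0, len(shared_l2) - 1)
    print(shared_l2[i])

if __name__ == '__main__':
    l2 = list(range(1_000_000))              # stand-in for your real list
    shared_l2 = Array('q', l2, lock=False)   # 'q' = signed 64-bit int; lock=False since access is read-only
    procs = [Process(target=worker, args=(shared_l2,)) for _ in range(12)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()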

If you want to make use of the copy-on-write feature and your data is static (unchanged in the child processes), you should keep Python from touching the memory blocks where your data lies. You can easily do this by using C or C++ structures (STL containers, for instance) and providing your own Python wrappers that use pointers to the data memory (or possibly copy it) when a Python-level object is created, if one is created at all. All of this can be done with almost Python-like simplicity and syntax using Cython.

# foo.pyx (Cython)
from libc.stdlib cimport calloc, free
from libc.string cimport memcpy

cdef class FooContainer:
    cdef char* data

    def __cinit__(self, bytes foo_value):
        self.data = <char*> calloc(1024, sizeof(char))  # zero-filled, so the buffer is NUL-terminated
        memcpy(self.data, <char*> foo_value, min(1024, len(foo_value)))

    def __dealloc__(self):
        if self.data != NULL:
            free(self.data)

    def get(self):
        # Cython builds a new Python bytes object from the char* and returns it
        return self.data


# python part
import os
from foo import FooContainer

f = FooContainer(b"hello world")
pid = os.fork()
if not pid:
    f.get()  # this call reads the same memory page to which the parent
             # process wrote the 1024 chars of self.data
The above is only a sketch; don't use it as-is. In your case, self.data should be a C or C++ container rather than a raw char buffer.

You can use memcached or redis and store each list as a key-value pair, e.g. {'l1'...
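
A minimal sketch of the Redis variant, assuming a local Redis server and the redis-py package (the key names and the use of pickle are just one way to do it):

import pickle
import redis

r = redis.Redis()                    # assumes a Redis server on localhost:6379

# parent process: store the (pickled) lists once
r.set('l1', pickle.dumps(l1))
r.set('l2', pickle.dumps(l2))
r.set('l3', pickle.dumps(l3))

# inside a worker process: fetch what is needed
l1_local = pickle.loads(r.get('l1'))

Note that each worker that unpickles a list still materialises its own in-memory copy, so this trades RAM for the convenience of an external store rather than giving true shared memory.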

Because this is still a very high result on google and no one else has mentioned it yet, I thought I would mention the new possibility of 'true' shared memory which was introduced in python version 3.8.0: https://docs.python.org/3/library/multiprocessing.shared_memory.html

I have here included a small contrived example (tested on linux) where numpy arrays are used, which is likely a very common use case:

# one dimension of the 2d array which is shared
dim = 5000

import numpy as np
from multiprocessing import shared_memory, Process, Lock
from multiprocessing import cpu_count, current_process
import time

lock = Lock()


def add_one(shr_name):
    existing_shm = shared_memory.SharedMemory(name=shr_name)
    np_array = np.ndarray((dim, dim,), dtype=np.int64, buffer=existing_shm.buf)
    lock.acquire()
    np_array[:] = np_array[0] + 1
    lock.release()
    time.sleep(10)  # pause, to see the memory usage in top
    print('added one')
    existing_shm.close()


def create_shared_block():
    a = np.ones(shape=(dim, dim), dtype=np.int64)  # Start with an existing NumPy array

    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    # Now create a NumPy array backed by shared memory
    np_array = np.ndarray(a.shape, dtype=np.int64, buffer=shm.buf)
    np_array[:] = a[:]  # Copy the original data into shared memory
    return shm, np_array


if current_process().name == "MainProcess":
    print("creating shared block")
    shr, np_array = create_shared_block()

    processes = []
    for i in range(cpu_count()):
        _process = Process(target=add_one, args=(shr.name,))
        processes.append(_process)
        _process.start()

    for _process in processes:
        _process.join()

    print("Final array")
    print(np_array[:10])
    print(np_array[10:])

    shr.close()
    shr.unlink()

Note that because of the 64-bit ints this code can take about 1 GB of RAM to run, so make sure that you won't freeze your system using it. ^_^

For those interested in using Python 3.8's shared_memory module: it still has a bug (github issue link here) which hasn't been fixed and affects Python 3.8/3.9/3.10 as of now (2021-01-15). The bug affects POSIX systems: the resource tracker destroys shared memory segments while other processes still have valid access to them. So take care if you use it in your code.