Python 是否能够在多核上运行?

问: 因为 python 使用了“ GIL”,所以 python 能够同时运行其独立的线程吗?


资讯:

读完 这个之后,我对 python 是否能够利用多核心的优势感到相当不确定。尽管 Python 做得很好,但是想到它会缺少如此强大的能力还是让人感觉很奇怪。所以感到不确定,我决定在这里问。如果我编写一个多线程的程序,它是否能够在多个核上同时执行?

148633 次浏览

Threads share a process and a process runs on a core, but you can use python's multiprocessing module to call your functions in separate processes and use other cores, or you can use the subprocess module, which can run your code and non-python code too.

CPython (the classic and prevalent implementation of Python) can't have more than one thread executing Python bytecode at the same time. This means compute-bound programs will only use one core. I/O operations and computing happening inside C extensions (such as numpy) can operate simultaneously.

Other implementation of Python (such as Jython or PyPy) may behave differently, I'm less clear on their details.

The usual recommendation is to use many processes rather than many threads.

The answer is "Yes, But..."

But cPython cannot when you are using regular threads for concurrency.

You can either use something like multiprocessing, celery or mpi4py to split the parallel work into another process;

Or you can use something like Jython or IronPython to use an alternative interpreter that doesn't have a GIL.

A softer solution is to use libraries that don't run afoul of the GIL for heavy CPU tasks, for instance numpy can do the heavy lifting while not retaining the GIL, so other python threads can proceed. You can also use the ctypes library in this way.

If you are not doing CPU bound work, you can ignore the GIL issue entirely (kind of) since python won't acquire the GIL while it's waiting for IO.

Python threads cannot take advantage of many cores. This is due to an internal implementation detail called the GIL (global interpreter lock) in the C implementation of python (cPython) which is almost certainly what you use.

The workaround is the multiprocessing module http://www.python.org/dev/peps/pep-0371/ which was developed for this purpose.

Documentation: http://docs.python.org/library/multiprocessing.html

(Or use a parallel language.)

example code taking all 4 cores on my ubuntu 14.04, python 2.7 64 bit.

import time
import threading




def t():
with open('/dev/urandom') as f:
for x in xrange(100):
f.read(4 * 65535)


if __name__ == '__main__':
start_time = time.time()
t()
t()
t()
t()
print "Sequential run time: %.2f seconds" % (time.time() - start_time)


start_time = time.time()
t1 = threading.Thread(target=t)
t2 = threading.Thread(target=t)
t3 = threading.Thread(target=t)
t4 = threading.Thread(target=t)
t1.start()
t2.start()
t3.start()
t4.start()
t1.join()
t2.join()
t3.join()
t4.join()
print "Parallel run time: %.2f seconds" % (time.time() - start_time)

result:

$ python 1.py
Sequential run time: 3.69 seconds
Parallel run time: 4.82 seconds

I converted the script to Python3 and ran it on my Raspberry Pi 3B+:

import time
import threading


def t():
with open('/dev/urandom', 'rb') as f:
for x in range(100):
f.read(4 * 65535)


if __name__ == '__main__':
start_time = time.time()
t()
t()
t()
t()
print("Sequential run time: %.2f seconds" % (time.time() - start_time))


start_time = time.time()
t1 = threading.Thread(target=t)
t2 = threading.Thread(target=t)
t3 = threading.Thread(target=t)
t4 = threading.Thread(target=t)
t1.start()
t2.start()
t3.start()
t4.start()
t1.join()
t2.join()
t3.join()
t4.join()
print("Parallel run time: %.2f seconds" % (time.time() - start_time))

python3 t.py

Sequential run time: 2.10 seconds
Parallel run time: 1.41 seconds

For me, running parallel was quicker.

As stated in prior answers - it depends on the answer to "cpu or i/o bound?",
but also to the answer to "threaded or multi-processing?":

Examples run on Raspberry Pi 3B 1.2GHz 4-core with Python3.7.3
--( With other processes running including htop )

  • For this test - multiprocessing and threading had similar results for i/o bound,
    but multi-processing was more efficient than threading for cpu-bound.

Using threads:

Typical Result:
. Starting 4000 cycles of io-bound threading
. Sequential run time: 39.15 seconds
. 4 threads Parallel run time: 18.19 seconds
. 2 threads Parallel - twice run time: 20.61 seconds

Typical Result:
. Starting 1000000 cycles of cpu-only threading
. Sequential run time: 9.39 seconds
. 4 threads Parallel run time: 10.19 seconds
. 2 threads Parallel twice - run time: 9.58 seconds

Using multiprocessing:

Typical Result:
. Starting 4000 cycles of io-bound processing
. Sequential - run time: 39.74 seconds
. 4 procs Parallel - run time: 17.68 seconds
. 2 procs Parallel twice - run time: 20.68 seconds

Typical Result:
. Starting 1000000 cycles of cpu-only processing
. Sequential run time: 9.24 seconds
. 4 procs Parallel - run time: 2.59 seconds
. 2 procs Parallel twice - run time: 4.76 seconds

compare_io_multiproc.py:
#!/usr/bin/env python3


# Compare single proc vs multiple procs execution for io bound operation


"""
Typical Result:
Starting 4000 cycles of io-bound processing
Sequential - run time: 39.74 seconds
4 procs Parallel - run time: 17.68 seconds
2 procs Parallel twice - run time: 20.68 seconds
"""
import time
import multiprocessing as mp


# one thousand
cycles = 1 * 1000


def t():
with open('/dev/urandom', 'rb') as f:
for x in range(cycles):
f.read(4 * 65535)


if __name__ == '__main__':
print("  Starting {} cycles of io-bound processing".format(cycles*4))
start_time = time.time()
t()
t()
t()
t()
print("  Sequential - run time: %.2f seconds" % (time.time() - start_time))


# four procs
start_time = time.time()
p1 = mp.Process(target=t)
p2 = mp.Process(target=t)
p3 = mp.Process(target=t)
p4 = mp.Process(target=t)
p1.start()
p2.start()
p3.start()
p4.start()
p1.join()
p2.join()
p3.join()
p4.join()
print("  4 procs Parallel - run time: %.2f seconds" % (time.time() - start_time))


# two procs
start_time = time.time()
p1 = mp.Process(target=t)
p2 = mp.Process(target=t)
p1.start()
p2.start()
p1.join()
p2.join()
p3 = mp.Process(target=t)
p4 = mp.Process(target=t)
p3.start()
p4.start()
p3.join()
p4.join()
print("  2 procs Parallel twice - run time: %.2f seconds" % (time.time() - start_time))


compare_cpu_multiproc.py
#!/usr/bin/env python3


# Compare single proc vs multiple procs execution for cpu bound operation


"""
Typical Result:
Starting 1000000 cycles of cpu-only processing
Sequential run time: 9.24 seconds
4 procs Parallel - run time: 2.59 seconds
2 procs Parallel twice - run time: 4.76 seconds
"""
import time
import multiprocessing as mp


# one million
cycles = 1000 * 1000


def t():
for x in range(cycles):
fdivision = cycles / 2.0
fcomparison = (x > fdivision)
faddition = fdivision + 1.0
fsubtract = fdivision - 2.0
fmultiply = fdivision * 2.0


if __name__ == '__main__':
print("  Starting {} cycles of cpu-only processing".format(cycles))
start_time = time.time()
t()
t()
t()
t()
print("  Sequential run time: %.2f seconds" % (time.time() - start_time))


# four procs
start_time = time.time()
p1 = mp.Process(target=t)
p2 = mp.Process(target=t)
p3 = mp.Process(target=t)
p4 = mp.Process(target=t)
p1.start()
p2.start()
p3.start()
p4.start()
p1.join()
p2.join()
p3.join()
p4.join()
print("  4 procs Parallel - run time: %.2f seconds" % (time.time() - start_time))


# two procs
start_time = time.time()
p1 = mp.Process(target=t)
p2 = mp.Process(target=t)
p1.start()
p2.start()
p1.join()
p2.join()
p3 = mp.Process(target=t)
p4 = mp.Process(target=t)
p3.start()
p4.start()
p3.join()
p4.join()
print("  2 procs Parallel twice - run time: %.2f seconds" % (time.time() - start_time))