为类型化内存视图分配内存的推荐方法是什么?

关于类型化内存视图的 Cython 文档列出了三种赋值类型化内存视图的方法:

  1. 从一个原始的 C 指针,
  2. np.ndarray
  3. 来自 cython.view.array

假设我没有从外部传入到 cython 函数中的数据,而是想分配内存并将其作为 np.ndarray返回,那么我应该选择哪个选项?还要假设该缓冲区的大小不是编译时常量,也就是说,我不能在堆栈上分配,但是在选项1中需要 malloc

因此,这3个选项大致如下:

from libc.stdlib cimport malloc, free
cimport numpy as np
from cython cimport view


np.import_array()


def memview_malloc(int N):
cdef int * m = <int *>malloc(N * sizeof(int))
cdef int[::1] b = <int[:N]>m
free(<void *>m)


def memview_ndarray(int N):
cdef int[::1] b = np.empty(N, dtype=np.int32)


def memview_cyarray(int N):
cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")

令我惊讶的是,在这三种情况下,Cython 生成了相当多的代码用于内存分配,尤其是对 __Pyx_PyObject_to_MemoryviewSlice_dc_int的调用。这表明(这里我可能错了,我对 Cython 内部工作原理的了解非常有限)它首先创建一个 Python 对象,然后将其“强制转换”到内存视图中,这似乎是不必要的开销。

简单的基准并没有显示出这三种方法之间有多大的差别,其中2种方法是最快的,但差距很小。

这三种方法中哪一种是推荐的? 还是有一种不同的、更好的选择?

后续问题: 在函数中处理了内存视图之后,我想最终以 np.ndarray的形式返回结果。类型化内存视图是最好的选择,还是我宁愿首先使用以下旧的缓冲区接口来创建 ndarray

cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)
18848 次浏览

Look here for an answer.

The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*):

from cpython.array cimport array, clone


# This type is what you want and can be cast to things of
# the "double[:]" syntax, so no problems there
cdef array[double] armv, templatemv


templatemv = array('d')


# This is fast
armv = clone(templatemv, L, False)

EDIT

It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings:

# cython: language_level=3
# cython: boundscheck=False
# cython: wraparound=False


import time
import sys


from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
import numpy as numpy
cimport numpy as numpy


cdef int loops


def timefunc(name):
def timedecorator(f):
cdef int L, i


print("Running", name)
for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
start = time.clock()
f(L)
end = time.clock()
print(format((end-start) / loops * 1e6, "2f"), end=" ")
sys.stdout.flush()


print("μs")
return timedecorator


print()
print("INITIALISATIONS")
loops = 100000


@timefunc("cpython.array buffer")
def _(int L):
cdef int i
cdef array[double] arr, template = array('d')


for i in range(loops):
arr = clone(template, L, False)


# Prevents dead code elimination
str(arr[0])


@timefunc("cpython.array memoryview")
def _(int L):
cdef int i
cdef double[::1] arr
cdef array template = array('d')


for i in range(loops):
arr = clone(template, L, False)


# Prevents dead code elimination
str(arr[0])


@timefunc("cpython.array raw C type")
def _(int L):
cdef int i
cdef array arr, template = array('d')


for i in range(loops):
arr = clone(template, L, False)


# Prevents dead code elimination
str(arr[0])


@timefunc("numpy.empty_like memoryview")
def _(int L):
cdef int i
cdef double[::1] arr
template = numpy.empty((L,), dtype='double')


for i in range(loops):
arr = numpy.empty_like(template)


# Prevents dead code elimination
str(arr[0])


@timefunc("malloc")
def _(int L):
cdef int i
cdef double* arrptr


for i in range(loops):
arrptr = <double*> malloc(sizeof(double) * L)
free(arrptr)


# Prevents dead code elimination
str(arrptr[0])


@timefunc("malloc memoryview")
def _(int L):
cdef int i
cdef double* arrptr
cdef double[::1] arr


for i in range(loops):
arrptr = <double*> malloc(sizeof(double) * L)
arr = <double[:L]>arrptr
free(arrptr)


# Prevents dead code elimination
str(arr[0])


@timefunc("cvarray memoryview")
def _(int L):
cdef int i
cdef double[::1] arr


for i in range(loops):
arr = cvarray((L,),sizeof(double),'d')


# Prevents dead code elimination
str(arr[0])






print()
print("ITERATING")
loops = 1000


@timefunc("cpython.array buffer")
def _(int L):
cdef int i
cdef array[double] arr = clone(array('d'), L, False)


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


# Prevents dead-code elimination
str(d)


@timefunc("cpython.array memoryview")
def _(int L):
cdef int i
cdef double[::1] arr = clone(array('d'), L, False)


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


# Prevents dead-code elimination
str(d)


@timefunc("cpython.array raw C type")
def _(int L):
cdef int i
cdef array arr = clone(array('d'), L, False)


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


# Prevents dead-code elimination
str(d)


@timefunc("numpy.empty_like memoryview")
def _(int L):
cdef int i
cdef double[::1] arr = numpy.empty((L,), dtype='double')


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


# Prevents dead-code elimination
str(d)


@timefunc("malloc")
def _(int L):
cdef int i
cdef double* arrptr = <double*> malloc(sizeof(double) * L)


cdef double d
for i in range(loops):
for i in range(L):
d = arrptr[i]


free(arrptr)


# Prevents dead-code elimination
str(d)


@timefunc("malloc memoryview")
def _(int L):
cdef int i
cdef double* arrptr = <double*> malloc(sizeof(double) * L)
cdef double[::1] arr = <double[:L]>arrptr


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


free(arrptr)


# Prevents dead-code elimination
str(d)


@timefunc("cvarray memoryview")
def _(int L):
cdef int i
cdef double[::1] arr = cvarray((L,),sizeof(double),'d')


cdef double d
for i in range(loops):
for i in range(L):
d = arr[i]


# Prevents dead-code elimination
str(d)

Output:

INITIALISATIONS
Running cpython.array buffer
0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs
Running cpython.array memoryview
0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs
Running cpython.array raw C type
0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs
Running numpy.empty_like memoryview
1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs
Running malloc
0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs
Running malloc memoryview
1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs
Running cvarray memoryview
1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs


ITERATING
Running cpython.array buffer
0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs
Running cpython.array memoryview
0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs
Running cpython.array raw C type
0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs
Running numpy.empty_like memoryview
0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs
Running malloc
0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs
Running malloc memoryview
0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs
Running cvarray memoryview
0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs

(The reason for the "iterations" benchmark is that some methods have surprisingly different characteristics in this respect.)

In order of initialisation speed:

malloc: This is a harsh world, but it's fast. If you need to to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally you're a good bet for...

cpython.array raw C type: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick:

arr.data.as_doubles[i]

which brings it up to the standard speed while removing safety! This makes this a wonderful replacement for malloc, being basically a pretty reference-counted version!

cpython.array buffer: Coming in at only three to four times the setup time of malloc, this is looks a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice.

cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).

The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.

As a follow up to Veedrac's answer: be aware using the memoryview support of cpython.array with python 2.7 appears to lead to memory leaks currently. This seems to be a long-standing issue as it is mentioned on the cython-users mailing list here in a post from November 2012. Running Veedrac's benchmark scrip with Cython version 0.22 with both Python 2.7.6 and Python 2.7.9 leads to a large memory leak on when initialising a cpython.array using either a buffer or memoryview interface. No memory leaks occur when running the script with Python 3.4. I've filed a bug report on this to the Cython developers mailing list.