使用NumPy构建两个数组的所有组合的数组

我试图运行6参数函数的参数空间来研究它的数值行为,然后再尝试用它做任何复杂的事情,所以我正在寻找一种有效的方法来做到这一点。

我的函数将6维NumPy数组中给出的浮点值作为输入。我最初想做的是:

首先,我创建了一个函数,该函数接受2个数组,并使用这两个数组中的所有值组合生成一个数组:

from numpy import *
def comb(a,b):
c = []
for i in a:
for j in b:
c.append(r_[i,j])
return c

然后,我使用reduce()将其应用于同一数组的M个副本:

def combs(a,m):
return reduce(comb,[a]*m)

最后,我这样评价我的功能:

values = combs(np.arange(0,1,0.1),6)
for val in values:
print F(val)

这是可行的,但太慢了。我知道参数的空间很大,但这不应该这么慢。在本例中,我只对10个6(一百万)点进行了采样,仅创建ABC0__数组就花费了超过15秒的时间。

你知道用NumPy做这件事有什么更有效的方法吗?

如果有必要,我可以修改函数__abc0接受参数的方式。

218774 次浏览

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements WITHOUT repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).

If you want something different than combinations perhaps other iterators in itertools, product or permutations, might serve you better. For example, it looks like your code is roughly the same as:

for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
print F(val)

All of these iterators yield tuples, not lists or numpy arrays, so if your F is picky about getting specifically a numpy array you'll have to accept the extra overhead of constructing or clearing and re-filling one at each step.

Here's a pure-numpy implementation. It's about 5× faster than using itertools.

Python 3:

import numpy as np


def cartesian(arrays, out=None):
"""
Generate a cartesian product of input arrays.


Parameters
----------
arrays : list of array-like
1-D arrays to form the cartesian product of.
out : ndarray
Array to place the cartesian product in.


Returns
-------
out : ndarray
2-D array of shape (M, len(arrays)) containing cartesian products
formed of input arrays.


Examples
--------
>>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
array([[1, 4, 6],
[1, 4, 7],
[1, 5, 6],
[1, 5, 7],
[2, 4, 6],
[2, 4, 7],
[2, 5, 6],
[2, 5, 7],
[3, 4, 6],
[3, 4, 7],
[3, 5, 6],
[3, 5, 7]])


"""


arrays = [np.asarray(x) for x in arrays]
dtype = arrays[0].dtype


n = np.prod([x.size for x in arrays])
if out is None:
out = np.zeros([n, len(arrays)], dtype=dtype)


#m = n / arrays[0].size
m = int(n / arrays[0].size)
out[:,0] = np.repeat(arrays[0], m)
if arrays[1:]:
cartesian(arrays[1:], out=out[0:m, 1:])
for j in range(1, arrays[0].size):
#for j in xrange(1, arrays[0].size):
out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
return out

Python 2:


import numpy as np


def cartesian(arrays, out=None):
arrays = [np.asarray(x) for x in arrays]
dtype = arrays[0].dtype


n = np.prod([x.size for x in arrays])
if out is None:
out = np.zeros([n, len(arrays)], dtype=dtype)


m = n / arrays[0].size
out[:,0] = np.repeat(arrays[0], m)
if arrays[1:]:
cartesian(arrays[1:], out=out[0:m, 1:])
for j in xrange(1, arrays[0].size):
out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
return out

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):

import numpy
my_grid = numpy.mgrid[[slice(0,1,0.1)]*6]

You can do something like this

import numpy as np


def cartesian_coord(*arrays):
grid = np.meshgrid(*arrays)
coord_list = [entry.ravel() for entry in grid]
points = np.vstack(coord_list).T
return points


a = np.arange(4)  # fake data
print(cartesian_coord(*6*[a])

which gives

array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 2],
...,
[3, 3, 3, 3, 3, 1],
[3, 3, 3, 3, 3, 2],
[3, 3, 3, 3, 3, 3]])

The following numpy implementation should be approx. 2x the speed of the given answer:

def cartesian2(arrays):
arrays = [np.asarray(a) for a in arrays]
shape = (len(x) for x in arrays)


ix = np.indices(shape, dtype=int)
ix = ix.reshape(len(arrays), -1).T


for n, arr in enumerate(arrays):
ix[:, n] = arrays[n][ix[:, n]]


return ix

Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.

def cartesian(*arrays):
mesh = np.meshgrid(*arrays)  # standard numpy meshgrid
dim = len(mesh)  # number of dimensions
elements = mesh[0].size  # number of elements, any index will do
flat = np.concatenate(mesh).ravel()  # flatten the whole meshgrid
reshape = np.reshape(flat, (dim, elements)).T  # reshape and transpose
return reshape

For example,

x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)

gives

[[0 0 0 0 0]
[0 0 0 0 1]
[0 0 0 0 2]
...,
[2 2 2 2 0]
[2 2 2 2 1]
[2 2 2 2 2]]

In newer version of numpy (>1.8.x), numpy.meshgrid() provides a much faster implementation:

@pv's solution

In [113]:


%timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop
In [114]:


cartesian(([1, 2, 3], [4, 5], [6, 7]))


Out[114]:
array([[1, 4, 6],
[1, 4, 7],
[1, 5, 6],
[1, 5, 7],
[2, 4, 6],
[2, 4, 7],
[2, 5, 6],
[2, 5, 7],
[3, 4, 6],
[3, 4, 7],
[3, 5, 6],
[3, 5, 7]])

numpy.meshgrid() use to be 2D only, now it is capable of ND. In this case, 3D:

In [115]:


%timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)
10000 loops, best of 3: 74.1 µs per loop
In [116]:


np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1,3)


Out[116]:
array([[1, 4, 6],
[1, 5, 6],
[2, 4, 6],
[2, 5, 6],
[3, 4, 6],
[3, 5, 6],
[1, 4, 7],
[1, 5, 7],
[2, 4, 7],
[2, 5, 7],
[3, 4, 7],
[3, 5, 7]])

Note that the order of the final resultant is slightly different.

For a pure numpy implementation of Cartesian product of 1D arrays (or flat python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired ouput:

 def cartprod(*arrays):
N = len(arrays)
return transpose(meshgrid(*arrays, indexing='ij'),
roll(arange(N + 1), -1)).reshape(-1, N)

Note this has the convention of last axis changing fastest ("C style" or "row-major").

In [88]: cartprod([1,2,3], [4,8], [100, 200, 300, 400], [-5, -4])
Out[88]:
array([[  1,   4, 100,  -5],
[  1,   4, 100,  -4],
[  1,   4, 200,  -5],
[  1,   4, 200,  -4],
[  1,   4, 300,  -5],
[  1,   4, 300,  -4],
[  1,   4, 400,  -5],
[  1,   4, 400,  -4],
[  1,   8, 100,  -5],
[  1,   8, 100,  -4],
[  1,   8, 200,  -5],
[  1,   8, 200,  -4],
[  1,   8, 300,  -5],
[  1,   8, 300,  -4],
[  1,   8, 400,  -5],
[  1,   8, 400,  -4],
[  2,   4, 100,  -5],
[  2,   4, 100,  -4],
[  2,   4, 200,  -5],
[  2,   4, 200,  -4],
[  2,   4, 300,  -5],
[  2,   4, 300,  -4],
[  2,   4, 400,  -5],
[  2,   4, 400,  -4],
[  2,   8, 100,  -5],
[  2,   8, 100,  -4],
[  2,   8, 200,  -5],
[  2,   8, 200,  -4],
[  2,   8, 300,  -5],
[  2,   8, 300,  -4],
[  2,   8, 400,  -5],
[  2,   8, 400,  -4],
[  3,   4, 100,  -5],
[  3,   4, 100,  -4],
[  3,   4, 200,  -5],
[  3,   4, 200,  -4],
[  3,   4, 300,  -5],
[  3,   4, 300,  -4],
[  3,   4, 400,  -5],
[  3,   4, 400,  -4],
[  3,   8, 100,  -5],
[  3,   8, 100,  -4],
[  3,   8, 200,  -5],
[  3,   8, 200,  -4],
[  3,   8, 300,  -5],
[  3,   8, 300,  -4],
[  3,   8, 400,  -5],
[  3,   8, 400,  -4]])

If you want to change the first axis fastest ("FORTRAN style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')

you can use np.array(itertools.product(a, b))

Pandas merge offers a naive, fast solution to the problem:

# given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]


# get dfs with same, constant index
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x)))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y)))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z)))


# get all permutations stored in a new df
df = pd.merge(x, pd.merge(y, z, left_index=True, right_index=True),
left_index=True, right_index=True)