Concatenating Numpy arrays without copying

In Numpy, I can join two arrays end-to-end with np.append or np.concatenate:

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> Z = np.append(X, Y, axis=0)
>>> Z
array([[ 1,  2,  3],
       [-1, -2, -3],
       [ 4,  5,  6]])

But these functions make a copy of their input arrays:

>>> Z[0,:] = 0
>>> Z
array([[ 0,  0,  0],
       [-1, -2, -3],
       [ 4,  5,  6]])
>>> X
array([[1, 2, 3]])

Is there a way to join the two arrays into a view, i.e. without copying? Would that require an np.ndarray subclass?


The memory backing a Numpy array must be a single contiguous block. If you allocated the arrays separately, they are scattered around in memory, and there is no way to represent them as a view into one Numpy array.

If you know beforehand how many arrays you need, you can instead start with one big array allocated up front, and have each of the small arrays be a view into it (e.g. obtained by slicing).
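For example, a minimal sketch of that pre-allocation approach might look like this (the names and sizes are just illustrative):

import numpy as np

# Allocate one contiguous block up front, then hand out views into it.
big = np.empty((3, 3), dtype=int)
X = big[:1]    # view of the first row
Y = big[1:]    # view of the remaining two rows

X[:] = [[1, 2, 3]]
Y[:] = [[-1, -2, -3], [4, 5, 6]]

big[0, :] = 0  # writing through the big array...
print(X)       # ...is visible through the view: [[0 0 0]]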

Not really elegant at all, but you can get close to what you want by using a tuple to hold references to the arrays. I am not sure how well it fits your use case, but I have done things like this before.

>>> X = np.array([[1,2,3]])
>>> Y = np.array([[-1,-2,-3],[4,5,6]])
>>> z = (X, Y)
>>> z[0][:] = 0
>>> z
(array([[0, 0, 0]]), array([[-1, -2, -3],
       [ 4,  5,  6]]))
>>> X
array([[0, 0, 0]])

You may create an array of arrays, like:

>>> from numpy import *
>>> a = array([1.0, 2.0, 3.0])
>>> b = array([4.0, 5.0])
>>> c = array([a, b], dtype=object)
>>> c
array([[ 1.  2.  3.], [ 4.  5.]], dtype=object)
>>> a[0] = 100.0
>>> a
array([ 100.,    2.,    3.])
>>> c
array([[ 100.    2.    3.], [ 4.  5.]], dtype=object)
>>> c[0][1] = 200.0
>>> a
array([ 100.,  200.,    3.])
>>> c
array([[ 100.  200.    3.], [ 4.  5.]], dtype=object)
>>> c *= 1000
>>> c
array([[ 100000.  200000.    3000.], [ 4000.  5000.]], dtype=object)
>>> a
array([ 100.,  200.,    3.])
>>> # Oops! Copies were made...

The problem is that broadcast operations such as c *= 1000 make copies of the element arrays instead of modifying them in place, so the originals are left untouched.
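One way to avoid those copies, if you still want the object-array approach, is to apply the operation to each element array in place, which keeps the references intact:

for arr in c:
    arr *= 1000   # in-place on each element array; a and b are modified, no copies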

Just allocate the array before you fill it with data. If you want, you can allocate more space than needed; on most systems the untouched part will not take up physical RAM until you actually write to it, because numpy and the OS allocate memory lazily.

A = np.zeros((R, C))   # R rows, C columns, allocated once
A[row] = data          # fill one row at a time

The memory is actually used only once data is put into the array. Repeatedly growing an array by concatenation will not finish in any reasonable time on a dataset of real size, i.e. more than about 1 GB.
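A rough sketch of that pattern, assuming you know an upper bound on the number of rows:

import numpy as np

n_rows, n_cols = 1000, 3          # assumed upper bound on the data size
buf = np.zeros((n_rows, n_cols))  # allocated once, up front

X = np.array([[1, 2, 3]])
Y = np.array([[-1, -2, -3], [4, 5, 6]])

buf[:1] = X        # the data is copied in exactly once
buf[1:3] = Y
Z = buf[:3]        # Z is a view of the filled part, not a copy

Z[0, :] = 0        # writes go straight into buf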

This answer is based on my other answer to Reference to ndarray rows in ndarray.

X = np.array([[1,2,3]])
Y = np.array([[-1,-2,-3],[4,5,6]])
Z = np.array([None, None, None])
Z[0] = X[0]
Z[1] = Y[0]
Z[2] = Y[1]


Z[0][0] = 5 # X would be changed as well


print(X)
Output:
[[5 2 3]]


# Let's make it a function!
def concat(X, Y, copy=True):
    """Return an array of references if copy=False"""
    if copy is True:  # deep copy
        return np.append(X, Y, axis=0)
    len_x, len_y = len(X), len(Y)
    ret = np.array([None for _ in range(len_x + len_y)])
    for i in range(len_x):
        ret[i] = X[i]
    for j in range(len_y):
        ret[len_x + j] = Y[j]
    return ret
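For example, with the same X and Y as before, the copy=False branch returns an object array whose elements still reference the original rows:

X = np.array([[1, 2, 3]])
Y = np.array([[-1, -2, -3], [4, 5, 6]])

Z = concat(X, Y, copy=False)  # object array holding row views
Z[0][0] = 5                   # mutates X through the stored reference
print(X)                      # [[5 2 3]]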

I had the same problem and ended up doing it in reverse: after concatenating normally (with a copy), I reassigned the original arrays to be views into the concatenated one:

import numpy as np


def concat_no_copy(arrays):
    """Concats the arrays and returns the concatenated array
    in addition to the original arrays as views of the concatenated one.

    Parameters
    ----------
    arrays : list
        the list of arrays to concatenate
    """
    con = np.concatenate(arrays)

    viewarrays = []
    for i, arr in enumerate(arrays):
        arrnew = con[sum(len(a) for a in arrays[:i]):
                     sum(len(a) for a in arrays[:i + 1])]
        viewarrays.append(arrnew)
        assert all(arr == arrnew)

    # return the view arrays, replace the old ones with these
    return con, viewarrays

You can test it as follows:

def test_concat_no_copy():
    arr1 = np.array([0, 1, 2, 3, 4])
    arr2 = np.array([5, 6, 7, 8, 9])
    arr3 = np.array([10, 11, 12, 13, 14])

    arraylist = [arr1, arr2, arr3]

    con, newarraylist = concat_no_copy(arraylist)

    assert all(con == np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
                                11, 12, 13, 14]))

    for old, new in zip(arraylist, newarraylist):
        assert all(old == new)
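To see the view relationship itself, you can also check that mutating the concatenated array shows up in the returned views; note that the original arrays stay unchanged unless you rebind their names to the views:

arr1 = np.array([0, 1, 2])
arr2 = np.array([3, 4, 5])
con, (v1, v2) = concat_no_copy([arr1, arr2])

con[0] = 99
print(v1)    # [99  1  2] -- v1 is a view into con
print(arr1)  # [0 1 2]    -- unchanged; rebind arr1 = v1 to drop the old copy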