Deleting rows in a numpy array

I have an array like this:

ANOVAInputMatrixValuesArray = [[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                               [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]]

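Note that one of the rows has a zero value at the end. I want to delete any row that contains a zero, while keeping every row that has non-zero values in all of its cells.

However, the array has a different number of rows each time it is populated, and the zeros are located in different rows each time.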

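I get the number of non-zero elements in each row with the following line of code:

NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

For the array above, NumNonzeroElementsInRows contains: [5 4]

The five indicates that all of the values in row 0 are non-zero, while the four indicates that one of the values in row 1 is zero.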

So I tried to use the following lines of code to find and delete the rows that contain zero values:

for q in range(len(NumNonzeroElementsInRows)):
    if NumNonzeroElementsInRows[q] < NumNonzeroElementsInRows.max():
        p.delete(ANOVAInputMatrixValuesArray, q, axis=0)

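But for some reason this code does not seem to do anything, even though a large number of print statements show that all of the variables are populated correctly leading up to it.

There must be some easy way to simply "delete any row that contains a zero value".

Can anyone show me what code to write to accomplish this?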


This is similar to your original approach, and will use less space than unutbu's answer, but I suspect it will be slower.

>>> import numpy as np
>>> p = np.array([[1.5, 0], [1.4, 1.5], [1.6, 0], [1.7, 1.8]])
>>> p
array([[ 1.5,  0. ],
       [ 1.4,  1.5],
       [ 1.6,  0. ],
       [ 1.7,  1.8]])
>>> nz = (p == 0).sum(1)
>>> q = p[nz == 0, :]
>>> q
array([[ 1.4,  1.5],
       [ 1.7,  1.8]])

By the way, your line p.delete() doesn't work for me - ndarrays don't have a .delete attribute.
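A likely reason the loop in the question appeared to do nothing, assuming p there is the numpy module: np.delete returns a new array instead of modifying its argument, and the loop throws that return value away. Deleting one row per iteration would also shift the indices of the rows after it. The sketch below keeps the question's variable names, collects the indices first and deletes them in a single call. One assumption of this sketch: it compares against the number of columns instead of NumNonzeroElementsInRows.max(), so rows are still removed when every row contains a zero.

```python
import numpy as np

ANOVAInputMatrixValuesArray = np.array([
    [0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
    [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0],
])

# Count the non-zero entries in each row, as in the question
NumNonzeroElementsInRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

# A row contains a zero if it has fewer non-zero entries than there are columns
n_cols = ANOVAInputMatrixValuesArray.shape[1]
rows_to_delete = [q for q in range(len(NumNonzeroElementsInRows))
                  if NumNonzeroElementsInRows[q] < n_cols]

# np.delete returns a new array, so the result must be assigned back
ANOVAInputMatrixValuesArray = np.delete(ANOVAInputMatrixValuesArray,
                                        rows_to_delete, axis=0)
print(ANOVAInputMatrixValuesArray.shape)  # (1, 5)
```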

Here's a one-liner (yes, it is similar to user333700's, but a little more straightforward):

>>> import numpy as np
>>> arr = np.array([[ 0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
...                 [ 0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
>>> arr[arr.all(1)]
array([[ 0.96488889,  0.73641667,  0.67521429,  0.592875  ,  0.53172222]])
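The one-liner works because all treats every non-zero value as True, so reducing along axis 1 yields one boolean per row, which is then used as a row mask. A small sketch printing that intermediate mask for the same array:

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0]])

# all(axis=1) is True for a row only if every entry in it is non-zero
mask = arr.all(axis=1)
print(mask)             # [ True False]

# Boolean indexing keeps exactly the rows where the mask is True
print(arr[mask].shape)  # (1, 5)
```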

By the way, this method is much, much faster than the masked array method for large matrices. For a 2048 x 5 matrix, this method is about 1000x faster.

By the way, user333700's method (from his comment) was slightly faster in my tests, though it boggles my mind why.
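The 1000x figure above is the answerer's own measurement. A minimal sketch for repeating the comparison on your own machine, assuming "the masked array method" means masking the zeros with np.ma.masked_equal and then calling np.ma.compress_rows. The seeded random 2048 x 5 matrix with roughly 10% zeros is hypothetical test data, and no particular timings are claimed here.

```python
import timeit

import numpy as np

# Hypothetical test data: 2048 x 5 random matrix with about 10% zeros
rng = np.random.default_rng(0)
big = rng.random((2048, 5))
big[rng.random((2048, 5)) < 0.1] = 0.0


def boolean_index(a):
    # Keep the rows that contain no zeros, via a boolean row mask
    return a[a.all(axis=1)]


def masked_compress(a):
    # Mask the zeros, then drop every row that has a masked entry
    return np.ma.compress_rows(np.ma.masked_equal(a, 0))


# Both approaches should select exactly the same rows
assert np.array_equal(boolean_index(big), masked_compress(big))

for f in (boolean_index, masked_compress):
    t = timeit.timeit(lambda: f(big), number=100)
    print(f"{f.__name__}: {t:.4f} s for 100 runs")
```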

NumPy provides a simple function that does exactly this: if you have a masked array 'a', calling numpy.ma.compress_rows(a) deletes the rows that contain a masked value. I would guess it is faster this way...
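A short sketch of that approach applied to the question's data, assuming the zeros are first turned into masked entries with np.ma.masked_equal (compress_rows only accepts 2-D arrays):

```python
import numpy as np

arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0]])

# Mask every entry that is exactly zero
a = np.ma.masked_equal(arr, 0)

# Drop each row of the 2-D masked array that has any masked entry
result = np.ma.compress_rows(a)
print(result.shape)  # (1, 5)
```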

The simplest way to delete rows and columns from arrays is the numpy.delete function.

Suppose I have the following array x:

import numpy

x = numpy.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

To delete the first row, do this:

x = numpy.delete(x, 0, axis=0)

To delete the third column, do this:

x = numpy.delete(x, 2, axis=1)

So you could find the indices of the rows which have a 0 in them, put them in a list or a tuple and pass this as the second argument of the function.
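A minimal sketch of that idea, using a small hypothetical array: np.where on a per-row "contains a zero" test gives the row indices, and that index array is passed straight to numpy.delete.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 0, 6],
              [7, 8, 9],
              [0, 1, 2]])

# Indices of the rows that contain at least one zero
rows_with_zero = np.where((x == 0).any(axis=1))[0]
print(rows_with_zero)  # [1 3]

# Pass the indices as the second argument; np.delete returns a new array
x = np.delete(x, rows_with_zero, axis=0)
print(x)
# [[1 2 3]
#  [7 8 9]]
```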

I might be too late to answer this question, but I wanted to share my input for the benefit of the community. For this example, let me call your matrix 'ANOVA'. I am assuming you only want to remove the rows of this matrix that have a 0 in the 5th column.

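import numpy as np

# Collect the indices of rows whose 5th column (index 4) is exactly zero
indx = []
for i in range(len(ANOVA)):
    if ANOVA[i, 4] == 0:
        indx.append(i)

# Keep only the rows whose index was not collected
ANOVA = np.array([row for i, row in enumerate(ANOVA) if i not in indx])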
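The same 5th-column filter can also be written without an explicit loop, as a boolean mask on that one column. A sketch, with the question's array standing in for ANOVA:

```python
import numpy as np

# Example data standing in for the question's matrix
ANOVA = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                  [0.78008333, 0.5938125, 0.481, 0.39883333, 0.0]])

# Keep only the rows whose 5th column (index 4) is non-zero
ANOVA = ANOVA[ANOVA[:, 4] != 0]
print(ANOVA.shape)  # (1, 5)
```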

import numpy as np
arr = np.array([[0.96488889, 0.73641667, 0.67521429, 0.592875, 0.53172222],
                [0.78008333, 0.5938125, 0.481, 0.39883333, 0.]])
# np.where on a per-row condition gives the indices of the rows with no zeros
print(arr[np.where((arr != 0.).all(axis=1))])