查找一个数字数组的 k 个最小值的索引

为了找到最小值的索引,我可以使用 argmin:

import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print A.argmin()     # 4 because A[4] = 0.1

但是我怎样才能找到 K-最小值的指数呢?

我要找的是这样的东西:

print A.argmin(numberofvalues=3)
# [4, 0, 7]  because A[4] <= A[0] <= A[7] <= all other A[i]

注意: 在我的用例中,A 有约10000到100000个值,我只对 k = 10个最小值的索引感兴趣。K 永远不会大于10。

128421 次浏览

You can use numpy.argsort with slicing

>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)

Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.

import numpy as np


A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3


idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]

This returns the k-smallest values. Note that these may not be in sorted order.

print(A[idx[:k]])
# [ 0.1  1.   1.5]

To obtain the k-largest values use

idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]


A[idx[-k:]]
# [  9.  17.  17.]

WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest. That won't always work. For example, these are NOT the 3 largest values in x:

x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70,  80, 100])

Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.

In [2]: x = np.random.randn(100000)


In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop


In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop


In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True

numpy.partition(your_array, k) is an alternative. No slicing necessary as it gives the values sorted until the kth element.

For n-dimentional arrays, this function works well. The indecies are returned in a callable form. If you want a list of the indices to be returned, then you need to transpose the array before you make a list.

To retrieve the k largest, simply pass in -k.

def get_indices_of_k_smallest(arr, k):
idx = np.argpartition(arr.ravel(), k)
return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
# if you want it in a list of indices . . .
# return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()

Example:

r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)


indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
#  array([3, 2, 0, 1], dtype=int64),
#  array([3, 0, 3, 3], dtype=int64))


arr[indices]
# array([ 4, 31, 54, 77])


%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)