How do you find the IQR in Numpy?

Is there a built-in Numpy/Scipy function to find the interquartile range? I can easily do this myself, but mean() exists, and that is basically just sum/len...

import numpy as np

def IQR(dist):
    return np.percentile(dist, 75) - np.percentile(dist, 25)

np.percentile accepts a sequence of percentiles in a single call, so you are slightly better off doing:

q75, q25 = np.percentile(x, [75, 25])
iqr = q75 - q25

or

iqr = np.subtract(*np.percentile(x, [75, 25]))

than making two calls to percentile:

In [8]: x = np.random.rand(10**6)

In [9]: %timeit q75, q25 = np.percentile(x, [75, 25]); iqr = q75 - q25
10 loops, best of 3: 24.2 ms per loop

In [10]: %timeit iqr = np.subtract(*np.percentile(x, [75, 25]))
10 loops, best of 3: 24.2 ms per loop

In [11]: %timeit iqr = np.percentile(x, 75) - np.percentile(x, 25)
10 loops, best of 3: 33.7 ms per loop
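The single-call idea can be wrapped in a small helper. This is a sketch, not part of the original answer: the name `iqr` and the `axis` pass-through are illustrative assumptions. It relies only on the fact that `np.percentile` puts one result per requested percentile along a new leading axis, so unpacking yields `q75` and `q25` directly.

```python
import numpy as np


def iqr(x, axis=None):
    """Hypothetical helper: interquartile range via one np.percentile call."""
    # With q=[75, 25], the result has a leading axis of length 2,
    # so it unpacks into the 75th and 25th percentiles.
    q75, q25 = np.percentile(x, [75, 25], axis=axis)
    return q75 - q25


rng = np.random.default_rng(0)
x = rng.random(1000)
# Same value as the two-call version
print(np.isclose(iqr(x), np.percentile(x, 75) - np.percentile(x, 25)))  # True

# With axis=0, each column of a 2-D array gets its own IQR
m = rng.random((100, 3))
print(iqr(m, axis=0).shape)  # (3,)
```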

There is now an iqr function in scipy.stats. It is available as of scipy 0.18.0. My original intent was to add it to numpy, but it was considered too domain-specific.
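A minimal usage sketch of `scipy.stats.iqr`, assuming SciPy 0.18.0 or later is installed. With its defaults (`rng=(25, 75)`, linear interpolation) it should agree with the single-call NumPy approach; the sample array is just illustrative data.

```python
import numpy as np
from scipy import stats

x = np.array([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

# Default rng=(25, 75) gives Q3 - Q1
print(stats.iqr(x))  # 10.0

# Compare against the NumPy one-call version
q75, q25 = np.percentile(x, [75, 25])
print(np.isclose(stats.iqr(x), q75 - q25))  # True
```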

You may be better off just using Jaime's answer, since the scipy code is just an over-complicated version of the same.

Ignore this if Jaime's answer works for your case. If not, according to this answer, to find the exact values of the 1st and 3rd quartiles you should consider something like:

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])


def find_median(sorted_list):
    """Return the median of an already-sorted list and the index/indices it came from."""
    indices = []
    list_size = len(sorted_list)

    if list_size % 2 == 0:
        # Even length: average the two middle elements
        indices.append(list_size // 2 - 1)  # -1 because index starts from 0
        indices.append(list_size // 2)
        median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
    else:
        # Odd length: take the single middle element
        indices.append(list_size // 2)
        median = sorted_list[indices[0]]

    return median, indices


median, median_indices = find_median(samples)
# Q1 is the median of the values below the median, Q3 the median of the values above it
Q1, Q1_indices = find_median(samples[:median_indices[0]])
Q3, Q3_indices = find_median(samples[median_indices[-1] + 1:])

IQR = Q3 - Q1

quartiles = [Q1, median, Q3]

Code taken from the referenced answer.
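Note that this median-of-halves method need not agree with `np.percentile`, which by default interpolates linearly between sorted values. A compact NumPy sketch of the same median-of-halves rule makes the difference visible; the function name `iqr_median_of_halves` is hypothetical, and the slicing is chosen to reproduce the `find_median` code (the middle value for odd n, or the two middle values for even n, is excluded from both halves). On this sample the two approaches give 14.5 and 10.0.

```python
import numpy as np


def iqr_median_of_halves(values):
    """Hypothetical helper: Q3 - Q1 using medians of the lower and upper halves."""
    data = np.sort(np.asarray(values, dtype=float))
    n = data.size
    lower = data[: (n - 1) // 2]  # values before the median position(s)
    upper = data[n // 2 + 1:]     # values after the median position(s)
    return np.median(upper) - np.median(lower)


samples = [28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13]
print(iqr_median_of_halves(samples))                            # 14.5
print(np.percentile(samples, 75) - np.percentile(samples, 25))  # 10.0
```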