Two-sample Kolmogorov-Smirnov test in Python SciPy

I can't figure out how to do a two-sample KS test in SciPy.

After reading the documentation for scipy kstest, I can see how to test whether a sample follows the standard normal distribution:

from scipy.stats import kstest
import numpy as np


x = np.random.normal(0,1,1000)
test_stat = kstest(x, 'norm')
#>>> test_stat
#(0.021080234718821145, 0.76584491300591395)

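This means that, with a p-value of 0.76, we cannot reject the null hypothesis that the sample was drawn from the standard normal distribution.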

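However, I want to compare two distributions and see whether I can reject the null hypothesis that they are identical, something like: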

from scipy.stats import kstest
import numpy as np


x = np.random.normal(0,1,1000)
z = np.random.normal(1.1,0.9, 1000)

and test whether x and z are identical.

I tried the naive approach:

test_stat = kstest(x, z)

and got the following error:

TypeError: 'numpy.ndarray' object is not callable

Is there a way to do a two-sample KS test in Python? If so, how should I do it?

Thanks in advance.


You are using the one-sample KS test. You probably want the two-sample test ks_2samp:

>>> from scipy.stats import ks_2samp
>>> import numpy as np
>>>
>>> np.random.seed(12345678)
>>> x = np.random.normal(0, 1, 1000)
>>> y = np.random.normal(0, 1, 1000)
>>> z = np.random.normal(1.1, 0.9, 1000)
>>>
>>> ks_2samp(x, y)
Ks_2sampResult(statistic=0.022999999999999909, pvalue=0.95189016804849647)
>>> ks_2samp(x, z)
Ks_2sampResult(statistic=0.41800000000000004, pvalue=3.7081494119242173e-77)
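As a small follow-up to the session above, here is a sketch of how the returned result can be consumed in a script. It assumes a SciPy version in which the result exposes `statistic` and `pvalue` attributes and also unpacks like a 2-tuple. The seed and the use of `np.random.default_rng` are my own choices, so the numbers will differ from the output shown above.

```python
import numpy as np
from scipy.stats import ks_2samp

# Seed chosen only for reproducibility of this sketch.
rng = np.random.default_rng(12345678)
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

result = ks_2samp(x, z)

# The result can be read by attribute or unpacked like a tuple.
statistic, pvalue = result
print(f"D = {result.statistic:.3f}, p = {result.pvalue:.3g}")
```

The statistic D is the largest vertical gap between the two empirical CDFs, so it always lies between 0 and 1.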

The results can be interpreted as follows:

  1. You can compare the statistic returned by ks_2samp to the critical value from a KS-test table for your sample sizes and chosen significance level. If the statistic is higher than the critical value, you reject the null hypothesis that the two samples come from the same distribution.

  2. Or you can compare the p-value to a significance level a, usually a=0.05 or 0.01 (you decide; the lower a is, the stronger the evidence required to reject). If the p-value is lower than a, you reject the null hypothesis that the two samples come from the same distribution.
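Both decision rules above can be sketched in a few lines. The helper `ks_critical_value` is hypothetical (it is not part of SciPy) and uses the common large-sample approximation D_crit = c(a) * sqrt((n + m) / (n * m)) with c(a) = sqrt(-ln(a / 2) / 2), which is an assumption that stands in for looking up a printed table. For small samples an exact table would be more appropriate.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_critical_value(n, m, alpha=0.05):
    """Hypothetical helper: large-sample critical value for the two-sample KS test."""
    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2))
    return c_alpha * np.sqrt((n + m) / (n * m))


rng = np.random.default_rng(0)  # seed chosen only for reproducibility
x = rng.normal(0, 1, 1000)
z = rng.normal(1.1, 0.9, 1000)

alpha = 0.05
res = ks_2samp(x, z)
d_crit = ks_critical_value(len(x), len(z), alpha)

# Rule 1: compare the statistic to the critical value.
reject_by_statistic = res.statistic > d_crit
# Rule 2: compare the p-value to the significance level.
reject_by_pvalue = res.pvalue < alpha

print(f"D = {res.statistic:.3f}, critical value = {d_crit:.4f}")
print(f"reject by statistic: {reject_by_statistic}, reject by p-value: {reject_by_pvalue}")
```

With two samples of 1000 points each and a = 0.05, the approximate critical value is about 0.061, so the two rules should agree for clearly different samples like these.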

This is what the scipy docs say:

If the K-S statistic is small or the p-value is high, then we cannot reject the hypothesis that the distributions of the two samples are the same.

Note that failing to reject the null hypothesis does not confirm that the two distributions are the same.
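To make that last point concrete, here is a rough illustration under assumptions of my own: instead of random draws, it builds deterministic "samples" from evenly spaced normal quantiles, then shifts one copy by 0.3. The two underlying distributions differ by construction, yet with only 10 points the test has too little power to detect it.

```python
import numpy as np
from scipy.stats import ks_2samp, norm


def shifted_pair(n, shift=0.3):
    """Return n standard-normal quantiles and the same points shifted by `shift`."""
    probs = (np.arange(n) + 0.5) / n  # evenly spaced probabilities in (0, 1)
    base = norm.ppf(probs)
    return base, base + shift


alpha = 0.05

# Same shift, two different sample sizes.
small_a, small_b = shifted_pair(10)
large_a, large_b = shifted_pair(1000)

p_small = ks_2samp(small_a, small_b).pvalue
p_large = ks_2samp(large_a, large_b).pvalue

print(f"n = 10:   p = {p_small:.3g}, reject = {p_small < alpha}")
print(f"n = 1000: p = {p_large:.3g}, reject = {p_large < alpha}")
```

The small-sample comparison fails to reject even though the distributions really are different, while the larger one rejects. A high p-value therefore reflects a lack of evidence, not proof of equality.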