How to calculate probability in a normal distribution given mean & standard deviation?

How can I calculate a probability in a normal distribution, given the mean and standard deviation, in Python? I can always explicitly code my own function according to the definition, like the OP in this question did: Calculating Probability of a Random Variable in a Distribution in Python

Just wondering if there is a library function call that will allow you to do this. In my imagination it would look like this:

nd = NormalDistribution(mu=100, std=12)
p = nd.prob(98)

There is a similar question for Perl: How can I compute the probability at a point given a normal distribution in Perl? But I didn't see one for Python.

NumPy has a random.normal function, but that does sampling, which is not exactly what I want.

There's one in scipy.stats:

>>> import scipy.stats
>>> scipy.stats.norm(0, 1)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(0, 1).pdf(0)
0.3989422804014327
>>> scipy.stats.norm(0, 1).cdf(0)
0.5
>>> scipy.stats.norm(100, 12)
<scipy.stats.distributions.rv_frozen object at 0x928352c>
>>> scipy.stats.norm(100, 12).pdf(98)
0.032786643008494994
>>> scipy.stats.norm(100, 12).cdf(98)
0.43381616738909634
>>> scipy.stats.norm(100, 12).cdf(100)
0.5

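[One thing to beware of (just a tip) is that the parameter passing is a little broad. Because of the way the code is set up, if you accidentally write scipy.stats.norm(mean=100, std=12) instead of scipy.stats.norm(100, 12) or scipy.stats.norm(loc=100, scale=12), then it will accept it, but silently discard those extra keyword arguments and give you the default (0, 1).]

scipy.stats is a great module. As an alternative, you can compute the density directly: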

import math

def normpdf(x, mean, sd):
    var = float(sd)**2
    denom = (2*math.pi*var)**.5
    num = math.exp(-(float(x)-float(mean))**2/(2*var))
    return num/denom

The formula used here is: http://en.wikipedia.org/wiki/Normal_distribution#Probability_density_function

Test:

>>> normpdf(7,5,5)
0.07365402806066466
>>> norm(5,5).pdf(7)
0.073654028060664664

You can just use the error function that's built into the math library, as stated on their website.
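As a minimal sketch of that idea, the normal CDF can be written in terms of math.erf. The function name normal_cdf is made up for this illustration; the relation used is CDF(x) = 0.5 * (1 + erf((x - mean) / (sd * sqrt(2)))).

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd), via the error function.

    normal_cdf is a hypothetical helper name, not a library function.
    """
    z = (x - mean) / (sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

p_below_98 = normal_cdf(98, 100, 12)     # should agree with scipy.stats.norm(100, 12).cdf(98)
p_below_mean = normal_cdf(100, 100, 12)  # half of the mass lies below the mean

print(p_below_98)
print(p_below_mean)
```

This needs nothing outside the standard library, which can be handy when SciPy is not installed.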

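The formula cited from Wikipedia in the answer cannot be used to calculate normal probabilities. You would have to write a numerical integration approximation function using that formula in order to calculate a probability.

That formula computes the value of the probability density function. Since the normal distribution is continuous, you have to compute an integral to get probabilities. The Wikipedia page mentions the CDF, which does not have a closed form in terms of elementary functions for the normal distribution.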
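To illustrate the numerical integration described here, the following sketch integrates the density with composite Simpson's rule and compares the result with the value obtained from the error function. Simpson's rule and the helper name prob_between are choices made for this example, not something the answer prescribes.

```python
import math

def normpdf(x, mean, sd):
    """Normal probability density at x."""
    var = float(sd) ** 2
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def prob_between(a, b, mean, sd, steps=1000):
    """Approximate P(a <= X <= b) with composite Simpson's rule.

    prob_between is a hypothetical helper; steps is rounded up to an even number.
    """
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = normpdf(a, mean, sd) + normpdf(b, mean, sd)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # Simpson weights alternate 4, 2, 4, ...
        total += weight * normpdf(a + i * h, mean, sd)
    return total * h / 3

# Probability of landing within one standard deviation of the mean
approx = prob_between(88, 112, 100, 12)
exact = math.erf(1 / math.sqrt(2))   # same probability via the error function

print(approx, exact)
```

Both values should be close to the familiar 68% figure for one standard deviation.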

Here is more info. First, you are dealing with a frozen distribution (frozen in this case means its parameters are set to specific values). To create a frozen distribution:

import scipy.stats

# Freeze a distribution, where loc is the mean and scale is the std dev
scipy.stats.norm(loc=100, scale=12)

# To draw a random number from your distribution
scipy.stats.norm.rvs(loc=100, scale=12)

# To find the probability that the variable has a value LESS than or equal to,
# let's say, 113, use the CDF (Cumulative Distribution Function)
scipy.stats.norm.cdf(113, 100, 12)
# Output: 0.86066975255037792
# or 86.07% probability

# To find the probability that the variable has a value GREATER than or
# equal to, let's say, 125, use the SF (Survival Function)
scipy.stats.norm.sf(125, 100, 12)
# Output: 0.018610425189886332
# or 1.86%

# To find the variate for which the probability is given, let's say the
# value below which 98% of the probability lies, use the
# PPF (Percent Point Function)
scipy.stats.norm.ppf(.98, 100, 12)
# Output: 124.64498692758187

I wrote this program to do the math for you. Just enter in the summary statistics. No need to provide an array:

One-sample z-test for a population proportion:

To do this for a mean rather than a proportion, change the formula for z accordingly.

Edit:
Here is the content from the link:

import scipy.stats as stats
import math


def one_sample_ztest_pop_proportion(tail, p, pbar, n, alpha):
    # Calculate the test statistic
    sigma = math.sqrt((p*(1-p))/(n))
    z = round((pbar - p) / sigma, 2)

    # Probability of a sample proportion at or below pbar under the null hypothesis
    lower = stats.norm(p, sigma).cdf(pbar)

    if tail == 'lower':
        pval = round(lower, 4)
        print("Results for a lower tailed z-test: ")
    elif tail == 'upper':
        pval = round(1 - lower, 4)
        print("Results for an upper tailed z-test: ")
    elif tail == 'two':
        # Double the smaller tail so the p-value cannot exceed 1
        pval = round(2 * min(lower, 1 - lower), 4)
        print("Results for a two tailed z-test: ")
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")

    # Print test results
    print("Test statistic = {}".format(z))
    print("P-value = {}".format(pval))
    print("Significance level = {}".format(alpha))

    # Compare p-value to significance level
    if pval <= alpha:
        print("{} <= {}. Reject the null hypothesis.".format(pval, alpha))
    else:
        print("{} > {}. Do not reject the null hypothesis.".format(pval, alpha))


#one_sample_ztest_pop_proportion('upper', .20, .25, 400, .05)

#one_sample_ztest_pop_proportion('two', .64, .52, 100, .05)

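Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the probability density function (pdf: the likelihood that a random sample X will be near the given value x) for a given mean (mu) and standard deviation (sigma):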

from statistics import NormalDist


NormalDist(mu=100, sigma=12).pdf(98)
# 0.032786643008494994

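Also note that the NormalDist object provides the cumulative distribution function (cdf: the probability that a random sample X will be less than or equal to x):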

NormalDist(mu=100, sigma=12).cdf(98)
# 0.43381616738909634
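Building on cdf, here is a short sketch of how tail and interval probabilities could be obtained from the same NormalDist object. The variable names are illustrative; cdf and inv_cdf are NormalDist methods in Python 3.8+.

```python
from statistics import NormalDist

nd = NormalDist(mu=100, sigma=12)

p_le_113 = nd.cdf(113)                # P(X <= 113), about 86%
p_ge_125 = 1 - nd.cdf(125)            # P(X >= 125), about 1.9%
p_between = nd.cdf(112) - nd.cdf(88)  # P(88 <= X <= 112), one sigma either side
x_98 = nd.inv_cdf(0.98)               # value below which 98% of the mass lies

print(p_le_113, p_ge_125, p_between, x_98)
```

For a continuous distribution, P(X >= x) and P(X > x) are the same, so 1 - cdf(x) covers both.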

If you would like to find the probability that x lies in [0.5, 2], with a mean of 1 and a standard deviation of 2:

import scipy.stats
scipy.stats.norm(1, 2).cdf(2) - scipy.stats.norm(1,2).cdf(0.5)

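Note that probability is different from the probability density pdf(), which some of the previous answers refer to. Probability is the chance that the variable has a specific value, whereas the probability density describes the chance that the variable will be near a specific value, meaning probability over a range. So to obtain a probability, you need to compute the integral of the probability density function over a given interval. As an approximation, you can simply multiply the probability density by the width of the interval you're interested in, and that will give you approximately the probability.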

import numpy as np
from scipy.stats import norm


data_start = -10
data_end = 10
data_points = 21
data = np.linspace(data_start, data_end, data_points)


point_of_interest = 5
mu = np.mean(data)
sigma = np.std(data)
interval = (data_end - data_start) / (data_points - 1)
probability = norm.pdf(point_of_interest, loc=mu, scale=sigma) * interval

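The code above gives the approximate probability that the variable takes the value 5 (to within the interval width) in a normal distribution fitted to data between -10 and 10 with 21 data points (meaning the interval is 1). You can play around with a fixed interval value, depending on the results you want to achieve.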
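To see how good the density-times-interval approximation is, it can be compared with the probability obtained from the CDF over the same interval. The sketch below swaps in the standard library's NormalDist (Python 3.8+) for scipy.stats.norm, and centring the interval on the point of interest is an assumption made for this comparison.

```python
from statistics import NormalDist, fmean, pstdev

# Same 21 evenly spaced points as np.linspace(-10, 10, 21)
data = [float(v) for v in range(-10, 11)]
mu = fmean(data)
sigma = pstdev(data)   # population standard deviation, like np.std's default
dist = NormalDist(mu, sigma)

point_of_interest = 5
interval = 1.0

# Approximation: density at the point times the interval width
approx = dist.pdf(point_of_interest) * interval

# Reference: CDF difference over an interval centred on the point (an assumption)
exact = dist.cdf(point_of_interest + interval / 2) - dist.cdf(point_of_interest - interval / 2)

print(approx, exact, abs(approx - exact))
```

The two numbers agree closely here because the density changes little over an interval of width 1; the approximation gets worse as the interval grows relative to sigma.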

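I would like to point out that the questioner is really asking "How to calculate the likelihood of a given data point in a normal distribution given mean & standard deviation?" rather than "How to calculate probability in a normal distribution given mean & standard deviation?".

A "probability" must be between 0 and 1, whereas a "likelihood" only has to be non-negative (not necessarily between 0 and 1).

You can use multivariate_normal.pdf(x, mean=mean_vec, cov=cov_matrix) from scipy.stats.multivariate_normal to calculate it.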
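The distinction between a likelihood (density value) and a probability shows up clearly with a narrow distribution. This sketch uses the standard library's NormalDist (Python 3.8+) for the one-dimensional case instead of multivariate_normal, and sigma=0.1 is an arbitrary value picked for illustration.

```python
from statistics import NormalDist

# A narrow normal distribution; sigma=0.1 is an arbitrary illustrative value
narrow = NormalDist(mu=0, sigma=0.1)

# Density (likelihood) at the mean: non-negative, but larger than 1 here
density_at_mean = narrow.pdf(0)

# Probability of falling within 0.05 of the mean: always between 0 and 1
prob_near_mean = narrow.cdf(0.05) - narrow.cdf(-0.05)

print(density_at_mean, prob_near_mean)
```

The density exceeds 1 because it is measured per unit of x, while the probability over any interval stays within [0, 1].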