How to plot a normal distribution

Given a mean and a variance, is there a simple function call that will plot a normal distribution?

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math


mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
plt.plot(x, stats.norm.pdf(x, mu, sigma))
plt.show()

[Figure: Gaussian distribution, mean 0, variance 1]
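A variant of the same idea, sketched on the assumption that you are happy to use SciPy's "frozen" distribution object: stats.norm(loc=mu, scale=sigma) fixes the parameters once, and its ppf method (the inverse CDF) picks a plotting range that covers nearly all of the probability mass whatever mu and sigma are. The 0.001/0.999 cut-offs and the 200 grid points are arbitrary choices.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

mu = 0
variance = 1
sigma = math.sqrt(variance)

# Freeze the distribution: loc is the mean, scale is the standard deviation
dist = stats.norm(loc=mu, scale=sigma)

# Take the x range from the 0.1% and 99.9% quantiles (arbitrary cut-offs)
x = np.linspace(dist.ppf(0.001), dist.ppf(0.999), 200)
y = dist.pdf(x)

plt.plot(x, y)
plt.title(f"Normal distribution, mean={mu}, variance={variance}")
plt.show()
```

Because the range comes from quantiles, changing mu or sigma moves and stretches the window automatically.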

I don't think there is a function that does all of that in a single call. However, you can find the Gaussian probability density function in scipy.stats.

So the simplest way I could come up with is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


# Plot between -10 and 10 with .001 steps.
x_axis = np.arange(-10, 10, 0.001)
# Mean = 0, SD = 2.
plt.plot(x_axis, norm.pdf(x_axis,0,2))
plt.show()


Unutbu's answer is correct. But because the mean can be greater or less than zero, I would still change this:

x = np.linspace(-3 * sigma, 3 * sigma, 100)

to this:

x = np.linspace(-3 * sigma + mean, 3 * sigma + mean, 100)
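To see why the shift matters, here is a small check that measures how much of the probability mass each range actually covers, using scipy.stats.norm.cdf. The values mean = 5 and sigma = 2 are only an example.

```python
import numpy as np
from scipy.stats import norm

mean, sigma = 5.0, 2.0

# Range centred on zero: ignores the mean
x_old = np.linspace(-3 * sigma, 3 * sigma, 100)
# Range centred on the mean: covers mean +/- 3 standard deviations
x_new = np.linspace(-3 * sigma + mean, 3 * sigma + mean, 100)

# Probability mass between the first and last grid point of each range
covered_old = norm.cdf(x_old[-1], mean, sigma) - norm.cdf(x_old[0], mean, sigma)
covered_new = norm.cdf(x_new[-1], mean, sigma) - norm.cdf(x_new[0], mean, sigma)
print(f"old range covers {covered_old:.3f}, new range covers {covered_new:.3f}")
# old range covers 0.691, new range covers 0.997
```

With a non-zero mean, the unshifted window cuts off almost a third of the distribution.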

If you prefer a step-by-step approach, you could consider a solution like the following:

import numpy as np
import matplotlib.pyplot as plt


mean = 0
std = 1
variance = np.square(std)
x = np.arange(-5, 5, .01)
# Normal PDF: exp(-(x - mean)^2 / (2 * variance)) / sqrt(2 * pi * variance)
f = np.exp(-np.square(x - mean) / (2 * variance)) / np.sqrt(2 * np.pi * variance)


plt.plot(x,f)
plt.ylabel('gaussian distribution')
plt.show()
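A quick way to sanity-check a hand-written density is to compare it with scipy.stats.norm.pdf. The sketch below deliberately uses a non-unit standard deviation (std = 2 is an arbitrary choice), because with variance = 1 an operator-precedence slip such as writing /2*variance instead of /(2*variance) goes unnoticed.

```python
import numpy as np
from scipy.stats import norm

mean = 1.5
std = 2.0
variance = std ** 2
x = np.arange(-5, 5, 0.01)

# Hand-written density: note the parentheses around (2 * variance)
f_manual = np.exp(-np.square(x - mean) / (2 * variance)) / np.sqrt(2 * np.pi * variance)

# Same density from SciPy; scale is the standard deviation
f_scipy = norm.pdf(x, loc=mean, scale=std)
print(np.allclose(f_manual, f_scipy))  # True

# Without the parentheses this computes ((x - mean)**2 / 2) * variance instead
f_wrong = np.exp(-np.square(x - mean) / 2 * variance) / np.sqrt(2 * np.pi * variance)
print(np.allclose(f_wrong, f_scipy))  # False
```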

You can easily get the CDF as well, and estimate the PDF from it:

import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
import scipy.stats


def setGridLine(ax):
    # http://jonathansoma.com/lede/data-studio/matplotlib/adding-grid-lines-to-a-matplotlib-chart/
    ax.set_axisbelow(True)
    ax.minorticks_on()
    ax.grid(which='major', linestyle='-', linewidth=0.5, color='grey')
    ax.grid(which='minor', linestyle=':', linewidth=0.5, color='#a6a6a6')
    ax.tick_params(which='both',  # Options for both major and minor ticks
                   top=False,  # turn off top ticks
                   left=False,  # turn off left ticks
                   right=False,  # turn off right ticks
                   bottom=False)  # turn off bottom ticks


data1 = np.random.normal(0,1,1000000)
x=np.sort(data1)
y=np.arange(x.shape[0])/(x.shape[0]+1)


f2 = scipy.interpolate.interp1d(x, y,kind='linear')
x2 = np.linspace(x[0],x[-1],1001)
y2 = f2(x2)


y2b = np.diff(y2)/np.diff(x2)
x2b=(x2[1:]+x2[:-1])/2.


f3 = scipy.interpolate.interp1d(x, y,kind='cubic')
x3 = np.linspace(x[0],x[-1],1001)
y3 = f3(x3)


y3b = np.diff(y3)/np.diff(x3)
x3b=(x3[1:]+x3[:-1])/2.


bins=np.arange(-4,4,0.1)
bins_centers=0.5*(bins[1:]+bins[:-1])
cdf = scipy.stats.norm.cdf(bins_centers)
pdf = scipy.stats.norm.pdf(bins_centers)


plt.rcParams["font.size"] = 18
fig, ax = plt.subplots(3,1,figsize=(10,16))
ax[0].set_title("cdf")
ax[0].plot(x,y,label="data")
ax[0].plot(x2,y2,label="linear")
ax[0].plot(x3,y3,label="cubic")
ax[0].plot(bins_centers,cdf,label="ans")


ax[1].set_title("pdf:linear")
ax[1].plot(x2b,y2b,label="linear")
ax[1].plot(bins_centers,pdf,label="ans")


ax[2].set_title("pdf:cubic")
ax[2].plot(x3b,y3b,label="cubic")
ax[2].plot(bins_centers,pdf,label="ans")


for idx in range(3):
    ax[idx].legend()
    setGridLine(ax[idx])


plt.show()
plt.clf()
plt.close()
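The answer above interpolates the empirical CDF with interp1d and then differentiates it. A lighter-weight sketch of the same idea (empirical CDF first, then a numerical derivative) evaluates the empirical CDF on a grid with np.searchsorted and differentiates it with np.gradient. The sample size, grid, and random seed are arbitrary choices.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
data = np.sort(rng.normal(0, 1, 100_000))

# Empirical CDF on a grid: fraction of samples <= each grid value
grid = np.linspace(-3, 3, 61)
ecdf_on_grid = np.searchsorted(data, grid, side="right") / data.size

# Numerical derivative of the empirical CDF estimates the PDF
pdf_estimate = np.gradient(ecdf_on_grid, grid)

# Compare both curves with the exact normal CDF and PDF
max_cdf_err = np.max(np.abs(ecdf_on_grid - scipy.stats.norm.cdf(grid)))
max_pdf_err = np.max(np.abs(pdf_estimate - scipy.stats.norm.pdf(grid)))
print(f"max CDF error: {max_cdf_err:.4f}, max PDF error: {max_pdf_err:.4f}")
```

A coarser grid smooths out sampling noise in the derivative at the cost of resolution, which plays the same role as the 1001-point linspace in the interpolation version.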

I have just come back to this. When I tried the example above, matplotlib.mlab gave me the message MatplotlibDeprecationWarning: scipy.stats.norm.pdf, so I had to install scipy. The sample is now:

%matplotlib inline
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats




mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
plt.plot(x, scipy.stats.norm.pdf(x, mu, sigma))


plt.show()

Use seaborn instead. I am using seaborn's distplot with mean = 5, standard deviation = 3, and 1000 values:

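import numpy as np
import seaborn as sns
value = np.random.normal(loc=5, scale=3, size=1000)
sns.distplot(value)  # deprecated since seaborn 0.11; sns.histplot(value, kde=True) is the newer API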

You will get a normal distribution curve.
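Since distplot is deprecated in recent seaborn releases, here is a sketch of a comparable figure using only Matplotlib and SciPy. This is a swapped-in technique, not the seaborn API: a density-normalised histogram of the sample with a normal curve fitted by scipy.stats.norm.fit drawn on top. The seed and bin count are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
value = rng.normal(loc=5, scale=3, size=1000)

# Maximum-likelihood estimates of the mean and standard deviation
mu_hat, sigma_hat = norm.fit(value)

fig, ax = plt.subplots()
# density=True scales the bars so their total area is 1, matching a PDF
counts, edges, _ = ax.hist(value, bins=30, density=True, alpha=0.5, label="sample")

# Overlay the fitted normal density
x = np.linspace(value.min(), value.max(), 200)
ax.plot(x, norm.pdf(x, mu_hat, sigma_hat),
        label=f"fit: mean={mu_hat:.2f}, sd={sigma_hat:.2f}")
ax.legend()
plt.show()
```

Unlike a KDE, the overlaid curve is an actual normal distribution, so its shape does not depend on a bandwidth choice.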

I think being able to set the height is important, so I created this function:

import matplotlib.pyplot as plt
import numpy as np


def my_gauss(x, sigma=1, h=1, mid=0):
    from math import exp, pow
    variance = pow(sigma, 2)
    return h * exp(-pow(x-mid, 2)/(2*variance))

where sigma is the standard deviation, h is the height (the peak value), and mid is the mean.

Usage:

plt.close("all")
x = np.linspace(-20, 20, 101)
yg = [my_gauss(xi) for xi in x]
plt.plot(x, yg)
plt.show()

Here are the results with different heights and standard deviations:

[Figure: Gaussian curves with different heights and standard deviations]
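Because the height is a free parameter, the curve from my_gauss is a normal PDF only for one particular height. A sketch of a NumPy-vectorised variant (gauss_curve is a hypothetical name, not from the answer) makes that relation explicit: setting h = 1 / (sigma * sqrt(2 * pi)) reproduces scipy.stats.norm.pdf. The sigma and mid values are arbitrary examples.

```python
import numpy as np
from scipy.stats import norm


def gauss_curve(x, sigma=1.0, h=1.0, mid=0.0):
    """Hypothetical vectorised Gaussian with peak height h at x = mid."""
    x = np.asarray(x, dtype=float)
    return h * np.exp(-((x - mid) ** 2) / (2 * sigma ** 2))


x = np.linspace(-20, 20, 101)
sigma, mid = 3.0, 2.0

# This particular height normalises the area under the curve to 1
h_pdf = 1 / (sigma * np.sqrt(2 * np.pi))
curve = gauss_curve(x, sigma=sigma, h=h_pdf, mid=mid)
print(np.allclose(curve, norm.pdf(x, loc=mid, scale=sigma)))  # True
```

Working on whole arrays also removes the need for the list comprehension in the usage example.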

import math
import matplotlib.pyplot as plt
import numpy
import pandas as pd




def normal_pdf(x, mu=0, sigma=1):
    sqrt_two_pi = math.sqrt(math.pi * 2)
    return math.exp(-(x - mu) ** 2 / 2 / sigma ** 2) / (sqrt_two_pi * sigma)




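# In Python 3, map() returns a lazy iterator, so wrap it in list() before building the DataFrame
df = pd.DataFrame({'x1': numpy.arange(-10, 10, 0.1), 'y1': list(map(normal_pdf, numpy.arange(-10, 10, 0.1)))})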


plt.plot('x1', 'y1', data=df, marker='o', markerfacecolor='blue', markersize=5, color='skyblue', linewidth=1)
plt.show()

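Instead of mapping a scalar function over each x value, the whole column can be computed with one vectorised call. A sketch that substitutes scipy.stats.norm.pdf for the hand-written normal_pdf and uses pandas' built-in DataFrame.plot:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm

# Build the x column once, then compute y for the whole column at once
df = pd.DataFrame({"x1": np.arange(-10, 10, 0.1)})
df["y1"] = norm.pdf(df["x1"], loc=0, scale=1)

# DataFrame.plot forwards marker/colour keywords to Matplotlib
ax = df.plot(x="x1", y="y1", marker="o", markersize=5, color="skyblue", legend=False)
ax.set_ylabel("density")
plt.show()
```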

For me, this works very well if you are trying to plot a specific PDF:

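import numpy as np
import seaborn as sns
from scipy import stats

theta1 = {
    "a": 0.5,  # vertical offset added to the pdf
    "cov": 1,  # passed as scale below, i.e. used as the standard deviation
    "mean": 0
}
x = np.linspace(start=0, stop=1000, num=1000)
pdf = stats.norm.pdf(x, theta1['mean'], theta1['cov']) + theta1['a']
sns.lineplot(x=x, y=pdf)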
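One thing to watch with a parameter dictionary like the one above: despite a key name such as "cov", the third positional argument of stats.norm.pdf is the scale, i.e. the standard deviation. If what you actually have is a variance (or a 1x1 covariance), take its square root first. A small check with variance = 4, an arbitrary example value:

```python
import numpy as np
from scipy.stats import norm

mean = 0.0
variance = 4.0

# scipy.stats.norm takes the standard deviation as `scale`, not the variance
right = norm.pdf(mean, loc=mean, scale=np.sqrt(variance))
wrong = norm.pdf(mean, loc=mean, scale=variance)

print(round(right, 4), round(wrong, 4))  # 0.1995 0.0997
```

Passing the variance directly halves the peak here, because the curve is drawn with twice the intended spread.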