Difference between numpy.random.rand and numpy.random.randn in Python

What is the difference between numpy.random.rand and numpy.random.randn?

From the documentation, I know the only difference between them is the probability distribution each number is drawn from, but the overall structure (dimensions) and data type (float) are the same. Because of this, I'm having a hard time debugging a neural network.

Specifically, I'm trying to re-implement the neural network provided in Michael Nielsen's Neural Networks and Deep Learning. The original code can be found here. My implementation is the same as the original; however, in the init function I defined and initialized the weights and biases with numpy.random.rand instead of the original numpy.random.randn.

However, the code that uses random.rand to initialize the weights and biases doesn't work. The network doesn't learn, and the weights and biases don't change.
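Roughly, the change looks like this (a minimal sketch with made-up layer sizes and variable names, not the exact code from the book):

import numpy as np

sizes = [784, 30, 10]  # example layer sizes, just for illustration

# Original initialization: standard normal samples (mean 0, variance 1)
weights_randn = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])]

# My change: uniform samples in [0, 1) -- with this the network stops learning
weights_rand = [np.random.rand(y, x) for x, y in zip(sizes[:-1], sizes[1:])]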

What is the difference between these two random functions?


First, as you can see from the documentation, numpy.random.randn generates samples from the standard normal distribution, while numpy.random.rand generates samples from a uniform distribution (in the range [0, 1)).

Second, why did the uniform distribution not work? The main reason is the activation function, especially in your case where you use the sigmoid function. The plot of the sigmoid looks like the following:

[Plot of the sigmoid function]

So you can see that if your input is far from 0, the slope of the function decreases quite quickly, and as a result you get a tiny gradient and a tiny weight update. And if you have many layers, those gradients are multiplied many times in the backward pass, so even "proper" gradients become small after the multiplications and stop having any influence. So if you have a lot of weights that push your inputs into those regions, your network is hardly trainable. That's why it is usual practice to initialize network variables around zero; this ensures that you get reasonable gradients (close to 1) to train your net.
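A quick numerical illustration of how fast the sigmoid's slope collapses away from 0 (a small sketch, not part of the original network code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# The slope is at most 0.25 (at z = 0) and shrinks rapidly away from 0
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_prime(z):.6f}")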

However, a uniform distribution is not completely undesirable; you just need to make the range smaller and centered on zero. One good practice is Xavier initialization. In this approach you can initialize your weights with either of the following (see the sketch after this list):

  1. A normal distribution with mean 0 and standard deviation sqrt(2. / (in + out)), where in is the number of inputs to the neuron and out is the number of outputs.

  2. A uniform distribution over the range [-sqrt(6. / (in + out)), +sqrt(6. / (in + out))].
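Here is a minimal sketch of both variants in NumPy (the function names and the 784/30 layer sizes are illustrative assumptions, not a fixed API):

import numpy as np

def xavier_normal(n_in, n_out, rng=None):
    # Samples from N(0, 2 / (n_in + n_out)), i.e. std = sqrt(2 / (n_in + n_out))
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_out, n_in))

def xavier_uniform(n_in, n_out, rng=None):
    # Samples uniformly from [-limit, +limit] with limit = sqrt(6 / (n_in + n_out))
    rng = np.random.default_rng() if rng is None else rng
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

# e.g. a layer with 784 inputs and 30 outputs
W = xavier_normal(784, 30)
print(W.shape, W.std())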

  • np.random.rand is for Uniform distribution (in the half-open interval [0.0, 1.0))
  • np.random.randn is for Standard Normal (aka. Gaussian) distribution (mean 0 and variance 1)

You can visually explore the differences between these two very easily:

import numpy as np
import matplotlib.pyplot as plt


sample_size = 100000
uniform = np.random.rand(sample_size)
normal = np.random.randn(sample_size)


pdf, bins, patches = plt.hist(uniform, bins=20, range=(0, 1), density=True)
plt.title('rand: uniform')
plt.show()


pdf, bins, patches = plt.hist(normal, bins=20, range=(-4, 4), density=True)
plt.title('randn: normal')
plt.show()

Which produces:

[Histogram of the uniform samples (rand)]

and

[Histogram of the normal samples (randn)]

1) numpy.random.rand draws samples from a uniform distribution (in the range [0, 1))

2) numpy.random.randn draws samples from the standard normal distribution (mean 0, variance 1)
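Both can be rescaled if you need other parameters, using the standard transformations from the NumPy docs (the mu, sigma, a, b values below are just examples):

import numpy as np

mu, sigma = 0.0, 0.1      # example mean and standard deviation
normal_samples = mu + sigma * np.random.randn(1000)    # ~ N(mu, sigma^2)

a, b = -0.5, 0.5          # example interval
uniform_samples = a + (b - a) * np.random.rand(1000)   # ~ Uniform[a, b)

print(normal_samples.std(), uniform_samples.min(), uniform_samples.max())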