Difference between LSTM dropout and LSTM recurrent_dropout

According to the Keras documentation:

dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.

recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
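
For reference, this is how the two arguments are passed to the layer (a minimal sketch; the unit count and rates are arbitrary placeholders):

```python
from tensorflow.keras.layers import LSTM

# dropout           -> applied to the linear transformation of the inputs
# recurrent_dropout -> applied to the linear transformation of the recurrent state
layer = LSTM(64, dropout=0.2, recurrent_dropout=0.2)
```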

Can anyone point out where each dropout takes place in the picture below?

[image: unrolled LSTM diagram with inputs x_t and outputs h_t]


I suggest taking a look at (the first part of) this paper. Regular dropout is applied on the inputs and/or the outputs, meaning the vertical arrows from x_t and to h_t. In your case, if you add it as an argument to your layer, it will mask the inputs; you can add a Dropout layer after your recurrent layer to mask the outputs as well. Recurrent dropout masks (or "drops") the connections between the recurrent units; that would be the horizontal arrows in your picture.
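
Here is a minimal Keras sketch of what I describe above (the sizes, rates, and input shape are arbitrary placeholders):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, LSTM, Dropout, Dense

model = Sequential([
    Input(shape=(None, 10)),        # (timesteps, features) -- placeholder values
    LSTM(64,
         dropout=0.2,               # masks the inputs x_t (vertical arrows going in)
         recurrent_dropout=0.2,     # masks the recurrent connections (horizontal arrows)
         return_sequences=True),
    Dropout(0.2),                   # masks the outputs h_t of the LSTM above (vertical arrows going out)
    LSTM(32),
    Dense(1),
])
```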

This picture is taken from the paper above. On the left, regular dropout on inputs and outputs; on the right, regular dropout plus recurrent dropout:

[figure from the paper: unrolled RNN, left panel with dropout on the input/output connections only, right panel with dropout on the recurrent connections as well]

(Ignore the colour of the arrows in this case; in the paper the colours make a further point: the same dropout masks are reused at every timestep.)
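
To illustrate that last point, here is a small NumPy sketch (not framework code) contrasting one mask reused across all timesteps with a fresh mask sampled per timestep:

```python
import numpy as np

rng = np.random.default_rng(0)
rate, timesteps, units = 0.25, 5, 4
h = rng.standard_normal((timesteps, units))   # stand-in for hidden states over time

# Variational style (as in the paper): sample ONE mask per sequence and
# reuse it at every timestep, so the same units are dropped throughout.
seq_mask = (rng.random(units) >= rate) / (1.0 - rate)
h_variational = h * seq_mask

# Naive style: sample a fresh mask at every timestep, so different units
# are dropped at each step.
step_masks = (rng.random((timesteps, units)) >= rate) / (1.0 - rate)
h_naive = h * step_masks
```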

The answer above highlights one of the recurrent dropout methods, but that one is NOT the one used by TensorFlow and Keras (see the TensorFlow docs).

Keras/TF uses the recurrent dropout method proposed by Semeniuta et al. Also, check the image below comparing different recurrent dropout methods: the Gal and Ghahramani method mentioned in the answer above is in the second position, and the Semeniuta method is the rightmost.

[image: comparison of recurrent dropout methods; Gal and Ghahramani second from the left, Semeniuta et al. rightmost]
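
For what it's worth, here is a rough NumPy sketch of where the recurrent dropout mask goes in one LSTM step under each formulation, based on my reading of the two papers rather than on any framework's source code; all weights and shapes are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
units, features, rate = 4, 3, 0.25

W = rng.standard_normal((4 * units, features))    # input weights for gates i, f, o and candidate g
U = rng.standard_normal((4 * units, units))       # recurrent weights for i, f, o, g
x_t = rng.standard_normal(features)
h_prev = rng.standard_normal(units)
c_prev = rng.standard_normal(units)
mask = (rng.random(units) >= rate) / (1.0 - rate) # one dropout mask, reused across the sequence

def lstm_step(h_in, g_mask):
    z = W @ x_t + U @ h_in
    i, f, o = sigmoid(z[:units]), sigmoid(z[units:2 * units]), sigmoid(z[2 * units:3 * units])
    g = np.tanh(z[3 * units:])
    c_t = f * c_prev + i * (g * g_mask)           # g_mask drops (part of) the candidate update
    return o * np.tanh(c_t), c_t

# Gal & Ghahramani: drop the previous hidden state wherever it enters the step.
h_gal, c_gal = lstm_step(h_prev * mask, np.ones(units))

# Semeniuta et al.: leave h_prev untouched and drop only the candidate update g.
h_sem, c_sem = lstm_step(h_prev, mask)
```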