Thanks to its adaptive nature, Adam's default learning rate is fairly robust, but it can sometimes be worth tuning. One approach is to run a learning rate range test beforehand: start from a very small rate, keep increasing it until the loss stops decreasing, then look at the slope of the loss curve and pick the learning rate associated with the fastest loss decrease (not the point where the loss is actually lowest). Jeremy Howard covers this in the fast.ai deep learning course, and it comes from the cyclical learning rates paper.
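A minimal sketch of that range test, assuming PyTorch and a toy regression task (model, data, and the growth factor are all illustrative choices, not anything prescribed by the course or paper):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-6)
loss_fn = nn.MSELoss()

# Toy data: target is just the sum of the inputs.
x = torch.randn(256, 10)
y = x.sum(dim=1, keepdim=True)

lrs, losses = [], []
lr, factor = 1e-6, 1.3  # grow the learning rate geometrically each step
for _ in range(40):
    for group in opt.param_groups:
        group["lr"] = lr
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    lrs.append(lr)
    losses.append(loss.item())
    lr *= factor
```

Plotting `losses` against `lrs` (log scale) gives the usual curve: pick the rate where the loss is falling most steeply, which in practice sits roughly an order of magnitude below the point where the loss bottoms out or starts to diverge.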
Edit: People have fairly recently started using one-cycle learning rate policies in conjunction with Adam, with great results.
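For reference, a one-cycle schedule is available out of the box in PyTorch as `torch.optim.lr_scheduler.OneCycleLR`; this sketch (with arbitrary toy hyperparameters) just shows the ramp-up/anneal shape over a short run:

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# One cycle over 100 steps: LR ramps up to max_lr, then anneals down.
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-2, total_steps=100)

lrs = []
for _ in range(100):
    opt.step()       # in a real loop this follows loss.backward()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])
```

The recorded `lrs` rise toward `max_lr` during the first part of the cycle and decay well below the starting rate by the end.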
Be careful when using weight decay with the vanilla Adam optimizer: in the original Adam formulation, weight decay is effectively applied as L2 regularization on the gradient rather than as true decay, which gives the wrong behavior, as pointed out in Decoupled Weight Decay Regularization (https://arxiv.org/abs/1711.05101). The fix proposed there is the decoupled variant, AdamW.
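In PyTorch the decoupled variant is exposed as `torch.optim.AdamW`; a minimal side-by-side sketch (toy parameters and decay strength are illustrative):

```python
import torch

torch.manual_seed(0)
w_adam = torch.nn.Parameter(torch.ones(3))
w_adamw = torch.nn.Parameter(torch.ones(3))

# Plain Adam: weight_decay is added to the gradient (L2 regularization),
# so the penalty gets rescaled by the adaptive per-parameter step sizes.
adam = torch.optim.Adam([w_adam], lr=1e-3, weight_decay=1e-2)
# AdamW: decay is applied directly to the weights, decoupled from the
# gradient-based update, as proposed in the paper.
adamw = torch.optim.AdamW([w_adamw], lr=1e-3, weight_decay=1e-2)

for opt, w in ((adam, w_adam), (adamw, w_adamw)):
    opt.zero_grad()
    (w.sum() ** 2).backward()  # toy loss
    opt.step()
```

The practical takeaway: if you want weight decay with Adam, prefer `AdamW` over passing `weight_decay` to plain `Adam`.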