Review: Gradient Descent
Randomly start at an initial parameter θ⁰, then repeatedly update in the direction opposite the gradient: θᵗ⁺¹ = θᵗ − η ∇L(θᵗ), where η is the learning rate.
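The update rule above can be sketched as follows (the quadratic toy loss and the values of η and the step count are illustrative choices, not from the notes):

```python
# Minimal sketch of gradient descent on a 1-D loss.
def gradient_descent(grad, theta0, eta=0.1, steps=100):
    """Start at theta0, then repeatedly move against the gradient:
    theta <- theta - eta * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - eta * grad(theta)
    return theta

# Example: L(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3);
# the minimizer is theta = 3.
theta_star = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
```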
Tuning your learning rates
Adagrad
Derivation and interpretation of Adagrad: each parameter's learning rate is divided by the root of the sum of its squared past gradients,

w^{t+1} = w^t − η / √(Σ_{i=0}^{t} (g^i)²) · g^t

so parameters whose gradients have historically been large take smaller steps, while rarely-updated parameters keep a larger effective learning rate.
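The Adagrad update can be sketched as follows (the toy loss, η, step count, and eps are illustrative choices):

```python
import math

def adagrad(grad, w0, eta=1.0, steps=200, eps=1e-8):
    """Adagrad on a 1-D loss: each step divides eta by the root of
    the accumulated sum of squared past gradients."""
    w, sum_sq = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        sum_sq += g * g                       # accumulate g^2 over history
        w -= eta / (math.sqrt(sum_sq) + eps) * g
    return w

# Same toy loss as before: L(w) = (w - 3)^2, gradient 2*(w - 3).
w_star = adagrad(lambda w: 2 * (w - 3), w0=0.0)
```

Note the effective step size shrinks automatically as gradients accumulate, which is why a relatively large initial η is workable here.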
Stochastic Gradient Descent (SGD)
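SGD computes the gradient from a single (randomly picked) training example per update, instead of summing the loss over the whole dataset before each step. A minimal sketch on a toy linear model (the data, learning rate, and epoch count are illustrative):

```python
import random

random.seed(0)
# Toy dataset for the model y = w * x, with true w = 2.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

def sgd(samples, w0=0.0, eta=0.01, epochs=50):
    """Update w after every single example, in shuffled order."""
    w = w0
    for _ in range(epochs):
        random.shuffle(samples)
        for x, y in samples:
            g = 2 * (w * x - y) * x   # gradient of ONE example's squared error
            w -= eta * g              # update immediately, not after the batch
    return w

w_sgd = sgd(list(data))
```

Each pass is noisier than a full-batch step, but the parameter moves after every example, which is why SGD often makes faster initial progress on large datasets.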
Feature Scaling
Why feature scaling helps:
When there are many features, keeping them on similar scales (i.e. making them dimensionless) lets gradient descent converge faster. The two figures compare the optimization path with and without normalization (the left one is unnormalized):
As the figures show, after normalization the loss contours become much rounder, the search path to the optimum is noticeably smoother, and it converges to the optimum more reliably.
How to perform feature scaling
To scale the feature highlighted in the red box, first compute the mean of the elements in the green box, then their standard deviation, and finally standardize each element by substituting into x' = (x − μ) / σ.
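The standardization step can be sketched as follows (the sample column of values is illustrative):

```python
import math

def standardize(column):
    """Standardize one feature dimension: subtract its mean, then
    divide by its standard deviation, giving mean 0 and variance 1."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]

scaled = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```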
Gradient Descent Theory
Problem: solve for the parameters minimizing the loss, θ* = arg min_θ L(θ).
The theoretical derivation of gradient descent:
Mathematical background:
Taylor Series
- Taylor series: Let h(x) be any function infinitely differentiable around x = x₀. Then
h(x) = Σ_{k=0}^{∞} h^{(k)}(x₀)/k! · (x − x₀)^k = h(x₀) + h′(x₀)(x − x₀) + h″(x₀)/2! · (x − x₀)² + ⋯
When x is close to x₀: h(x) ≈ h(x₀) + h′(x₀)(x − x₀).
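The first-order approximation can be checked numerically (h = sin and the point x₀ = 0.5 are arbitrary illustrative choices; the error should be on the order of (x − x₀)²):

```python
import math

# First-order Taylor check: h(x) ≈ h(x0) + h'(x0) * (x - x0).
x0 = 0.5
h = math.sin          # an arbitrary smooth function
h_prime = math.cos    # its derivative

x = x0 + 0.01                               # a point close to x0
approx = h(x0) + h_prime(x0) * (x - x0)     # linear approximation
error = abs(h(x) - approx)                  # roughly |h''(x0)|/2 * (x-x0)^2
```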
Multivariable Taylor Series
When x and y are close to x₀ and y₀:
h(x, y) ≈ h(x₀, y₀) + ∂h/∂x(x₀, y₀)(x − x₀) + ∂h/∂y(x₀, y₀)(y − y₀)

Derivation: based on the Taylor series, if the red circle (a small neighborhood of radius d centered at the current point (a, b)) is small enough, then inside the red circle

L(θ) ≈ L(a, b) + ∂L(a, b)/∂θ₁ · (θ₁ − a) + ∂L(a, b)/∂θ₂ · (θ₂ − b)

Write s = L(a, b), u = ∂L(a, b)/∂θ₁, v = ∂L(a, b)/∂θ₂; these are constant inside the circle, so L(θ) ≈ s + u(θ₁ − a) + v(θ₂ − b). Find θ₁ and θ₂ in the red circle, i.e. with (θ₁ − a)² + (θ₂ − b)² ≤ d², minimizing L(θ). Since s is constant, minimizing the approximation means choosing the offset vector (θ₁ − a, θ₂ − b) of length d pointing opposite to (u, v):

(θ₁ − a, θ₂ − b) = −η(u, v), i.e. (θ₁, θ₂) = (a, b) − η(∂L/∂θ₁, ∂L/∂θ₂)

which is exactly the gradient descent update. The approximation (and hence the update) is only valid if the red circle is small enough, which is why the learning rate must be kept small.
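The key step of the derivation — that on a small circle the linear approximation s + u(θ₁ − a) + v(θ₂ − b) is minimized by stepping opposite the gradient (u, v) — can be checked numerically (u, v, and d below are arbitrary illustrative values):

```python
import math

u, v, d = 3.0, -4.0, 0.1   # illustrative gradient components and radius

# Search over directions on the circle of radius d for the one that
# minimizes the linear term u*dx + v*dy (the constant s is irrelevant).
best = min(
    (u * d * math.cos(t) + v * d * math.sin(t), t)
    for t in [k * 2 * math.pi / 3600 for k in range(3600)]
)
best_dir = (math.cos(best[1]), math.sin(best[1]))

# Analytic answer: the unit vector opposite the gradient, -(u, v)/|(u, v)|,
# with minimum value -d * |(u, v)|.
norm = math.sqrt(u * u + v * v)
grad_dir = (-u / norm, -v / norm)
```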