So far we have used gradient descent, where in order to minimize the cost function J(theta) we run an iterative algorithm that takes many steps, many iterations of gradient descent, to converge to the global minimum.
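As a reminder, each step of (batch) gradient descent for linear regression updates every parameter theta_j simultaneously; in standard notation, with m training examples and learning rate alpha:

```latex
\text{repeat until convergence:} \quad
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
\bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x_j^{(i)}
\qquad (j = 0, 1, \dots, n)
```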

 

In contrast, the normal equation gives us a method to solve for theta analytically, so that rather than running this iterative algorithm, we can instead solve for the optimal value of theta all at one go: in basically one step, we get to the optimal value.
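Concretely, if X is the m-by-(n+1) design matrix whose rows are the training examples (with a leading column of ones for the intercept term) and y is the vector of targets, the normal equation gives theta in closed form:

```latex
\theta = \left( X^{T} X \right)^{-1} X^{T} y
```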

(Figure: normal equation for linear regression)

 

There is no need to do feature scaling with the normal equation.

The following is a comparison of gradient descent and the normal equation:

(Figures: comparison of gradient descent and the normal equation)
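As an illustrative sketch (not code from the original lecture), here is roughly what the one-step normal-equation solve looks like in Python with NumPy on some made-up data; np.linalg.pinv is used so the solve still works even if X^T X happens to be non-invertible:

```python
import numpy as np

# Made-up training data: m = 4 examples, one input feature, target y.
# These numbers are purely illustrative.
X_raw = np.array([[2104.0], [1416.0], [1534.0], [852.0]])
y = np.array([460.0, 232.0, 315.0, 178.0])

# Build the design matrix: prepend a column of ones for the intercept term theta_0.
m = X_raw.shape[0]
X = np.hstack([np.ones((m, 1)), X_raw])

# Normal equation: theta = pinv(X^T X) X^T y, solved in one step
# (no iterations, no learning rate, no feature scaling needed).
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

print("theta =", theta)            # [intercept, slope]
print("predictions =", X @ theta)  # fitted values on the training set
```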

 
