Section 32.2 Newton's method for multivariate minimization
Recall from Chapter 10 that the Newton method for solving a vector equation \(\mathbf F(\mathbf x) = 0\) proceeds in iterative steps of the form \(\mathbf x = \mathbf a - J(\mathbf a)^{-1} \mathbf F(\mathbf a)\) where \(J\) is the Jacobian matrix of \(\mathbf F\text{.}\) For the purpose of minimizing a scalar function \(f\text{,}\) we let \(\mathbf F = \nabla f\text{.}\) The Jacobian matrix \(J\) of \(\mathbf F\) then becomes the Hessian matrix of \(f\text{,}\) which is the matrix of all second-order partial derivatives.
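Written out for a function of two variables, this identification reads
\begin{equation*}
\mathbf F(\mathbf x) = \nabla f(\mathbf x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix}, \qquad J(\mathbf x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = H(\mathbf x)\text{.}
\end{equation*}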
Thus, the Newton method for minimization proceeds in steps \(\mathbf x = \mathbf a - H^{-1} \nabla f(\mathbf a)\text{.}\) One can view the application of \(H^{-1}\) as the “course correction”: with rare exceptions, the method does not choose the direction of steepest descent. For the badly scaled quadratic function (32.1.1) the Newton direction is right on target: since that function is a quadratic with its minimum at the origin, its gradient is \(\nabla f(\mathbf x) = H\mathbf x\) with a constant Hessian matrix \(H\text{,}\) so the step taken from any point \(\mathbf a\) is
\begin{equation*}
-H^{-1}\nabla f(\mathbf a) = -H^{-1}H\mathbf a = -\mathbf a\text{,}
\end{equation*}
and we arrive at the minimum \(\mathbf x = 0\) after one step. This works for any quadratic function: a single Newton step lands exactly at its critical point.
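As a quick sanity check (a minimal sketch: the diagonal matrix A below is an arbitrary stand-in for a badly scaled Hessian, not necessarily the one in (32.1.1)), one Newton step from a random point lands at the origin up to rounding error.

A = [1, 0; 0, 1000];     % stand-in for a badly scaled constant Hessian (illustrative choice)
f = @(x) 0.5*x'*A*x;     % quadratic with minimum at the origin
fg = @(x) A*x;           % gradient of f
fh = @(x) A;             % Hessian of f (constant for a quadratic)
a = randn(2, 1);         % random starting point
x = a - fh(a)\fg(a);     % a single Newton step
fprintf('After one step: x = (%g, %g), f(x) = %g\n', x, f(x));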
Since a smooth function locally looks like a quadratic polynomial (2nd order Taylor polynomial), Newton's method has a chance of improving the situation even for challenging functions like Rosenbrock's. Let us try it out.
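To see where the step formula comes from, replace \(f\) near a point \(\mathbf a\) by its second-order Taylor polynomial,
\begin{equation*}
f(\mathbf a + \mathbf h) \approx f(\mathbf a) + \nabla f(\mathbf a)^T \mathbf h + \frac{1}{2}\mathbf h^T H \mathbf h\text{,}
\end{equation*}
and minimize the right-hand side over \(\mathbf h\text{:}\) its gradient vanishes when \(H\mathbf h = -\nabla f(\mathbf a)\text{,}\) that is, when \(\mathbf h\) is exactly the Newton step \(-H^{-1}\nabla f(\mathbf a)\text{.}\)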
Example 32.2.1. Multivariate minimization with Newton's method.
Minimize Rosenbrock's function (32.1.2) using Newton's method with a random starting point randn(2, 1). Plot the path of the search.
The gradient of Rosenbrock's function is (32.1.3) and its Hessian matrix is
\begin{equation*}
H(\mathbf x) = \begin{pmatrix} 2 + 400(x_1^2 - x_2) + 800x_1^2 & -400x_1 \\ -400x_1 & 200 \end{pmatrix}\text{.}
\end{equation*}
f = @(x) (x(1)-1).^2 + 100*(x(1).^2 - x(2)).^2;    % Rosenbrock's function
fg = @(x) [2*(x(1)-1) + 400*(x(1).^2 - x(2))*x(1); -200*(x(1).^2 - x(2))];    % gradient
fh = @(x) [2 + 400*(x(1)^2-x(2)) + 800*x(1)^2, -400*x(1); -400*x(1), 200];    % Hessian
x = randn(2, 1);               % random starting point
max_tries = 10000;
path = zeros(2, max_tries);    % points visited by the search
for k = 1:max_tries
    path(:, k) = x;
    dx = -fh(x)\fg(x);         % Newton step: solve H*dx = -gradient
    x = x + dx;
    if norm(dx) < 1e-6         % stop when the step becomes tiny
        break
    end
end
plot(path(1, 1:k), path(2, 1:k), '-+')    % path of the search
if k < max_tries
    fprintf('Found x = (%g, %g) with f(x) = %g after %d steps\n', x, f(x), k);
else
    disp('Failed to converge')
end
The process usually makes a few seemingly random jumps but then quickly converges.
The downsides of Newton's method were already noted in Section 30.3: it requires second-order derivatives, and it could just as well converge to a local maximum.
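One way to watch for the second issue (a sketch that reuses fh and x from Example 32.2.1; it is not part of the example itself) is to inspect the eigenvalues of the Hessian at the computed point: if they are all positive, the point is a local minimum rather than a maximum or saddle point.

lambda = eig(fh(x));     % eigenvalues of the Hessian at the computed point
if all(lambda > 0)
    disp('Hessian is positive definite: the point is a local minimum')
else
    disp('Hessian is not positive definite: the point is not a verified minimum')
end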