Section 32.2 Newton's method for multivariate minimization
Recall from Chapter 10 that the Newton method for solving a vector equation \(\mathbf F(\mathbf x) = 0\) proceeds in iterative steps of the form \(\mathbf x = \mathbf a - J(\mathbf a)^{-1} \mathbf F(\mathbf a)\) where \(J\) is the Jacobian matrix of \(\mathbf F\text{.}\) For the purpose of minimizing a scalar function \(f\text{,}\) we let \(\mathbf F = \nabla f\text{.}\) The Jacobian matrix \(J\) of \(\mathbf F\) then becomes the Hessian matrix of \(f\text{,}\) which is the matrix of all second-order partial derivatives.
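Written out for a function of two variables, this identification reads
\begin{equation*}
\mathbf F(\mathbf x) = \nabla f(\mathbf x) = \begin{pmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{pmatrix}, \qquad J(\mathbf x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{pmatrix} = H(\mathbf x)\text{.}
\end{equation*}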
Thus, the Newton method for minimization proceeds in steps \(\mathbf x = \mathbf a - H^{-1} \nabla f(\mathbf a)\text{.}\) One can view the application of \(H^{-1}\) as the “course correction”: with rare exceptions, the method does not choose the direction of steepest descent. For the badly scaled quadratic function (32.1.1) the Newton direction is right on target: since that function is a quadratic with its minimum at the origin, its gradient is \(\nabla f(\mathbf x) = H\mathbf x\) with a constant Hessian matrix \(H\text{,}\) so the step taken from any point \(\mathbf a\) is
\begin{equation*}
-H^{-1}\nabla f(\mathbf a) = -H^{-1}H\mathbf a = -\mathbf a\text{,}
\end{equation*}
and we arrive at the minimum \(\mathbf x = 0\) after one step. This works for any quadratic function: a single Newton step lands exactly at its critical point.
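As a quick sanity check (a minimal sketch: the diagonal matrix A below is an arbitrary stand-in for a badly scaled Hessian, not necessarily the one in (32.1.1)), one Newton step from a random point lands at the origin up to rounding error.

A = [1, 0; 0, 1000];     % stand-in for a badly scaled constant Hessian (illustrative choice)
f = @(x) 0.5*x'*A*x;     % quadratic with minimum at the origin
fg = @(x) A*x;           % gradient of f
fh = @(x) A;             % Hessian of f (constant for a quadratic)
a = randn(2, 1);         % random starting point
x = a - fh(a)\fg(a);     % a single Newton step
fprintf('After one step: x = (%g, %g), f(x) = %g\n', x, f(x));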
Since a smooth function locally looks like a quadratic polynomial (2nd order Taylor polynomial), Newton's method has a chance of improving the situation even for challenging functions like Rosenbrock's. Let us try it out.
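To see where the step formula comes from, replace \(f\) near a point \(\mathbf a\) by its second-order Taylor polynomial,
\begin{equation*}
f(\mathbf a + \mathbf h) \approx f(\mathbf a) + \nabla f(\mathbf a)^T \mathbf h + \frac{1}{2}\mathbf h^T H \mathbf h\text{,}
\end{equation*}
and minimize the right-hand side over \(\mathbf h\text{:}\) its gradient vanishes when \(H\mathbf h = -\nabla f(\mathbf a)\text{,}\) that is, when \(\mathbf h\) is exactly the Newton step \(-H^{-1}\nabla f(\mathbf a)\text{.}\)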
Example 32.2.1. Multivariate minimization with Newton's method.
Minimize Rosenbrock's function (32.1.2) using Newton's method with a random starting point randn(2, 1). Plot the path of the search.
The gradient of Rosenbrock's function is (32.1.3) and its Hessian matrix is
\begin{equation*}
H(\mathbf x) = \begin{pmatrix} 2 + 400(x_1^2 - x_2) + 800x_1^2 & -400x_1 \\ -400x_1 & 200 \end{pmatrix}\text{.}
\end{equation*}
f = @(x) (x(1)-1).^2 + 100*(x(1).^2 - x(2)).^2;    % Rosenbrock's function
fg = @(x) [2*(x(1)-1) + 400*(x(1).^2 - x(2))*x(1); -200*(x(1).^2 - x(2))];    % gradient
fh = @(x) [2 + 400*(x(1)^2-x(2)) + 800*x(1)^2, -400*x(1); -400*x(1), 200];    % Hessian
x = randn(2, 1);               % random starting point
max_tries = 10000;
path = zeros(2, max_tries);    % points visited by the search
for k = 1:max_tries
    path(:, k) = x;
    dx = -fh(x)\fg(x);         % Newton step: solve H*dx = -gradient
    x = x + dx;
    if norm(dx) < 1e-6         % stop when the step becomes tiny
        break
    end
end
plot(path(1, 1:k), path(2, 1:k), '-+')    % path of the search
if k < max_tries
    fprintf('Found x = (%g, %g) with f(x) = %g after %d steps\n', x, f(x), k);
else
    disp('Failed to converge')
end
The process usually makes a few seemingly random jumps but then quickly converges.
The downsides of Newton's method were already noted in Section 30.3: it requires second-order derivatives, and it could just as well converge to a local maximum.
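One way to watch for the second issue (a sketch that reuses fh and x from Example 32.2.1; it is not part of the example itself) is to inspect the eigenvalues of the Hessian at the computed point: if they are all positive, the point is a local minimum rather than a maximum or saddle point.

lambda = eig(fh(x));     % eigenvalues of the Hessian at the computed point
if all(lambda > 0)
    disp('Hessian is positive definite: the point is a local minimum')
else
    disp('Hessian is not positive definite: the point is not a verified minimum')
end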