Section 32.2 Newton's method for multivariate minimization

Recall from Chapter 10 that the Newton method for solving a vector equation \(\mathbf F(\mathbf x) = 0\) proceeds in iterative steps of the form \(\mathbf x = \mathbf a - J(\mathbf a)^{-1} \mathbf F(\mathbf a)\text{,}\) where \(J\) is the Jacobian matrix of \(\mathbf F\text{.}\) For the purpose of minimizing a scalar function \(f\text{,}\) we let \(\mathbf F = \nabla f\text{.}\) The Jacobian matrix \(J\) of \(\mathbf F\) then becomes the Hessian matrix of \(f\text{,}\) the matrix of all second-order partial derivatives; in two variables,

\begin{equation*} H = \begin{pmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_1\partial x_2 \\ \partial^2 f/\partial x_1\partial x_2 & \partial^2 f/\partial x_2^2 \end{pmatrix} \end{equation*}

Thus, the Newton method for minimization proceeds in steps \(\mathbf x = \mathbf a - H^{-1} \nabla f(\mathbf a)\text{.}\) One can view the application of \(H^{-1}\) as the “course correction”: with rare exceptions, the method does not choose the direction of steepest descent. For the badly scaled quadratic function (32.1.1) the Newton direction is right on target: since

\begin{equation*} H = \begin{pmatrix} 2 & 0 \\ 0 & 2\cdot 10^{6} \end{pmatrix} \end{equation*}

the step taken from any point \(\mathbf a\) is

\begin{equation*} -H^{-1} \nabla f(\mathbf a) = - \begin{pmatrix} 1/2 & 0 \\ 0 & (1/2)\cdot 10^{-6} \end{pmatrix} \begin{pmatrix} 2a_1 \\ 2\cdot 10^6 a_2 \end{pmatrix} = -\mathbf a \end{equation*}

so we arrive at the minimum \(\mathbf x = 0\) after one step. The same happens for any quadratic function with a positive definite Hessian: a single Newton step lands exactly on its minimum.
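
As a quick numerical check, here is one Newton step on the badly scaled quadratic in MATLAB. (The explicit formula \(f(\mathbf x) = x_1^2 + 10^6 x_2^2\) is inferred from the gradient and Hessian displayed above, and the names f_quad, g_quad, H_quad are chosen for this illustration only.)

f_quad = @(x) x(1)^2 + 1e6*x(2)^2;     % badly scaled quadratic (assumed form)
g_quad = @(x) [2*x(1); 2e6*x(2)];      % its gradient
H_quad = [2, 0; 0, 2e6];               % its constant Hessian
a = randn(2, 1);                       % arbitrary starting point
x = a - H_quad \ g_quad(a);            % one Newton step
disp(x)                                % equals [0; 0] up to roundoff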

Since a smooth function locally looks like a quadratic polynomial (its second-order Taylor polynomial), Newton's method has a chance of improving the situation even for challenging functions like Rosenbrock's. Indeed, near a point \(\mathbf a\) we have

\begin{equation*} f(\mathbf x) \approx f(\mathbf a) + \nabla f(\mathbf a)^T (\mathbf x - \mathbf a) + \frac{1}{2} (\mathbf x - \mathbf a)^T H(\mathbf a) (\mathbf x - \mathbf a) \end{equation*}

and each Newton step jumps to the critical point of this quadratic model: setting its gradient \(\nabla f(\mathbf a) + H(\mathbf a)(\mathbf x - \mathbf a)\) equal to zero gives exactly \(\mathbf x = \mathbf a - H(\mathbf a)^{-1} \nabla f(\mathbf a)\text{.}\) Let us try it out.

Minimize Rosenbrock's function (32.1.2) using Newton's method with a random starting point randn(2, 1). Plot the path of the search.

Solution

The gradient of Rosenbrock's function is (32.1.3) and its Hessian matrix is

\begin{equation} \begin{pmatrix} 2 + 400 (x_1^2-x_2) + 800 x_1^2 & -400x_1 \\ -400 x_1 & 200 \end{pmatrix}\label{eq-rosenbrock-hess}\tag{32.2.1} \end{equation}
f = @(x) (x(1)-1).^2 + 100*(x(1).^2 - x(2)).^2;                               % Rosenbrock's function (32.1.2)
fg = @(x) [2*(x(1)-1) + 400*(x(1).^2 - x(2))*x(1); -200*(x(1).^2 - x(2))];    % its gradient (32.1.3)
fh = @(x) [2 + 400*(x(1)^2-x(2)) + 800*x(1)^2,  -400*x(1); -400*x(1), 200];   % its Hessian (32.2.1)
x = randn(2, 1);              % random starting point

max_tries = 10000;            % limit on the number of Newton steps

path = zeros(2, max_tries);   % record the search path for plotting
for k = 1:max_tries
    path(:, k) = x;
    dx = -fh(x)\fg(x);        % Newton step: solve H*dx = -grad f
    x = x + dx;
    if norm(dx) < 1e-6        % stop once the step becomes negligible
        break
    end
end

plot(path(1, 1:k), path(2, 1:k), '-+')
if k < max_tries
    fprintf('Found x = (%g, %g) with f(x) = %g after %d steps\n', x, f(x), k);
else
    disp('Failed to converge')
end

The process usually makes a few seemingly random jumps but then quickly converges.
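
Because the gradient (32.1.3) and the Hessian (32.2.1) are entered by hand, a finite-difference comparison is a cheap way to catch typos in them. The sketch below is not part of the solution above; it reuses f, fg, fh and compares them with central differences at a random point.

z = randn(2, 1);                 % test point for the derivative check
h = 1e-5;                        % finite-difference step size
e1 = [1; 0]; e2 = [0; 1];
grad_fd = [f(z + h*e1) - f(z - h*e1); f(z + h*e2) - f(z - h*e2)] / (2*h);
hess_fd = [fg(z + h*e1) - fg(z - h*e1), fg(z + h*e2) - fg(z - h*e2)] / (2*h);
fprintf('gradient mismatch %.1e, Hessian mismatch %.1e\n', ...
    norm(grad_fd - fg(z)), norm(hess_fd - fh(z)))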

The downsides of Newton's method were already noted in Section 30.3: it requires second-order derivatives, and since it only seeks a critical point of \(f\text{,}\) it could just as well converge to a saddle point or a local maximum.
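
For the second issue, an inexpensive safeguard (not part of the solution above) is to test the second-order condition at the computed point: a positive definite Hessian confirms a local minimum. A minimal sketch, reusing fh and x from the code above:

lambda = eig(fh(x));             % eigenvalues of the Hessian at the computed point
if all(lambda > 0)
    disp('Hessian is positive definite: x is a local minimum')
else
    disp('Hessian is not positive definite: possibly a saddle point or maximum')
end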