Section 32.3 Conjugate gradient method

The methods considered so far in this chapter have one thing in common: lack of memory. Each step is taken as if it were the first. In contrast, the conjugate gradient method uses its previous step to determine the direction of the next one. The primary benefit is avoiding the “sharp turns” that create the zigzagging pattern we saw in Example 32.1.1. The method described below is sometimes called the nonlinear conjugate gradient method because “conjugate gradient method” often refers specifically to the minimization of a quadratic function \(f(\mathbf x) = |A\mathbf x - \mathbf b|^2\) as a way of approximately solving the linear system \(A\mathbf x = \mathbf b\text{.}\)
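
As a quick illustration of that remark (this is not part of the example below, and the small matrix here is chosen arbitrarily), the minimizer of \(|A\mathbf x - \mathbf b|^2\) for an invertible \(A\) coincides with the solution of \(A\mathbf x = \mathbf b\text{:}\)

A = [4 1; 1 3];                      % a small invertible matrix, chosen for illustration
b = [1; 2];
f = @(x) norm(A*x - b)^2;            % the quadratic function from the remark above
x_min = fminsearch(f, zeros(2, 1));  % minimize it numerically
disp([x_min, A\b])                   % the two columns should approximately agree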

Suppose we arrived at point \(\mathbf a\) from point \(\mathbf b\text{,}\) following the direction vector \(\mathbf v\text{.}\) With the conjugate gradient method, the direction of our next move will be \(\mathbf w = -\nabla f(\mathbf a) + \gamma \mathbf v\) where \(\gamma > 0\text{.}\) A large \(\gamma\) means we mostly keep the same direction as before; a small \(\gamma\) means we go with the gradient. We should make \(\gamma\) smaller, giving more importance to the gradient, if \(\mathbf a\) is much closer to the minimum than \(\mathbf b\text{.}\) This progress can be measured by the ratio \(|\nabla f(\mathbf a)|^2/|\nabla f(\mathbf b)|^2\) (small if we arrived in the vicinity of the minimum), so we use this quantity as \(\gamma\text{.}\)
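
Here is a minimal sketch of a single direction update, matching the symbols above to code; the gradient fg used here is a made-up placeholder, not a function appearing elsewhere in this section.

fg = @(x) [2*x(1); 10*x(2)];            % placeholder gradient (of x1^2 + 5*x2^2)
b = [1; 1];                             % previous point b
v = -fg(b);                             % direction we followed from b
a = b + 0.1*v;                          % current point a (step size chosen arbitrarily)
gamma = norm(fg(a))^2/norm(fg(b))^2;    % progress ratio, used as gamma
w = -fg(a) + gamma*v;                   % direction of the next move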

Modify Example 32.1.1 so that it minimizes the simplified Rosenbrock function

\begin{equation} f(\mathbf x) = (x_1 - 1)^2 + (x_1^2 - x_2)^2\label{eq-rosenbrock-simplified}\tag{32.3.1} \end{equation}

using the conjugate gradient method. Use a random starting point randn(2, 1). Plot the path of the search.

Solution
f = @(x) (x(1)-1).^2 + (x(1).^2 - x(2)).^2;
fg = @(x) [2*(x(1)-1) + 4*x(1)*(x(1).^2 - x(2)); -2*(x(1).^2 - x(2))];

x = randn(2, 1);
v = zeros(2, 1);
gamma = 0;
max_tries = 10000;

path = zeros(2, max_tries);
for k = 1:max_tries
    path(:, k) = x;
    w = -fg(x) + gamma*v;  % with correction
    t_min = fminbnd(@(t) f(x + t*w), 0, 1);
    dx = t_min*w;
    if norm(dx) < 1e-6
        break
    end

    gamma = norm(fg(x+dx))^2/norm(fg(x))^2;  % update gamma
    x = x + dx;                              % update x
    v = w;                 % record the step for the future
end

plot(path(1, 1:k), path(2, 1:k), '-+')
if k < max_tries
    fprintf('Found x = (%g, %g) with f(x) = %g after %d steps\n', x, f(x), k);
else
    disp('Failed to converge')
end
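
To make the comparison mentioned below, the correction can be switched off by keeping \(\gamma = 0\) throughout, for example by replacing the gamma update inside the loop:

gamma = 0;   % no correction: each step is plain steepest descent with a line search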

Compare the search path to the one we get without the correction in w: it zigzags less. However, if we use the original Rosenbrock function (32.1.2), which is very far from quadratic or convex, the search diverges to infinity. Several other correction terms (different choices of the \(\gamma\) coefficient) have been proposed for the nonlinear conjugate gradient method. But it appears that the “narrow valley” landscape of the Rosenbrock function is best navigated either by using both first and second order derivatives (the Newton method) or by using no derivatives at all (the Nelder-Mead method, to be considered later).
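
One well-known alternative choice of \(\gamma\text{,}\) stated here only for illustration (it is not derived in this text), is the Polak-Ribière formula; it would replace the gamma update in the loop above with something like

ga = fg(x + dx);                           % gradient at the new point
gb = fg(x);                                % gradient at the old point
gamma = max(0, ga'*(ga - gb)/norm(gb)^2);  % Polak-Ribiere ratio, clipped at 0 to allow restarts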