Skip to main content

Section 33.1 First attempt at derivative-free minimization

The intuitive picture of steepest descent (the gradient method) is a drop of water sliding down the graph of a function under the force of gravity: the water follows the steepest way down. The intuition of the Nelder-Mead method is a geometric shape like a triangle or a tetrahedron tumbling down a sloped surface. In two dimensions, the shape is a triangle, in three dimensions it is a tetrahedron; in \(n\) dimensions it is an \(n\)-dimensional simplex, meaning \(n+1\) points that are not contained in any hyperplane. To keep the explanation simple, we focus on the case \(n=2\text{.}\)

Given a triangle \(ABC\) in \(xy\)-plane, we evaluate the function \(f\) at its vertices. The vertex with the largest value of \(f\) (for example, \(C\)) might get replaced by its reflection. One could imagine reflecting \(C\) about the opposite side \(AB\) by dropping a perpendicular line from \(C\) onto \(AB\) and following it. However, this kind of reflection is not invariant under linear transformations of \(xy\) plane, such as the scaling of coordinates. The Nelder-Mead method uses reflection about the midpoint of \(AB\text{.}\) The midpoint can be computed as \(M = (A+B)/2\text{,}\) and then the reflected point is

\begin{equation} R = 2M - C \label{eq-nm-reflection}\tag{33.1.1} \end{equation}

However, it would be pointless to replace \(C\) by \(R\) if this did not make the function smaller. So the replacement only happens if \(f(R)\lt f(C)\text{.}\) Otherwise the program stops and reports the center \((A+B+C)/3\) of the last computed triangle as an approximate point of minimum. This is a “reflection-only” version of the Nelder-Mead method.

Minimize the function

\begin{equation} f(\mathbf x) = x_1^4 + x_2^2 - \sin(x_1+x_2)\label{eq-objective-function-sine}\tag{33.1.2} \end{equation}

using the reflection-only version of the Nelder-Mead method. Use a random starting point randn(2, 1) as one vertex \(A\) of the initial triangle, and let other vertices be \(A+h\mathbf e_1\text{,}\) \(A+h\mathbf e_2\) with small \(h = 0.01\) (this makes a small right isosceles triangle). Plot the path of the search.

Solution

It is convenient to arrange column vectors \(A, B, C\) into a matrix \(T\) representing a triangle. This makes it easy to replace the vertex with greatest value of \(f\) after identifying it with two-output max command. The command mean(T, 2) computes the mean \((A+B+C)/3\) while sum(T, 2) is \(A+B+C\text{.}\)

f = @(x) x(1)^4 + x(2)^2 - sin(x(1)+x(2));

A = randn(2, 1);
B = A + [0.01; 0];
C = A + [0; 0.01];
T = [A B C];
max_tries = 10000;
path = zeros(2, max_tries);

for k = 1:max_tries
    path(:, k) = mean(T, 2);
    values = [f(T(:,1)) f(T(:,2)) f(T(:,3))];
    [fmax, ind] = max(values);
    M = (sum(T, 2) - T(:, ind))/2;  % midpoint of opposite side
    R = 2*M - T(:, ind);
    if f(R) < fmax
        T(:, ind) = R;
    else 
        break
    end
end

plot(path(1, 1:k), path(2, 1:k), '-+')
if k < max_tries
    x = mean(T, 2);
    fprintf('Found x = (%g, %g) with f(x) = %g after %d steps\n', x, f(x), k);
else
    disp('Failed to converge')
end