Section 30.4 Golden section search
This is a bracket-based method for minimization, similar in spirit to the bisection method. However, if we applied the bisection method to \(f'\text{,}\) we might find a local maximum instead of a minimum. The golden section search will not do that: it is designed specifically to search for a minimum.
The bisection method was based on the idea that if \(f(a)f(b) \lt 0\) (and \(f\) is continuous), then there is a root of \(f\) on the interval \([a, b]\text{.}\) What do we need to be sure there is at least a local minimum on \([a, b]\text{?}\) Some point \(c\in [a, b]\) such that \(f(c) \lt f(a)\) and \(f(c) \lt f(b)\text{.}\) However, if we divide the interval into \([a, c]\) and \([c, b]\text{,}\) it is not clear what to do next: which part should we keep?
Instead, the bracket should include two interior points, say \(a \lt c \lt d \lt b\text{.}\) Then we can compare \(f(c)\) and \(f(d)\text{;}\) the smaller of the two values tells us which interval to keep: \([a, d]\) if \(f(c)\) is smaller, or \([c, b]\) if \(f(d)\) is smaller.
The choice of the two points \(c, d\) is not obvious. We could place them \(1/3\) of the way from each end, namely \(c = a + (b-a)/3\) and \(d = b - (b-a)/3\text{.}\) Then if, for example, \(f(c) \lt f(d)\text{,}\) we know there is a point of minimum in \([a, d]\text{.}\) To keep the process going, we need to pick two interior evaluation points on \([a, d]\text{.}\) Frustratingly, the point \(c\text{,}\) which is in \([a, d]\text{,}\) does not fit our scheme because it lies in the middle of \([a, d]\text{,}\) not \(1/3\) of the way from an end.
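To see the difficulty concretely, here is a small computation (a sketch; the bracket \([0, 1]\) is just an illustration):

a = 0; b = 1;
c = a + (b-a)/3;      % c = 1/3
d = b - (b-a)/3;      % d = 2/3
% suppose f(c) < f(d), so we keep [a, d] = [0, 2/3];
% the two interior points we would need next are
cnew = a + (d-a)/3    % 2/9
dnew = d - (d-a)/3    % 4/9
% neither equals c = 1/3, which sits exactly in the middle of [0, 2/3],
% so both function values must be computed from scratch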
The golden section method improves on the above by choosing \(c, d\) as follows:
\begin{equation*}
c = a + r(b-a), \qquad d = b - r(b-a), \qquad \text{where } r = \frac{3-\sqrt{5}}{2} \approx 0.382\text{.}
\end{equation*}
The number \(r\) is related to the golden ratio \(\varphi = (\sqrt{5}+1)/2 \approx 1.618\) by \(r = 1/\varphi^2\text{.}\) The following intervals are all in the golden ratio to each other:
\begin{equation*}
\frac{b-a}{d-a} = \frac{d-a}{c-a} = \varphi, \qquad \frac{b-a}{b-c} = \frac{b-c}{b-d} = \varphi\text{.}
\end{equation*}
As a result of this invariance, when the bracket is reduced, the remaining interior point again divides the new bracket in the golden ratio.
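These relations are easy to check numerically. The following is a quick sanity check, not a part of the algorithm:

phi = (sqrt(5)+1)/2;
r = (3-sqrt(5))/2;     % equal to 1/phi^2
a = 0; b = 1;          % any bracket would do
c = a + r*(b-a);
d = b - r*(b-a);
(b-a)/(d-a)            % phi
(d-a)/(c-a)            % phi
% keep [a, d]: the new interior point b' - r*(b'-a) with b' = d
% coincides with the old c, so its value of f is already known
d - r*(d-a) - c        % zero up to rounding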
With each iteration, only one new evaluation of \(f\) is needed. The bracket size is divided by \(\varphi\) at every step. This is a linear rate of convergence, which is slow but reliable (if the function is unimodal), and we don't need the derivative of \(f\) to use this method.
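Since the bracket length is divided by \(\varphi\) at each step, we can estimate in advance how many iterations are needed to reach a given bracket size. A back-of-the-envelope sketch (the tolerance \(10^{-8}\) is just an example):

phi = (sqrt(5)+1)/2;
a = -2; b = 2;
tol = 1e-8;
n = ceil(log((b-a)/tol)/log(phi))   % about 42 iterations for this bracket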
What if the function is not unimodal? If we are lucky, the process may converge to the global minimum. But it's quite possible that the global minimum will be lost from the bracket at some step and then we'll converge to a local minimum instead.
Example 30.4.2. Minimize by golden section.
Minimize \(f(x) = x^4 - 3 x^2 + x + 1\) using the golden section method on the interval \([-2, 2]\text{.}\)
f = @(x) x.^4 - 3*x.^2 + x + 1;
a = -2;
b = 2;
r = (3-sqrt(5))/2;
while b-a >= 100*eps(a)
    c = a + r*(b-a);
    d = b - r*(b-a);
    if f(c) < f(d)
        b = d;
    else
        a = c;
    end
end
fprintf('Found a minimum x = %g with f(x) = %g \n', c, f(c));
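For this function the script should report a minimum near \(x \approx -1.3008\) with \(f(x) \approx -2.5139\text{,}\) which is the global minimum. Note that \(f\) is not actually unimodal on \([-2, 2]\text{:}\) there is a second local minimum near \(x \approx 1.13\text{,}\) and running the same code on the bracket \([0, 2]\text{,}\) for example, converges to that local minimum instead.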
The code in Example 30.4.2 does not store previously computed values of the function, so they get recomputed for the same \(x\)-values. In some languages, such caching of previously computed function values may happen behind the scenes. In Matlab it does not (currently), so we need to store the values ourselves in order to reuse them. The following version of the code does this.
Example 30.4.3. Efficient golden section search.
Improve the efficiency of the code in Example 30.4.2 by storing and reusing some function values.
We use the variables fc and fd to store the values of \(f\) at the points c and d. When the algorithm replaces one of these points with the other, the previously computed value is reused without evaluating the function \(f\) again. This approach requires more work before the main loop, in order to initialize fc and fd.
f = @(x) x.^4 - 3*x.^2 + x + 1;
a = -2;
b = 2;
r = (3-sqrt(5))/2;
c = a + r*(b-a);   % preparation
fc = f(c);
d = b - r*(b-a);
fd = f(d);
while b-a >= 100*eps(a)
    if fc < fd
        b = d;
        d = c; fd = fc;               % reused value
        c = a + r*(b-a); fc = f(c);   % newly computed value
    else
        a = c;
        c = d; fc = fd;               % reused value
        d = b - r*(b-a); fd = f(d);   % newly computed value
    end
end
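The reuse is justified by the invariance discussed above. If \(f(c) \lt f(d)\text{,}\) the bracket shrinks from \([a, b]\) to \([a, d]\text{,}\) and the new point \(d - r(d-a)\) coincides with the old \(c\) because \((1-r)^2 = r\text{;}\) its function value is already stored in fc. The symmetric statement holds when \([c, b]\) is kept.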
The code is longer than that of Example 30.4.2, but it runs faster because each pass through the loop makes only one call to the function \(f\) instead of two.
Since the function \(f\) used in Example 30.4.2 and Example 30.4.3 is easy to compute, the gain from the optimized implementation may not be noticeable. To make it visible, replace the function with the following one, which takes relatively long to evaluate:
f = @(x) x.^4 - 3*x.^2 + x + 1 + 0*sum(rand(1, 1e6));
and time the script with tic and toc, as in Section 4.5.
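For instance, a minimal timing harness might look as follows (the script names golden_simple.m and golden_cached.m are placeholders: save the code of Example 30.4.2 and Example 30.4.3 under these names first):

% hypothetical script names; save Example 30.4.2 as golden_simple.m
% and Example 30.4.3 as golden_cached.m before running
tic; golden_simple; t1 = toc;
tic; golden_cached; t2 = toc;
fprintf('two calls per step: %.2f s; one call per step: %.2f s\n', t1, t2)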