Section 10.2 Multivariate Newton's method
Newton's method (Section 9.1) is based on the idea of replacing a nonlinear function with its linear approximation, and solving the resulting linear equation. The linear approximation comes from the derivative. In the setting of several variables we have partial derivatives, which are arranged into the Jacobian matrix. The \((i, j)\) entry of the Jacobian matrix is the derivative of the \(i\)th component of \(\mathbf F\) with respect to the \(j\)th component of \(\mathbf x\text{:}\)
\[
J = \begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} \\
\frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2}
\end{pmatrix}
\]
(or more rows/columns if there are more variables).
Example 10.2.1. Find the Jacobian matrix.
Find the Jacobian matrix of the function (10.1.2). Express it as an anonymous function in Matlab.
The Jacobian matrix is
\[
J(\mathbf x) = \begin{pmatrix}
2x_1 e^{3x_2} & 3x_1^2 e^{3x_2} \\
x_2 - \cos(x_1 + x_2^2) & x_1 - 2x_2\cos(x_1 + x_2^2)
\end{pmatrix}.
\]
As a Matlab function, it can be written as follows.
J = @(x) [2*x(1)*exp(3*x(2)), 3*x(1)^2*exp(3*x(2));
    x(2) - cos(x(1) + x(2)^2), x(1) - 2*x(2)*cos(x(1) + x(2)^2)];
The linebreak between matrix rows is optional but improves readability.
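A quick way to catch a mistake in a hand-computed Jacobian is to compare it with a finite-difference approximation. This check is not part of the original example; the following is a minimal sketch, reusing F from (10.1.2) with an arbitrarily chosen step size d.
F = @(x) [x(1)^2*exp(3*x(2)) - 30; x(1)*x(2) - sin(x(1) + x(2)^2)];
J = @(x) [2*x(1)*exp(3*x(2)), 3*x(1)^2*exp(3*x(2));
    x(2) - cos(x(1) + x(2)^2), x(1) - 2*x(2)*cos(x(1) + x(2)^2)];
x = [2; -1];       % test point (arbitrary)
d = 1e-6;          % finite-difference step (arbitrary small number)
Jfd = zeros(2, 2); % finite-difference approximation of J(x)
for j = 1:2
    e = zeros(2, 1);
    e(j) = d;
    Jfd(:, j) = (F(x + e) - F(x)) / d;  % j-th column: partials in x_j
end
disp(norm(J(x) - Jfd))  % should be small if the formulas are correct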
The linear approximation to \(\mathbf F\) at a point \(\mathbf x\) is \(\mathbf F(\mathbf x + \mathbf h) \approx \mathbf F(\mathbf x) + J\mathbf h\) where \(J\mathbf h\) is a matrix-vector product.
Example 10.2.2. Find the linear approximation.
Find the linear approximation to function (10.1.2) using x = [2; -1] and h = [0.3; 0.2]. Compare the approximation to the actual value of F(x + h).
F = @(x) [x(1)^2*exp(3*x(2)) - 30; x(1)*x(2) - sin(x(1) + x(2)^2)];
J = @(x) [2*x(1)*exp(3*x(2)), 3*x(1)^2*exp(3*x(2));
    x(2) - cos(x(1) + x(2)^2), x(1) - 2*x(2)*cos(x(1) + x(2)^2)];
x = [2; -1];
h = [0.3; 0.2];
fprintf("Linear approximation: (%g, %g)\n", F(x) + J(x)*h)
fprintf("Actual value: (%g, %g)\n", F(x + h))
The output:
Linear approximation: (-29.6216, -2.14012)
Actual value: (-29.5201, -2.04023)
These are close, so the approximation is good. Note the use of formatted strings with vectors: Matlab automatically “unpacks” vectors when they are used in fprintf. This means one should have a formatting code for every entry of the vector, like (%g, %g) above. The formatting code %g without any specification of precision means the numbers are shown as Matlab would normally show them.
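For instance, a two-entry vector fills two formatting codes in order; a small illustration (the vector v here is just a made-up example):
v = [3.14159; 2.71828];
fprintf('(%g, %g)\n', v)  % prints (3.14159, 2.71828)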
As with the single-variable Newton's method, we equate the linear approximation to zero and solve the linear equation. The solution of \(\mathbf F(\mathbf x) + J\mathbf h=0\) is \(\mathbf h = -J^{-1} \mathbf F(\mathbf x)\text{.}\) This formula is only for writing down the theoretical approach: in practice we do not invert the matrix \(J\) but simply ask Matlab to solve the linear system, which is more efficient than computing the inverse matrix. So, in Matlab terms the formula is h = -J(x)\F(x). Having found the vector \(\mathbf h\text{,}\) we replace \(\mathbf x\) with \(\mathbf x + \mathbf h\) and repeat. The process stops when the step becomes sufficiently small, indicating that we have approached a solution. The size of a vector is measured by its norm: norm(h).
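In code, one full update of Newton's method then takes the following form; a minimal sketch, assuming F and J are the anonymous functions defined in Example 10.2.2:
x = [2; -1];     % current approximation (arbitrary starting point)
h = -J(x)\F(x);  % solve the linear system J(x)*h = -F(x)
x = x + h;       % Newton update
fprintf('Step size: %g\n', norm(h))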
The typical behavior of Newton's method is that it jumps around for several steps, but once it gets in a neighborhood of a solution, it converges to it very quickly.
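One way to watch this happen (not part of the original examples) is to print the step size at every iteration; a minimal sketch, again assuming F and J from Example 10.2.2:
x = [2; 1];  % initial point, same as in the example below
for k = 1:15
    h = -J(x)\F(x);
    x = x + h;
    fprintf('Step %d: norm(h) = %g\n', k, norm(h))
end
Typically, the printed step sizes stay moderate for a while and then shrink very rapidly once the iterates enter a neighborhood of a solution.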
Example 10.2.3. Solve a system using Newton's method.
Solve the system (10.1.1) using Newton's method. Try different initial points. Does the method always converge? Does it converge to the same solution?
Following the structure of the single-variable Newton's method, with notational adjustments:
F = @(x) [x(1)^2*exp(3*x(2)) - 30; x(1)*x(2) - sin(x(1) + x(2)^2)];
J = @(x) [2*x(1)*exp(3*x(2)), 3*x(1)^2*exp(3*x(2));
    x(2) - cos(x(1) + x(2)^2), x(1) - 2*x(2)*cos(x(1) + x(2)^2)];
x = [2; 1];
max_tries = 10000;
for k = 1:max_tries
    h = -J(x)\F(x);
    x = x + h;
    if norm(h) < 100*eps(norm(x))
        break
    end
end
if k < max_tries
    fprintf('Found a solution after %d steps:\n', k);
    disp(x)
    fprintf('The norm of F(x) is %g\n', norm(F(x)))
else
    disp('Failed to converge')
end
It is reasonable to increase the “max tries” number in Newton's method when several variables are involved, as convergence may take longer. In the above example, with initial point [2; 1], a solution [-0.0019; 5.3185] is found in 15 steps. With initial point [1; 2] the method fails to converge. With initial point [2; 2] it converges in 10 steps to a different solution, [-0.2734; 1.9982].
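To experiment with several initial points without editing and re-running the script by hand, one could wrap the iteration in an outer loop; a minimal sketch, reusing F and J from above (the list of starting points is just an illustration):
starts = {[2; 1], [1; 2], [2; 2]};  % initial points to try (illustrative)
for s = 1:numel(starts)
    x = starts{s};
    converged = false;
    for k = 1:10000
        h = -J(x)\F(x);
        x = x + h;
        if norm(h) < 100*eps(norm(x))
            converged = true;
            break
        end
    end
    if converged
        fprintf('Start (%g, %g): solution (%.4f, %.4f) in %d steps\n', starts{s}, x, k)
    else
        fprintf('Start (%g, %g): failed to converge\n', starts{s})
    end
end
Note that fprintf again unpacks the vectors starts{s} and x, so each two-entry vector fills two formatting codes.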