Section 6.2 Systems without free variables
What do we get when solving a system with A\b in Matlab? Let us first consider the case when rank(A) is n, so every column of A is a basic column, which means there are no free variables. When such a system is consistent, it has a unique solution, and A\b finds this unique solution. When the system is inconsistent, we get the least-squares solution: the vector \(x\) for which the norm of the residual vector \(Ax-b\) is as small as possible. Recall that the norm of a vector \(v = (v_1, \dots, v_m)\) is
\[ \|v\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_m^2}. \]
In Matlab this norm can be computed in several ways:
- sqrt(sum(v.^2))
- sqrt(v'*v) when v is a column vector
- sqrt(v*v') when v is a row vector
- sqrt(dot(v, v))
- norm(v)
The last way is easiest.
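As a quick sanity check, here is a minimal sketch (with an arbitrary example vector, not taken from the text) that evaluates all five expressions; each line should display 13.

v = [3 4 12]';           % arbitrary column vector, norm is sqrt(9+16+144) = 13
disp(sqrt(sum(v.^2)))    % square the entries, sum, take the square root
disp(sqrt(v'*v))         % v is a column, so v'*v is a scalar
w = v';                  % row-vector version of the same vector
disp(sqrt(w*w'))         % for a row vector, w*w' is a scalar
disp(sqrt(dot(v, v)))    % dot product of v with itself
disp(norm(v))            % the built-in function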
Example 6.2.1. The least-squares solution and the norm of the residual vector.
Find the least-squares solution \(x\) and the norm of the residual vector \(Ax-b\) for the system with the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ 4 & 3 \\ 5 & 6 \end{bmatrix} \]
and with
\[ b = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}. \]
A = [1 2; 4 3; 5 6];
b = [7 8 9]';
x = A\b;
disp(x)
disp(norm(A*x-b))
It may be informative to add the line disp(A*x) and compare this vector to b.
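For a matrix of full column rank such as this one, the least-squares solution can also be obtained from the normal equations \(A^TAx = A^Tb\). The following sketch (an added cross-check, not part of the original example) compares the two computations; the displayed difference should be zero, up to rounding.

A = [1 2; 4 3; 5 6];
b = [7 8 9]';
x1 = A\b;              % least-squares solution via backslash
x2 = (A'*A)\(A'*b);    % the same solution via the normal equations
disp(norm(x1 - x2))    % should be (close to) zero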
A practical use of least-squares solutions is linear regression. Suppose we are given some data points \((x_k, y_k)\), \(k = 1, \dots, m\), and want to find the line \(y = cx+d\) that fits these points the best. The idea is to imagine we could have a perfect fit:
\[ cx_k + d = y_k, \quad k = 1, \dots, m. \]
This system can be written as \(Az=b\) where
\[ A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix}, \quad z = \begin{bmatrix} c \\ d \end{bmatrix}, \quad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}. \]
This is an overdetermined system when we have more than 2 data points. Unless all \(x\)-values are the same (which should not happen), the rank of A is 2, and so the system has no free variables. This means that A\b will find its least-squares solution: a vector with components \((c, d)\) for which the sum of squares of the residuals \(cx_k+d - y_k\) is as small as possible. This is the line of best least-squares fit, and this process is known as linear regression in statistics.
Example 6.2.2. Line of best fit.
Find the line of best fit (in the sense of least squares) to the six points with x-values (-1, 0, 2, 3, 4, 7) and y-values (9, 8, 5, 5, 3, -2). Plot both the line and the points.
x = [-1 0 2 3 4 7]';
y = [9 8 5 5 3 -2]';
A = [x x.^0];
z = A\y;
xx = linspace(min(x), max(x), 500)';
yy = [xx xx.^0]*z;
plot(xx, yy, 'b', x, y, 'r*')
Explanation: think of \(cx+d\) as \(cx + dx^0\) because this naturally leads to the matrix of this linear system: its columns are x and x.^0. The plot of the given data points created by plot(x, y, 'r*') consists of red asterisks, not joined by a line. But to plot the line of best fit (or a curve of best fit, in general) we use more points on the x-axis. These points are in the vector xx: they are evenly distributed over the same interval where the given data points lie. The way in which the y-coordinates of this line are computed may look strange. The matrix product [xx xx.^0]*z is logically the same as z(1)*xx + z(2): multiply the x-values by the coefficient \(c\) and add the constant term \(d\text{.}\) Writing it in matrix form allows for easier generalization later. Note that the line yy = [xx xx.^0]*z, which uses the solution of a linear system, is consistent with A = [x x.^0], which created that system.
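As a sketch of the generalization just mentioned (an added illustration, not part of the original text): fitting a parabola \(y = ax^2 + bx + c\) to the same data only requires one more column, x.^2, in the matrix; everything else stays the same.

% Quadratic least-squares fit to the same six data points
x = [-1 0 2 3 4 7]';
y = [9 8 5 5 3 -2]';
A = [x.^2 x x.^0];                   % columns for x^2, x, and the constant term
z = A\y;                             % z = [a; b; c] minimizes the residual norm
xx = linspace(min(x), max(x), 500)';
yy = [xx.^2 xx xx.^0]*z;             % evaluate the fitted parabola
plot(xx, yy, 'b', x, y, 'r*')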