Section 27.3 The standard error of parameters
A more statistics-oriented approach to model selection is to examine the randomness of the parameters
\(\beta_k\) we found: we test the hypothesis that the “true value” of a particular parameter
\(\beta_k\) is zero, meaning that the nonzero value we found for it is due only to chance. This approach does not really say whether a model is good at describing the data, but whether a particular parameter actually contributes to it.
With this approach, we think of \(\mathbf y - \mathbf z\) as a vector of \(n\) observations of a random variable with \(n-p\) degrees of freedom (where \(p\) is the number of parameters). The variance of such a random variable is estimated as
\begin{equation}
s^2 = \frac{1}{n-p} \sum_{k=1}^n (y_k-z_k)^2\tag{27.3.1}
\end{equation}
You may have seen this formula with \(p=1\) which corresponds to the constant model \(z_k\equiv \bar y\text{.}\)
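As a quick sanity check (a sketch, not part of the text's computation), one can verify that with \(p=1\) and \(z_k \equiv \bar y\), formula (27.3.1) reduces to the usual sample variance; the data vector below is borrowed from Example 27.3.1.

```matlab
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';  % data from Example 27.3.1
n = numel(y);
s2 = sum((y - mean(y)).^2)/(n - 1);     % formula (27.3.1) with p = 1
disp([s2 var(y)])                        % var uses the same n-1 normalization
```

The two displayed values agree, since MATLAB's `var` divides by \(n-1\) by default.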
Solving for optimal parameters (theoretically) involves inverting the matrix
\(X^TX\text{,}\) seen in
(27.1.2). The matrix
\(s^2 (X^TX)^{-1}\) can be considered the
covariance matrix of estimated parameters. Specifically, its diagonal element in position
\((k, k)\) is the variance of
\(\beta_k\text{.}\) The square root of the variance gives the
standard error of
\(\beta_k\text{.}\) As a rule of thumb, if
\(|\beta_k| \lt 2\operatorname{SE}(\beta_k)\) (a parameter is within two standard errors of 0), this parameter should be considered inessential and possibly removed from the model. The following computation implements this statistical analysis.
s2 = norm(y-z)^2/(n-p); % variance of error
C = s2*inv(X'*X); % covariance matrix for parameters
ts = beta./sqrt(diag(C)); % the t-statistics of the parameters
The parameters with
\(|t| \lt 2\) should be considered for removal. In the example of fitting polynomials of degree
\(d\text{,}\) we have
\(p = d+1\) parameters. The consideration should be focused on the leading coefficient, because if it appears to be inessential, we should remove it by decreasing the degree
\(d\) by
\(1\text{.}\) Note that removing one parameter causes all the others to be recalculated, so parameters that previously appeared inessential may become essential.
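The degree-reduction process just described can be sketched as a loop (this is an illustration, not code from the text; the data are reused from Example 27.3.1 below, and the starting degree 5 is an arbitrary choice):

```matlab
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 5;                            % start with a deliberately high degree
while d > 0
    X = x.^(0:d);
    beta = X\y;                   % least-squares fit of degree d
    [n, p] = size(X);
    s2 = norm(y - X*beta)^2/(n - p);
    C = s2*inv(X'*X);             % covariance matrix of parameters
    ts = beta./sqrt(diag(C));     % t-statistics
    if abs(ts(end)) >= 2
        break                     % leading coefficient is essential; stop
    end
    d = d - 1;                    % drop it and refit with a lower degree
end
disp(d)                           % degree of the selected polynomial
```

Note that at each pass the whole parameter vector is refit, in line with the remark that removing one parameter changes all the others.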
Example 27.3.1. Computing the t-statistics of parameters.
Modify
Example 27.2.1 to include the computation of t-statistics for each coefficient of the polynomial.
Solution.
Note that the vector of t-statistics, called
ts below, contains the statistics for the coefficients of
\(x^0, \dots, x^d\) in this order. This means that
ts(end) is the value of most interest here.
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 2;
X = x.^(0:d);
beta = X\y;
f = @(x) (x.^(0:d))*beta;
[n, p] = size(X);
s2 = norm(y-f(x))^2/(n-p);
C = s2*inv(X'*X);
ts = beta./sqrt(diag(C));
disp(ts)