Section 27.3 The standard error of parameters
A more statistics-oriented approach to model selection is to examine the randomness of the parameters \(\beta_k\) we found: we test the hypothesis that the “true value” of a particular parameter \(\beta_k\) is zero, meaning that the value we found for it is nonzero only due to chance. This approach does not really say whether a model as a whole describes the data well, but whether a particular parameter actually contributes to it.
With this approach, we think of \(\mathbf y - \mathbf z\) as a vector of \(n\) observations of a random variable with \(n-p\) degrees of freedom (where \(p\) is the number of parameters). The variance of such a random variable is estimated as
\[
s^2 = \frac{\|\mathbf y - \mathbf z\|^2}{n-p}\text{.}
\]
You may have seen this formula with \(p=1\text{,}\) which corresponds to the constant model \(z_k\equiv \bar y\text{.}\)
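As a quick check one can run (not part of the text's examples): with \(p=1\) the fitted model is the constant \(\bar y\text{,}\) so the estimate above reduces to the usual sample variance, which MATLAB's var computes with the same \(n-1\) normalization.

y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';  % the data used later in Example 27.3.1
n = numel(y);
s2 = norm(y - mean(y))^2/(n - 1);      % variance estimate with p = 1
disp([s2, var(y)])                     % the two values agree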
Solving for optimal parameters (theoretically) involves inverting the matrix \(X^TX\text{,}\) seen in (27.1.2). The matrix \(s^2 (X^TX)^{-1}\) can be considered the covariance matrix of the estimated parameters. Specifically, its diagonal element in position \((k, k)\) is the variance of \(\beta_k\text{,}\) and the square root of this variance is the standard error of \(\beta_k\text{.}\) As a rule of thumb, if \(|\beta_k| \lt 2\operatorname{SE}(\beta_k)\) (that is, the parameter is within two standard errors of 0), it should be considered inessential and possibly removed from the model. The following computation implements this statistical analysis.
s2 = norm(y-z)^2/(n-p);    % variance of error
C = s2*inv(X'*X);          % covariance matrix for parameters
ts = beta./sqrt(diag(C));  % the t-statistics of the parameters
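A side note on the computation (an alternative sketch, not used in the text): forming inv(X'*X) explicitly can lose accuracy when \(X\) is ill-conditioned. Since \(X^TX = R^TR\) for the economy-size QR factorization \(X = QR\text{,}\) the same covariance matrix can be obtained from \(R\) alone.

[~, R] = qr(X, 0);               % economy-size QR, so X'*X = R'*R
Ci = R \ (R' \ eye(size(X,2)));  % (X'*X)^{-1} without forming X'*X
ts = beta ./ sqrt(s2*diag(Ci));  % the same t-statistics as above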
The parameters with \(|t| \lt 2\) should be considered for removal. In the example of fitting polynomials of degree \(d\text{,}\) we have \(p = d+1\) parameters. Attention should be focused on the leading coefficient: if it appears to be inessential, we remove it by decreasing the degree \(d\) by \(1\text{.}\) Note that the removal of one parameter causes all the others to be recalculated, so parameters that previously looked inessential may become essential after refitting.
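The paragraph above suggests an iterative procedure. Here is a minimal sketch of it (assuming column vectors x and y are already defined, and a hypothetical starting degree of 5): the degree is decreased while the leading coefficient stays within two standard errors of zero.

d = 5;                           % hypothetical starting degree
while d > 0
    X = x.^(0:d);                % design matrix for degree d
    beta = X\y;                  % least-squares coefficients
    [n, p] = size(X);
    s2 = norm(y - X*beta)^2/(n-p);
    C = s2*inv(X'*X);
    ts = beta./sqrt(diag(C));
    if abs(ts(end)) >= 2         % leading coefficient is essential
        break                    % keep this degree
    end
    d = d - 1;                   % drop the leading term and refit
end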
Example 27.3.1. Computing the t-statistics of parameters.
Modify Example 27.2.1 to include the computation of t-statistics for each coefficient of the polynomial.
Note that the vector of t-statistics, which is called ts below, contains the statistics for the coefficients of \(x^0, \dots, x^d\) in this order. This means that ts(end) is really the value of most interest here.
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 2;                      % degree of the polynomial
X = x.^(0:d);               % design matrix
beta = X\y;                 % least-squares coefficients
f = @(x) (x.^(0:d))*beta;   % fitted polynomial
[n, p] = size(X);
s2 = norm(y-f(x))^2/(n-p);  % variance of error
C = s2*inv(X'*X);           % covariance matrix for parameters
ts = beta./sqrt(diag(C));   % t-statistics of the parameters
disp(ts)
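As an optional cross-check (assuming the Statistics and Machine Learning Toolbox is available), fitlm reports the same estimates, standard errors, and t-statistics in its coefficient table; it adds the intercept column automatically.

tbl = fitlm([x, x.^2], y);  % same quadratic model, intercept added by fitlm
disp(tbl.Coefficients)      % columns include Estimate, SE, and tStat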