Section 27.3 The standard error of parameters
A more statistics-oriented approach to model selection is to examine the randomness of the parameters
\(\beta_k\) we found: we test the hypothesis that the “true value” of a particular parameter
\(\beta_k\) is zero, meaning that the nonzero value we found for it is due only to chance. This approach does not really say whether a model is good at describing the data, but whether a particular parameter actually contributes to it.
With this approach, we think of \(\mathbf y - \mathbf z\) as a vector of \(n\) observations of a random variable with \(n-p\) degrees of freedom (where \(p\) is the number of parameters). The variance of such a random variable is estimated as
\begin{equation}
s^2 = \frac{1}{n-p} \sum_{k=1}^n (y_k-z_k)^2\tag{27.3.1}
\end{equation}
You may have seen this formula with \(p=1\) which corresponds to the constant model \(z_k\equiv \bar y\text{.}\)
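As a quick sanity check (a sketch, not part of the text's computation), one can verify that with \(p=1\) and \(z_k \equiv \bar y\), formula (27.3.1) reduces to the usual sample variance; the data vector below is borrowed from Example 27.3.1.

```matlab
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';  % data from Example 27.3.1
n = numel(y);
s2 = sum((y - mean(y)).^2)/(n - 1);     % formula (27.3.1) with p = 1
disp([s2 var(y)])                        % var uses the same n-1 normalization
```

The two displayed values agree, since MATLAB's `var` divides by \(n-1\) by default.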
Solving for optimal parameters (theoretically) involves inverting the matrix
\(X^TX\text{,}\) seen in
(27.1.2). The matrix
\(s^2 (X^TX)^{-1}\) can be considered the
covariance matrix of estimated parameters. Specifically, its diagonal element in position
\((k, k)\) is the variance of
\(\beta_k\text{.}\) The square root of the variance gives the
standard error of
\(\beta_k\text{.}\) As a rule of thumb, if
\(|\beta_k| \lt 2\operatorname{SE}(\beta_k)\) (a parameter is within two standard errors of 0), this parameter should be considered inessential and possibly removed from the model. The following computation implements this statistical analysis.
s2 = norm(y-z)^2/(n-p); % variance of error
C = s2*inv(X'*X); % covariance matrix for parameters
ts = beta./sqrt(diag(C)); % the t-statistics of the parameters
The parameters with
\(|t| \lt 2\) should be considered for removal. In the example of fitting polynomials of degree
\(d\text{,}\) we have
\(p = d+1\) parameters. The consideration should be focused on the leading coefficient, because if it appears to be inessential, we should remove it by decreasing the degree
\(d\) by
\(1\text{.}\) Note that removing one parameter causes all the others to be recalculated, so parameters that previously appeared inessential may become essential.
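The degree-reduction process just described can be sketched as a loop (this is an illustration, not code from the text; the data are reused from Example 27.3.1 below, and the starting degree 5 is an arbitrary choice):

```matlab
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 5;                            % start with a deliberately high degree
while d > 0
    X = x.^(0:d);
    beta = X\y;                   % least-squares fit of degree d
    [n, p] = size(X);
    s2 = norm(y - X*beta)^2/(n - p);
    C = s2*inv(X'*X);             % covariance matrix of parameters
    ts = beta./sqrt(diag(C));     % t-statistics
    if abs(ts(end)) >= 2
        break                     % leading coefficient is essential; stop
    end
    d = d - 1;                    % drop it and refit with a lower degree
end
disp(d)                           % degree of the selected polynomial
```

Note that at each pass the whole parameter vector is refit, in line with the remark that removing one parameter changes all the others.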
Example 27.3.1. Computing the t-statistics of parameters.
Modify
Example 27.2.1 to include the computation of t-statistics for each coefficient of the polynomial.
Solution.
Note that the vector of t-statistics, called
ts below, contains the statistics for the coefficients of
\(x^0, \dots, x^d\) in this order. This means that
ts(end) is the value of most interest here.
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 2;
X = x.^(0:d);
beta = X\y;
f = @(x) (x.^(0:d))*beta;
[n, p] = size(X);
s2 = norm(y-f(x))^2/(n-p);
C = s2*inv(X'*X);
ts = beta./sqrt(diag(C));
disp(ts)