Section 27.4 Multiple regression
The least squares method, as outlined at the beginning of Section 27.1, extends naturally to multiple linear regression, in which a dependent variable is modeled by a linear combination of several independent variables. In this setting the matrix \(X\) consists not of powers of a single variable but of several explanatory variables, one per column. In the following example the observations being modeled are the scores on a final exam, and the explanatory variables are the homework average, the quiz average, and the scores on Exams 1, 2, and 3. After fitting a linear combination of these five variables to the final exam scores, the code computes \(R^2\) and the \(t\)-statistic of each coefficient.
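In matrix form, the problem is to choose the coefficient vector \(\beta\) minimizing the residual norm:
\[
\min_{\beta}\ \|y - X\beta\|^2, \qquad X = \begin{bmatrix} \text{hw} & \text{quiz} & \text{exam1} & \text{exam2} & \text{exam3} \end{bmatrix},
\]
which is exactly what the backslash operation \(X\backslash y\) computes below.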
hw = [97 99 90 76 96 80 100 55 69 85 89 100 73 89 95 98 98 73 100 84]';
quiz = [94 104 101 76 102 83 90 81 101 101 105 99 91 98 95 101 88 77 97 95]';
exam1 = [96 100 93 79 95 62 83 88 86 96 95 96 74 91 88 94 96 98 97 94]';
exam2 = [71 80 59 31 70 26 63 65 69 78 79 65 38 75 71 75 68 64 73 68]';
exam3 = [83 86 65 42 82 55 69 38 70 77 65 80 50 79 70 88 75 75 73 56]';
y = [82 92 74 57 77 56 51 42 79 92 89 79 49 84 88 84 71 88 83 89]'; % final exam

X = [hw quiz exam1 exam2 exam3]; % design matrix, one explanatory variable per column
beta = X\y;                      % least squares coefficients
z = X*beta;                      % predicted final scores

total = norm(y - mean(y))^2;     % total sum of squares
residual = norm(y - z)^2;        % residual sum of squares
disp('Parameters:')
disp(beta');
fprintf('R^2 is %g\n', 1 - residual/total);

[n, p] = size(X);                % X is a matrix of size n by p
s2 = residual/(n - p);           % estimated noise variance
C = s2*inv(X'*X);                % covariance matrix of the coefficient estimates
ts = beta./sqrt(diag(C));        % t-statistic of each coefficient
disp('t-statistic values');
disp(ts')
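A large \(|t|\) value suggests a coefficient is significantly different from zero. To make this quantitative, one could convert each \(t\)-statistic to a two-sided \(p\)-value. A minimal sketch, assuming the Statistics and Machine Learning Toolbox is available for tcdf, with n, p, and ts as computed above:

pvals = 2*(1 - tcdf(abs(ts), n - p)); % two-sided p-values, n-p degrees of freedom
disp('p-values');
disp(pvals')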
How should these results be interpreted? Can the model be improved by discarding explanatory variables that fail to explain anything? One way to experiment is sketched below.
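A sketch of one such experiment: drop a column of \(X\) (here, hypothetically, the quiz column; the \(t\)-statistics above suggest which variables are the weakest candidates) and refit. Adjusted \(R^2\) penalizes extra parameters, so it can improve when a discarded variable adds nothing. Variable names follow the code above.

Xr = X(:, [1 3 4 5]);          % reduced design matrix without quiz
br = Xr\y;                     % refit by least squares
zr = Xr*br;                    % predictions of the reduced model
res_r = norm(y - zr)^2;        % residual sum of squares of the reduced model
R2r = 1 - res_r/total;         % R^2 of the reduced model
[nr, pr] = size(Xr);
adjR2r = 1 - (res_r/(nr - pr)) / (total/(nr - 1)); % adjusted R^2
fprintf('Reduced model: R^2 = %g, adjusted R^2 = %g\n', R2r, adjR2r);

Since \(R^2\) can only decrease when a column is removed, comparing adjusted \(R^2\) values (or the \(t\)-statistics) is the fairer test of whether the smaller model is preferable.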