Section 27.2 Overfitting, training and testing
Recall Example 6.5.1, in which we found the best-fitting parabola for given data. The code is extended below to allow an arbitrary degree \(d\) of the polynomial, and to add the computation of \(R^2\) according to (27.1.5). Note the use of column vectors below.

Example 27.2.1. Fitting a polynomial of any degree.
Improve Example 6.5.1 to work with polynomials of any degree \(d\text{,}\) and add a computation of \(R^2\) to it.
Solution
x = (0:14)';                             % data
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
d = 2;                                   % degree of polynomial
X = x.^(0:d);                            % matrix of linear system for parameters
beta = X\y;                              % optimal parameters
f = @(x) (x.^(0:d))*beta;                % best-fitting function f
t = linspace(min(x), max(x), 1000)';
plot(t, f(t), 'b', x, y, 'r*')
total = norm(y - mean(y))^2;             % total sum of squares
residual = norm(y - f(x))^2;             % residual sum of squares
fprintf('R^2 for degree %d is %g\n', d, 1 - residual/total);
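As a side experiment (a sketch, not part of the original example), one can rerun the fit for several degrees and watch \(R^2\). On the data used for fitting, \(R^2\) can only increase as \(d\) grows, which is why a large \(R^2\) alone does not rule out overfitting.

```matlab
x = (0:14)'; y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';  % same data as above
total = norm(y - mean(y))^2;      % total sum of squares (independent of d)
for d = 1:10                      % try increasing polynomial degrees
    X = x.^(0:d);                 % matrix of linear system for parameters
    beta = X\y;                   % optimal parameters
    residual = norm(y - X*beta)^2;
    fprintf('degree %d: R^2 = %g\n', d, 1 - residual/total);
end
```

The printed values approach 1 monotonically; the fit "improves" on paper even after the polynomial starts chasing noise.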
To detect overfitting, one splits the data into a training set and a testing set. For example, take the first half for training and the second half for testing:

x_train = x(1:8); x_test = x(9:end);

(and similarly for y), or interleave the two sets:

x_train = x(1:2:end); x_test = x(2:2:end);
Example 27.2.2. Training-testing split.
Use the training-testing split in Example 27.2.1.
Solution
x = (0:14)';
y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
x_train = x(1:2:end); y_train = y(1:2:end);   % training set
x_test = x(2:2:end);  y_test = y(2:2:end);    % testing set
d = 2;                                        % degree of polynomial
X = x_train.^(0:d);                           % fit using training data only
beta = X\y_train;
f = @(x) (x.^(0:d))*beta;
t = linspace(min(x), max(x), 1000)';
plot(t, f(t), 'b', x, y, 'r*')
total = norm(y_test - mean(y_test))^2;        % evaluate R^2 on testing data
residual = norm(y_test - f(x_test))^2;
fprintf('R^2 for degree %d is %g\n', d, 1 - residual/total);
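To see the overfitting effect numerically, one could repeat Example 27.2.2 for a range of degrees and compare the training and testing values of \(R^2\) (a sketch using the same data and split; the loop and variable names R2_train, R2_test are not from the original examples): the training \(R^2\) keeps rising with \(d\), while the testing \(R^2\) eventually deteriorates.

```matlab
x = (0:14)'; y = [9 8 5 5 3 1 2 5 4 7 8 8 7 7 8]';
x_train = x(1:2:end); y_train = y(1:2:end);   % training set
x_test = x(2:2:end);  y_test = y(2:2:end);    % testing set
for d = 1:6
    X = x_train.^(0:d);
    beta = X\y_train;                          % fit on training data only
    f = @(x) (x.^(0:d))*beta;
    R2_train = 1 - norm(y_train - f(x_train))^2 / norm(y_train - mean(y_train))^2;
    R2_test  = 1 - norm(y_test - f(x_test))^2 / norm(y_test - mean(y_test))^2;
    fprintf('degree %d: training R^2 = %g, testing R^2 = %g\n', d, R2_train, R2_test);
end
```

A gap that widens between the two columns is the signature of overfitting: the model memorizes the training points instead of capturing the trend.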