Section 28.3 Weighted least squares

When we solve an overdetermined linear system \(Az=b\) as z = A\b, the result is the least squares solution: the vector \(z\) that minimizes the sum

\begin{equation} \sum_k ((Az)_k-b_k)^2\label{eq-penalty-lsq}\tag{28.3.1} \end{equation}

(the penalty). For example, a difference of \(0.1\) between \((Az)_k\) and \(b_k\) contributes \(0.01\) to the penalty, no matter what \(k\) is. But in some contexts we may want to treat different data points differently:

  • We are more certain about some of the numbers \(b_k\) than others.
  • Some of the numbers are much larger than others, and we expect the absolute errors for them to be larger.
  • We applied some transformation to the data, like in the previous sections.

In such cases we may want to introduce weights \(w_k\) (some positive numbers) and minimize the sum

\begin{equation} \sum_k ((Az)_k-b_k)^2 w_k^2\label{eq-penalty-lsq-weights}\tag{28.3.2} \end{equation}

instead of (28.3.1). Larger weights mean that we penalize the residuals (the differences between the model and the data) more. To solve this weighted least squares problem in Matlab, we need to multiply the first equation in \(Az=b\) by \(w_1\text{,}\) the second by \(w_2\text{,}\) and so on. To do this, arrange the weights into a column vector w and let

z = (A.*w)\(b.*w);

The array operations .* take care of multiplying each equation by its weight.
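As a minimal sketch (using made-up data, not taken from the text), here is a weighted fit of a line \(y = z_1 + z_2 x\) in which the last data point is considered less reliable and receives a small weight.

x = (1:5)';                       % x-values of the data
b = [1.1; 2.0; 2.9; 4.2; 9.0];    % y-values; the last one is suspect
A = x.^(0:1);                     % design matrix with columns [1, x]
w = [1; 1; 1; 1; 0.1];            % small weight downplays the suspect point
z_plain    = A\b;                 % ordinary least squares
z_weighted = (A.*w)\(b.*w);       % weighted least squares
disp([z_plain z_weighted])        % compare the two coefficient vectors

The small last weight keeps the outlying point from dragging the fitted line upward; with all weights equal to 1 the two solutions coincide.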

Weights can be used to mitigate the distortion caused by a data transformation of the form \(\hat y = T(y)\text{.}\) Indeed, the application of \(T\) magnifies the errors around \(y=y_k\) by a factor of \(|T'(y_k)|\text{.}\) If this is undesirable, we should use the weights \(w_k = 1/|T'(y_k)|\) to compensate.

In Example 28.2.1 we used the logit transformation \(T(y) = \log(y/(1-y)) = \log(y) - \log(1-y)\text{.}\) Its derivative is

\begin{equation*} T'(y) = \frac{1}{y} + \frac{1}{1-y} = \frac{1}{y(1-y)}\text{.} \end{equation*}

This quantity is positive, so the absolute value is not needed; the weights are simply \(w_k = y_k(1-y_k)\text{.}\)
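As a quick sanity check (the sample points below are made up for illustration), we can compare \(1/T'(y)\), estimated by a central difference, with the formula \(y(1-y)\).

T = @(y) log(y./(1-y));             % logit transformation
y = [0.2 0.5 0.8];                  % a few sample values in (0,1)
h = 1e-6;
Tp = (T(y+h) - T(y-h))/(2*h);       % central-difference estimate of T'
disp([1./Tp; y.*(1-y)])             % the two rows should agree closely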

Improve the solution of Example 28.2.1 by using weights to compensate for data transformation. Compare the results to the previous ones.

Solution

We continue the code from Example 28.2.1 by adding the following lines.

w = y.*(1-y);            % weights 
beta = (X.*w)\(yt.*w);   % weighted least squares solution
f = @(x) 1 ./ (1 + exp(-(x.^(0:1))*beta));
figure();
plot(t, f(t), 'b', x, y, 'r*')
title('Weighted Least Squares')

The first two lines compute new parameters \(\beta_1, \beta_2\) for the logistic function \(1/(1+\exp(-\beta_1 - \beta_2 x))\text{,}\) using weights. The rest proceeds as before, and the result is shown in a new figure.

Note that we do not always want to compensate for a data transformation by using the weights \(w_k = 1/|T'(y_k)|\text{.}\) Linear regression works best if the standard deviations of the different values \(y_k\) (which are assumed to include random errors) are about the same. When the values grow exponentially and are measured with the same relative accuracy, the standard deviation will grow with the values. In this case, the logarithmic transformation \(T(y)=\log y\) may help both with the non-linearity of the model and with the uniformity of the standard deviations.
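For instance, an exponential model \(y = c e^{kx}\) could be fitted after the logarithmic transformation without any compensating weights. The sketch below uses hypothetical data chosen for illustration only.

x = (0:5)';
y = [1.05; 2.9; 8.3; 21; 57; 140];   % roughly exponential data
X = x.^(0:1);                        % design matrix [1, x]
beta = X\log(y);                     % fit log(y) = beta(1) + beta(2)*x
c = exp(beta(1));                    % recovered coefficient c
k = beta(2);                         % recovered growth rate k
t = linspace(0, 5, 200)';
plot(t, c*exp(k*t), 'b', x, y, 'r*') % fitted curve and the data points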