Section 28.1 Exponential fit

Exponential growth is modeled by a function of the form \(f(x) = ae^{bx}\text{.}\) Suppose we have some data \((x_k, y_k)\text{,}\) \(k=1, \dots, n\text{,}\) and think that it describes exponential growth (which naturally occurs in ecology and epidemiology). A direct attempt to minimize the sum of squared differences:

\begin{equation} \sum_{k=1}^n \left(y_k - ae^{bx_k}\right)^2 \to \min \label{eq-sum-squares-exponential}\tag{28.1.1} \end{equation}

is possible but requires optimization methods which we will consider later. There is an alternative approach: instead of fitting \(ae^{bx}\) to the values \(y_k\text{,}\) fit its logarithm, \(bx + \log a\text{,}\) to the values \(\log y_k\text{.}\) This means we will minimize the sum of squares of differences of (natural) logarithms:

\begin{equation} \sum_{k=1}^n \left(\log(y_k) - bx_k - \log a\right)^2 \to \min\label{eq-sum-squares-exponential-log}\tag{28.1.2} \end{equation}

It is important to understand that the two problems (28.1.1) and (28.1.2) are not equivalent. If an exact fit is possible, either approach will find it. But in general, some deviations (nonzero residuals) are inevitable, and the two approaches penalize deviations in different ways.

For example, suppose the given y-values are [1 2 10] and we have two competing exponential functions: one predicts [0.5 2 10] while the other predicts [1 2 11]. Using the penalty function (28.1.1), the first model has penalty \(0.25\) while the second has penalty \(1\text{,}\) so the first looks better. But if we look at the differences of logarithms, then the first has penalty \((\log 1 - \log 0.5)^2 \approx 0.48\) while the second has penalty \((\log 10 - \log 11)^2 \approx 0.01\text{,}\) so the second looks much better.
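The arithmetic in this comparison can be checked with a short script. This is only an illustrative sketch: the variable names `p1`, `p2`, `direct` and `logpen` are chosen here and are not part of the original text.

```matlab
y  = [1 2 10];      % given y-values
p1 = [0.5 2 10];    % predictions of the first model
p2 = [1 2 11];      % predictions of the second model

% Penalty (28.1.1): squared differences of the values themselves
direct = [sum((y - p1).^2), sum((y - p2).^2)];

% Penalty (28.1.2): squared differences of natural logarithms
logpen = [sum((log(y) - log(p1)).^2), sum((log(y) - log(p2)).^2)];

fprintf('direct penalties: %.2f %.2f\n', direct);   % prints 0.25 1.00
fprintf('log penalties:    %.2f %.2f\n', logpen);   % prints 0.48 0.01
```

The first penalty reacts to absolute errors, so missing 10 by 1 costs more than missing 1 by 0.5. The second reacts to relative errors, so the ranking is reversed.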

Most importantly, the minimization problem (28.1.2) can be easily solved using the method of Chapter 27. For notational convenience let \(\beta_1= \log a\) and \(\beta_2 = b\text{.}\) If the left-hand side of (28.1.2) were zero, that would mean that \(\beta_1, \beta_2\) is a solution of the overdetermined linear system

\begin{equation} \beta_1 + \beta_2 x_k = \log y_k,\quad k=1, \dots, n\label{eq-transformed-lls}\tag{28.1.3} \end{equation}

The least squares solution of (28.1.3) is the vector \(\beta = (\beta_1, \beta_2)\) that minimizes (28.1.2).
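
It may help to see the system (28.1.3) written in matrix form. The display below is a sketch that assumes the method of Chapter 27 is the usual linear least squares setup; the matrix name \(X\) is used here for illustration.

\begin{equation*} X\beta = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \log y_1 \\ \vdots \\ \log y_n \end{pmatrix}, \qquad a = e^{\beta_1}, \quad b = \beta_2 \end{equation*}

Once \(\beta\) is computed, the parameters of the original model \(ae^{bx}\) are recovered from the last two relations, which simply invert the substitution \(\beta_1 = \log a\text{,}\) \(\beta_2 = b\text{.}\)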

Example 28.1.1. Fit an exponential to Covid data.

The following are new Covid infections in France as reported on Fridays in September-October 2020 (from Sep 4 to Oct 23).

y = [6011 7742 9335 12048 10946 14618 20399 29472]';

Fit an exponential function to this data, using week numbers 1:8 as x-values. Extend it two weeks into the future by plotting it up to \(x=10\text{.}\) Include the original data on the plot.

Solution

The vector y is already defined above, but we need yt = log(y) to get transformed y-values.

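x = (1:8)';                          % week numbers as a column vector
yt = log(y);                         % transformed y-values
X = x.^(0:1);                        % columns: ones and x, matching (28.1.3)
beta = X\yt;                         % least squares solution (beta_1, beta_2)
f = @(x) exp((x.^(0:1))*beta);       % fitted exponential a*exp(b*x)

t = linspace(1, 10, 1000)';          % extend two weeks past the data
plot(t, f(t), 'b', x, y, 'r*')       % fitted curve in blue, data as red stars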
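
To interpret the fit, one can convert `beta` back to the parameters of \(ae^{bx}\) using \(\beta_1 = \log a\) and \(\beta_2 = b\text{.}\) The following is a minimal self-contained sketch. The names `a`, `b`, `growth` and `forecast` are introduced here for illustration, and the matrix is built with `ones` instead of `x.^(0:1)`, which should give the same result.

```matlab
% Weekly new-infection counts from the example (Sep 4 to Oct 23, 2020)
y = [6011 7742 9335 12048 10946 14618 20399 29472]';
x = (1:8)';

X = [ones(size(x)), x];    % same matrix as x.^(0:1)
beta = X \ log(y);         % least squares fit to log(y)

a = exp(beta(1));          % because beta(1) = log(a)
b = beta(2);               % growth rate per week
growth = exp(b);           % factor by which the fitted curve grows each week
forecast = a * exp(b * [9 10]);   % fitted values for the two extra weeks

fprintf('a = %.1f, b = %.4f, weekly growth factor = %.3f\n', a, b, growth);
fprintf('fitted values at x = 9 and x = 10: %.0f %.0f\n', forecast);
```

The values in `forecast` are extrapolations of the fitted curve, so they describe the model and not actual reported counts.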