
CLP-2 Integral Calculus

Section D.2 Simple ODE Solvers — Error Behaviour

We now provide an introduction to the error behaviour of Euler's Method, the improved Euler's method and the Runge-Kutta algorithm for generating approximate solutions to the initial value problem
\begin{align*} y'(t)&=f\big(t,y(t)\big) \\ y(t_0)&=y_0 \end{align*}
Here \(f(t,y)\) is a given known function, \(t_0\) is a given initial time and \(y_0\) is a given initial value for \(y\text{.}\) The unknown in the problem is the function \(y(t)\text{.}\)
Two obvious considerations in deciding whether or not a given algorithm is of any practical value are
  1. the amount of computational effort required to execute the algorithm and
  2. the accuracy that this computational effort yields.
For algorithms like our simple ODE solvers, the bulk of the computational effort usually goes into evaluating the function [1] \(f(t,y)\text{.}\) Euler's method uses one evaluation of \(f(t,y)\) for each step; the improved Euler's method uses two evaluations of \(f\) per step; the Runge-Kutta algorithm uses four evaluations of \(f\) per step. So Runge-Kutta costs four times as much work per step as does Euler. But this fact is extremely deceptive because, as we shall see, you typically get the same accuracy with a few steps of Runge-Kutta as you do with hundreds of steps of Euler.
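To make the operation count concrete, here is a minimal sketch, in Python, of a single step of each method, following the usual update rules (as in Section D.1). The function names are ours, chosen only for this illustration; each call of \(f\) counts as one evaluation.

    def euler_step(f, t, y, h):
        # Euler's method: 1 evaluation of f per step
        return y + h * f(t, y)

    def improved_euler_step(f, t, y, h):
        # improved Euler's method: 2 evaluations of f per step
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        return y + h * (k1 + k2) / 2

    def rk4_step(f, t, y, h):
        # Runge-Kutta: 4 evaluations of f per step
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6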
To get a first impression of the error behaviour of these methods, we apply them to a problem that we know the answer to. The solution to the first order constant coefficient linear initial value problem
\begin{align*} y'(t)&=y-2t \\ y(0)&=3 \end{align*}
is
\begin{equation*} y(t)=2+2t+e^t \end{equation*}
In particular, the exact value of \(y(1)\text{,}\) to ten decimal places, is \(4+e=6.7182818285\text{.}\) The following table lists the error in the approximate value for this number generated by our three methods applied with three different step sizes. It also lists the number of evaluations of \(f\) required to compute the approximation.
steps | Euler error | #evals | Improved Euler error | #evals | Runge-Kutta error | #evals
\(5\) | \(2.3\times 10^{-1}\) | \(5\) | \(1.6\times 10^{-2}\) | \(10\) | \(3.1\times 10^{-5}\) | \(20\)
\(50\) | \(2.7\times 10^{-2}\) | \(50\) | \(1.8\times 10^{-4}\) | \(100\) | \(3.6\times 10^{-9}\) | \(200\)
\(500\) | \(2.7\times 10^{-3}\) | \(500\) | \(1.8\times 10^{-6}\) | \(1000\) | \(3.6\times 10^{-13}\) | \(2000\)
Observe
  • Using 20 evaluations of \(f\) worth of Runge-Kutta gives an error 90 times smaller than 500 evaluations of \(f\) worth of Euler.
  • With Euler's method, decreasing the step size by a factor of ten appears to reduce the error by about a factor of ten.
  • With improved Euler, decreasing the step size by a factor of ten appears to reduce the error by about a factor of one hundred.
  • With Runge-Kutta, decreasing the step size by a factor of ten appears to reduce the error by a factor of about \(10^4\text{.}\)
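The table above can be reproduced with a few lines of code. Here is a minimal sketch, in Python, assuming the step routines from the sketch above; the computed errors should match the table, apart possibly from the last digit or two of roundoff.

    from math import exp

    def f(t, y):
        # right hand side of the example equation y' = y - 2t
        return y - 2 * t

    exact = 4 + exp(1)      # the exact value y(1) = 2 + 2 + e

    def solve(step, n):
        # take n steps of size h = 1/n from t = 0, y = 3 to t = 1
        t, y, h = 0.0, 3.0, 1.0 / n
        for _ in range(n):
            y = step(f, t, y, h)
            t += h
        return y

    for n in (5, 50, 500):
        print(n,
              abs(solve(euler_step, n) - exact),
              abs(solve(improved_euler_step, n) - exact),
              abs(solve(rk4_step, n) - exact))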
Use \(A_E(h)\text{,}\) \(A_{IE}(h)\) and \(A_{RK}(h)\) to denote the approximate value of \(y(1)\) given by Euler, improved Euler and Runge-Kutta, respectively, with step size \(h\text{.}\) It looks like
\begin{align*} A_E(h)&\approx y(1)+K_E\,h \\ A_{IE}(h)&\approx y(1)+K_{IE}\,h^2 \\ A_{RK}(h)&\approx y(1)+K_{RK}\,h^4 \end{align*}
for some constants \(K_E\text{,}\) \(K_{IE}\) and \(K_{RK}\text{.}\)
To test these conjectures further, we apply our three methods with about ten different step sizes of the form \(\frac{1}{n}=\frac{1}{2^m}\) with \(m\) an integer. Below are three graphs, one for each method. Each contains a plot of \(Y=\log_2 e_n\text{,}\) the base 2 logarithm of the error with step size \(\frac{1}{n}\text{,}\) against \(x=\log_2 n\text{,}\) the base 2 logarithm of \(n\text{.}\) Base 2 logarithms are used because \(\log_2 n=\log_2 2^m=m\) — nice and simple.
Here is why it is a good idea to plot \(Y=\log_2 e_n\) against \(x=\log_2 n\text{.}\) If, for some algorithm, there are (unknown) constants \(K\) and \(k\) such that
\begin{equation*} \text{approx value of }y(1)\text{ with step size } h= y(1)+K h^k \end{equation*}
then the error with step size \(\frac{1}{n}\) is \(e_n=K\frac{1}{n^k}\) and obeys
\begin{equation*} \log_2 e_n =\log_2 K -k \log_2 n \tag{E1} \end{equation*}
The graph of \(Y=\log_2 e_n\) against \(x=\log_2 n\) is the straight line \(Y=-kx+\log_2K\) of slope \(-k\) and \(y\) intercept \(\log_2K\text{.}\)
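Here is a minimal sketch, in Python, of how the slope \(-k\) can be extracted from such data by least squares (numpy.polyfit does the straight-line fit); it assumes the routines from the earlier sketches. For very small \(h\) the Runge-Kutta errors approach roundoff level, so the sketch stops at \(n=2^9\text{.}\)

    import numpy as np

    ms = list(range(2, 10))                    # step sizes h = 1/2**m
    for step in (euler_step, improved_euler_step, rk4_step):
        errs = [abs(solve(step, 2 ** m) - exact) for m in ms]
        slope, intercept = np.polyfit(ms, np.log2(errs), 1)
        print(step.__name__, slope)            # expect roughly -1, -2 and -4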

Remark D.2.2.

This procedure can still be used even if we do not know the exact value of \(y(1)\text{.}\) Suppose, more generally, that we have some algorithm that generates approximate values for some (unknown) exact value \(A\text{.}\) Call \(A_h\) the approximate value with step size \(h\text{.}\) Suppose that
\begin{equation*} A_h=A+Kh^k \end{equation*}
with \(K\) and \(k\) constant (but also unknown). Then plotting
\begin{equation*} y=\log(A_h-A_{h/2})=\log\left(Kh^k-K\left(\tfrac{h}{2}\right)^k\right) =\log\left(K-\tfrac{K}{2^k}\right)+k\log h \end{equation*}
against \(x=\log h\) gives the straight line \(y=mx+b\) with slope \(m=k\) and \(y\) intercept \(b=\log\left(K-\tfrac{K}{2^k}\right)\text{.}\) So we can
  • read off \(k\) from the slope of the line and then
  • compute \(K=e^b\left(1-\frac{1}{2^k}\right)^{-1}\) from the \(y\) intercept \(b\) and then
  • compute [2] \(A=A_h-Kh^k\text{,}\) as in the sketch below.
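Here is a minimal sketch, in Python, of that procedure, applied to Euler's method for our example problem (any algorithm obeying \(A_h=A+Kh^k\) would do). It assumes the routines from the earlier sketches, and the variable names are ours. Since \(K\) may be negative, the sketch keeps track of the sign of \(A_h-A_{h/2}\) before taking logarithms.

    import numpy as np

    hs = [1.0 / 2 ** m for m in range(2, 10)]
    A_of = {h: solve(euler_step, round(1 / h)) for h in hs}    # A_h for each h

    x = [np.log(h) for h in hs[:-1]]
    y = [np.log(abs(A_of[h] - A_of[h / 2])) for h in hs[:-1]]
    k, b = np.polyfit(x, y, 1)              # slope is (roughly) k

    sign = np.sign(A_of[hs[0]] - A_of[hs[0] / 2])
    K = sign * np.exp(b) / (1 - 2.0 ** (-k))
    h = hs[-1]
    A = A_of[h] - K * h ** k                # improved estimate of the exact value
    print(k, K, A)                          # A should be close to 4 + e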
Here are the three graphs — one each for the Euler method, the improved Euler method and the Runge-Kutta method.
Each graph contains about a dozen data points, \((x, Y)=(\log_2n, \log_2e_n)\text{,}\) and also contains a straight line, chosen by linear regression, to best fit the data. The method of linear regression for finding the straight line which best fits a given set of data points is covered in Example 2.9.11 of the CLP-3 text. The three straight lines have slopes \(-0.998\) for Euler, \(-1.997\) for improved Euler and \(-3.997\) for Runge-Kutta. Reviewing (E1), it sure looks like \(k=1\) for Euler, \(k=2\) for improved Euler and \(k=4\) for Runge-Kutta (at least if \(k\) is an integer).
So far we have only looked at the error in the approximate value of \(y(t_f)\) as a function of the step size \(h\) with \(t_f\) held fixed. The graph below illustrates how the error behaves as a function of \(t\text{,}\) with \(h\) held fixed. That is, we hold the step size fixed and look at the error as a function of the distance, \(t\text{,}\) from the initial point.
From the graph, it appears that the error grows exponentially with \(t\text{.}\) But it is not so easy to visually distinguish exponential curves from other upward curving curves. On the other hand, it is pretty easy to visually distinguish straight lines from other curves, and taking a logarithm converts the exponential curve \(y=e^{kx}\) into the straight line \(Y=\log y = k\,x\text{.}\) Here is a graph of \(\log e(t)\text{,}\) the logarithm of the error \(e(t)\) at time \(t\text{,}\) against \(t\text{.}\) We have added a straight line as an aid to your eye.
It looks like the \(\log\) of the error grows very quickly initially, but then settles into a straight line. Hence it really does look like, at least in this example, except at the very beginning, the error \(e(t)\) grows exponentially with \(t\text{.}\)
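Here is a minimal sketch, in Python, of how such data can be generated, using Euler's method with a fixed step size on our example problem; it assumes the routines from the earlier sketches and leaves the actual plotting to the reader.

    from math import exp, log

    h, t, y = 0.01, 0.0, 3.0
    ts, log_errs = [], []
    for _ in range(1000):                   # march from t = 0 out to t = 10
        y = euler_step(f, t, y, h)
        t += h
        e_t = abs(2 + 2 * t + exp(t) - y)   # e(t): error against the exact solution
        ts.append(t)
        log_errs.append(log(e_t))
    # plotting log_errs against ts should settle down to a nearly straight line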
The above numerical experiments have given a little intuition about the error behaviour of the Euler, improved Euler and Runge-Kutta methods. It's time to try and understand what is going on more rigorously.

Subsection D.2.1 Local Truncation Error for Euler's Method

We now try to develop some understanding as to why we got the above experimental results. We start with the error generated by a single step of Euler's method.

Definition D.2.3. Local truncation error.

The (signed) error generated by a single step of Euler's method, under the assumptions that we start the step with the exact solution and that there is no roundoff error, is called the local truncation error for Euler's method. That is, if \(\phi(t)\) obeys \(\phi'(t) = f\big(t,\phi(t)\big)\) and \(\phi(t_n)=y_n\text{,}\) and if \(y_{n+1}=y_n+ hf(t_n,y_n)\text{,}\) then the local truncation error for Euler's method is
\begin{equation*} \phi\big(t_{n+1}\big) -y_{n+1} \end{equation*}
That is, it is the difference between the exact value, \(\phi\big(t_{n+1}\big)\text{,}\) and the approximate value generated by a single Euler method step, \(y_{n+1}\text{,}\) ignoring any numerical issues caused by storing numbers in a computer.
Denote by \(\phi(t)\) the exact solution to the initial value problem
\begin{equation*} y'(t)=f(t,y)\qquad y(t_n)=y_n \end{equation*}
That is, \(\phi(t)\) obeys
\begin{equation*} \phi'(t) = f\big(t,\phi(t)\big)\qquad \phi(t_n)=y_n \end{equation*}
for all \(t\text{.}\) Now execute one more step of Euler's method with step size \(h\text{:}\)
\begin{equation*} y_{n+1}=y_n+hf\big(t_n,y_n\big) \end{equation*}
Because we are assuming that \(y_n=\phi(t_n)\)
\begin{equation*} y_{n+1}= \phi(t_n)+ hf\big(t_n,\phi(t_n)\big) \end{equation*}
Because \(\phi(t)\) is the exact solution, \(\ \phi'(t_n) = f\big(t_n,\phi(t_n)\big)= f(t_n,y_n)\ \) and
\begin{equation*} y_{n+1}= \phi(t_n)+ h\phi'(t_n) \end{equation*}
The local truncation error in \(y_{n+1}\) is, by definition, \(\phi(t_{n+1})-y_{n+1}\text{.}\)
Taylor expanding (see (3.4.10) in the CLP-1 text) \(\phi(t_{n+1})=\phi(t_n+h)\) about \(t_n\)
\begin{equation*} \phi(t_n+h)=\phi(t_n)+\phi'(t_n)h+\tfrac{1}{2} \phi''(t_n)h^2 +\tfrac{1}{3!}\phi'''(t_n)h^3+\cdots \end{equation*}
so that
\begin{align*} &\phi(t_{n+1})-y_{n+1}\\ &= \big[\phi(t_n)+\phi'(t_n)h+\tfrac{1}{2} \phi''(t_n)h^2 +\tfrac{1}{3!}\phi'''(t_n)h^3+\cdots\big] -\big[\phi(t_n)+ h\phi'(t_n)\big]\\ &= \tfrac{1}{2} \phi''(t_n)h^2+\tfrac{1}{3!}\phi'''(t_n)h^3+\cdots \tag{E2} \end{align*}
Notice that the constant and \(h^1\) terms have cancelled out. So the first term that appears is proportional to \(h^2\text{.}\) Since \(h\) is typically a very small number, the \(h^3\text{,}\) \(h^4\text{,}\) \(\cdots\) terms will usually be much smaller than the \(h^2\) term.
We conclude that the local truncation error for Euler's method is \(h^2\) times some unknown constant (we usually don't know the value of \(\frac{1}{2} \phi''(t_n)\) because we don't usually know the solution \(\phi(t)\) of the differential equation) plus smaller terms that are proportional to \(h^r\) with \(r\ge 3\text{.}\) This conclusion is typically written
\begin{equation*} \text{(local truncation error for Euler's method)} = Kh^2+O(h^3) \end{equation*}
The symbol \(O(h^3)\) is used to designate any function that, for small \(h\text{,}\) is bounded by a constant times \(h^3\text{.}\) So, if \(h\) is very small, \(O(h^3)\) will be a lot smaller than \(h^2\text{.}\)
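For our example problem this prediction is easy to test numerically: take a single Euler step starting from the exact solution and divide the error by \(h^2\text{.}\) Since \(\phi''(0)=e^0=1\text{,}\) the ratio should approach \(\frac{1}{2}\text{.}\) Here is a minimal sketch, in Python, assuming the function f from the earlier sketches.

    from math import exp

    for h in (0.1, 0.01, 0.001):
        y1 = 3.0 + h * f(0.0, 3.0)          # one Euler step from the exact start (0, 3)
        phi_h = 2 + 2 * h + exp(h)          # exact solution at t = h
        print(h, (phi_h - y1) / h ** 2)     # tends to phi''(0)/2 = 1/2 as h shrinks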
To get from an initial time \(t=t_0\) to a final time \(t=t_f\) using steps of size \(h\) requires \((t_f-t_0)/h\) steps. If each step were to introduce an error [3] \(Kh^2+O(h^3)\text{,}\) then the final error in the approximate value of \(y(t_f)\) would be
\begin{equation*} \frac{t_f-t_0}{h}\ \Big[Kh^2+O(h^3)\Big] = K(t_f-t_0)\, h+O(h^2) \end{equation*}
This very rough estimate is consistent with the experimental data for the dependence of error on step size with \(t_f\) held fixed, shown on the first graph after Remark D.2.2. But it is not consistent with the experimental time dependence data above, which shows the error growing exponentially, rather than linearly, in \(t_f-t_0\text{.}\)
We can get some rough understanding of this exponential growth as follows. The general solution to \(y'=y-2t\) is \(y=2+2t+c_0e^t\text{.}\) The arbitrary constant, \(c_0\text{,}\) is to be determined by initial conditions. When \(y(0)=3\text{,}\) \(c_0=1\text{.}\) At the end of step 1, we have computed an approximation \(y_1\) to \(y(h)\text{.}\) This \(y_1\) is not exactly \(y(h)=2+2h+e^h\text{.}\) Instead, it is a number that differs from \(2+2h+e^h\) by \(O(h^2)\text{.}\) We choose to write the number \(y_1=2+2h+e^h+O(h^2)\) as \(2+2h+(1+\epsilon)e^h\) with \(\epsilon=e^{-h} O(h^2)\) of order of magnitude \(h^2\text{.}\) That is, we choose to write
\begin{equation*} y_1=2+2t+c_0e^t\Big|_{t=h}\qquad\text{with }c_0=1+\epsilon \end{equation*}
If we were to make no further errors we would end up with the solution to
\begin{equation*} y'=y-2t,\qquad y(h)= 2+2h+(1+\epsilon)e^h \end{equation*}
which is [4]
\begin{align*} y(t) &= 2+2t+(1+\epsilon)e^t =2+2t+ e^t +\epsilon e^t \\ &=\phi(t) +\epsilon e^t \end{align*}
So, once an error has been introduced, the natural time evolution of the solutions to this differential equation causes the error to grow exponentially. Other differential equations with other time evolution characteristics will exhibit different \(t_f\) dependence of errors [5]. In the next section, we show that, for many differential equations, errors grow at worst exponentially with \(t_f\text{.}\)

Subsection D.2.2 Global Truncation Error for Euler's Method

Suppose once again that we are applying Euler's method with step size \(h\) to the initial value problem
\begin{align*} y'(t)&=f(t,y) \\ y(0)&=y_0 \end{align*}
Denote by \(\phi(t)\) the exact solution to the initial value problem and by \(y_n\) the approximation to \(\phi(t_n),\ t_n=t_0+nh\text{,}\) given by \(n\) steps of Euler's method (applied without roundoff error).

Definition D.2.5. Global truncation error.

The (signed) error in \(y_n\) is \(\ \phi(t_n)-y_n \ \) and is called the global truncation error at time \(t_n\text{.}\)
The word “truncation” is supposed to signify that this error is due solely to Euler's method and does not include any effects of roundoff error that might be introduced by our not writing down an infinite number of decimal digits for each number that we compute along the way. We now derive a bound on the global truncation error.
Define
\begin{equation*} \varepsilon_n=\phi(t_n)-y_n \end{equation*}
The first half of the derivation is to find a bound on \(\varepsilon_{n+1}\) in terms of \(\varepsilon_n\text{.}\)
\begin{align*} \varepsilon_{n+1}&=\phi(t_{n+1})-y_{n+1} \\ &=\phi(t_{n+1})-y_n-hf(t_n,y_n) \\ &=[{\color{red}{\phi(t_n)}}-y_n]+h[{\color{blue}{f\big(t_n,\phi(t_n)\big)}}-f(t_n,y_n)]\\ &\hskip1.5in +[\phi(t_{n+1})-{\color{red}{\phi(t_n)}}-h{\color{blue}{f\big(t_n,\phi(t_n)\big)}}] \tag{E3} \end{align*}
where we have massaged the expression into three manageable pieces.
  • The first \([\cdots]\) is exactly \(\varepsilon_n\text{.}\)
  • The third \([\cdots]\) is exactly the local truncation error. Assuming that \(|\phi''(t)|\le A\) for all \(t\) of interest [6], we can bound the third \([\cdots]\) by
    \begin{equation*} \big|\phi(t_{n+1})-\phi(t_n)-hf\big(t_n,\phi(t_n)\big)\big|\le \half Ah^2 \end{equation*}
    This bound follows quickly from the Taylor expansion with remainder ((3.4.32) in the CLP-1 text),
    \begin{align*} \phi(t_{n+1})&=\phi(t_n)+\phi'(t_n)h+\half \phi''(\tilde t)h^2 \\ &=\phi(t_n)+h\, f\big(t_n,\phi(t_n)\big)+\half \phi''(\tilde t)h^2 \end{align*}
    for some \(t_n\lt \tilde t\lt t_{n+1}\text{.}\)
  • Finally, by the mean value theorem, the magnitude of the second \([\cdots]\) is \(h\) times
    \begin{align*} \big|f\big(t_n,\phi(t_n)\big)-f(t_n,y_n)\big| &= \big|F_{t_n}\big(\phi(t_n)\big)- F_{t_n}(y_n)\big| \quad \text{where}\quad F_{t_n}(y) = f\big(t_n,y\big)\\ &=\big|F'_{t_n}(\tilde y)\big| \,|\phi(t_n)-y_n|\\ &\hskip1.35in\text{for some }\tilde y\text{ between }y_n\text{ and }\phi(t_n) \\ &=\big|F'_{t_n}(\tilde y)\big|\,|\varepsilon_n|\\ &\le B|\varepsilon_n| \end{align*}
    assuming that \(\big|F'_t(y)\big| \le B\) for all \(t\) and \(y\) of interest [7].
Substituting into (E3) gives
\begin{equation*} |\varepsilon_{n+1}| \le |\varepsilon_n| + Bh|\varepsilon_n| +\half Ah^2 = (1+Bh)|\varepsilon_n| +\half Ah^2 \tag{E4-n} \end{equation*}
Hence the (bound on the) magnitude of the total error, \(|\varepsilon_{n+1}|\text{,}\) consists of two parts. One part is the magnitude of the local truncation error, which is no more than \(\half Ah^2\) and which is present even if we start the step with no error at all, i.e. with \(\varepsilon_n=0\text{.}\) The other part is due to the combined error from all previous steps. This is the \(\varepsilon_n\) term. At the beginning of step number \(n+1\text{,}\) the combined error has magnitude \(|\varepsilon_n|\text{.}\) During the step, this error gets magnified by no more than a factor of \(1+Bh\text{.}\)
The second half of the derivation is to repeatedly apply (E4-n) with \(n=0,1,2,\cdots\text{.}\) By definition \(\phi(t_0)=y_0\) so that \(\varepsilon_0=0\text{,}\) so
\begin{alignat*}{2} &(\text{E4-0})\implies|\varepsilon_1|&&\le (1+Bh)|\varepsilon_0| +\tfrac{A}{2}h^2 =\tfrac{A}{2} h^2 \\ &(\text{E4-1})\implies|\varepsilon_2|&&\le (1+Bh)|\varepsilon_1| +\tfrac{A}{2} h^2 =(1+Bh)\tfrac{A}{2}h^2+\tfrac{A}{2}h^2\\ &(\text{E4-2})\implies|\varepsilon_3|&&\le (1+Bh)|\varepsilon_2| +\tfrac{A}{2}h^2 =(1+Bh)^2\tfrac{A}{2}h^2+(1+Bh)\tfrac{A}{2}h^2+\tfrac{A}{2}h^2 \end{alignat*}
Continuing in this way
\begin{equation*} |\varepsilon_n|\le (1+Bh)^{n-1}\tfrac{A}{2}h^2+\cdots+(1+Bh)\tfrac{A}{2}h^2+\tfrac{A}{2}h^2 =\sum_{m=0}^{n-1} (1+Bh)^m \tfrac{A}{2}h^2 \end{equation*}
This is the beginning of a geometric series, and summing it using \(\ \sum\limits_{m=0}^{n-1} ar^m=\frac{r^n-1}{r-1}a\ \) (which is Theorem 1.1.6(a)) with \(\ a=\tfrac{A}{2}h^2\ \) and \(\ r=1+Bh\ \) gives
\begin{equation*} |\varepsilon_n|\le \frac{(1+Bh)^n-1}{(1+Bh)-1}\frac{A}{2}h^2 =\frac{A}{2B}\big[(1+Bh)^n-1\big]h \end{equation*}
We are interested in how this behaves as \(t_n-t_0\) increases and/or \(h\) decreases. Now \(n=\frac{t_n-t_0}{h}\) so that \((1+Bh)^n=(1+Bh)^{(t_n-t_0)/h}\text{.}\) When \(h\) is small, the behaviour of \((1+Bh)^{(t_n-t_0)/h}\) is not so obvious. So we'll use a little trickery to make it easier to understand. Setting \(x=Bh\) in
\begin{equation*} x\ge 0\implies 1+x\le 1+x+\frac{1}{2}x^2+\frac{1}{3!}x^3+\cdots = e^x \end{equation*}
(the exponential series \(e^x= 1+x+\frac{1}{2}x^2+\frac{1}{3!}x^3+\cdots\) was derived in Example 3.6.5) gives [8] \(1+Bh\le e^{Bh}\text{.}\) Hence \((1+Bh)^n\le e^{Bhn}=e^{B(t_n-t_0)}\text{,}\) since \(t_n=t_0+nh\text{,}\) and we arrive at the conclusion
\begin{equation*} |\varepsilon_n|\le \frac{A}{2B}\Big[e^{B(t_n-t_0)}-1\Big]h \end{equation*}
This is of the form \(K(t_f)h^k\) with \(k=1\) and the coefficient \(K(t_f)\) growing exponentially with \(t_f-t_0\text{.}\) If we keep \(h\) fixed and increase \(t_n\) we see exponential growth, but if we fix \(t_n\) and decrease \(h\) we see the error decrease linearly. This is just what our experimental data suggested.
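For our example problem we can (roughly) take \(B=1\text{,}\) since \(\frac{\partial f}{\partial y}=1\text{,}\) and \(A=e\text{,}\) since \(\phi''(t)=e^t\le e\) on \(0\le t\le 1\text{.}\) Here is a minimal sketch, in Python, comparing the bound with the measured Euler errors at \(t=1\text{;}\) it assumes the routines from the earlier sketches.

    from math import exp

    A, B = exp(1), 1.0
    for n in (5, 50, 500):
        h = 1.0 / n
        actual = abs(solve(euler_step, n) - (4 + exp(1)))
        bound = A / (2 * B) * ((1 + B * h) ** n - 1) * h
        print(n, actual, bound)             # the measured error stays below the bound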
Footnotes

  1. Typically, evaluating a complicated function will take a great many arithmetic operations, while the actual ODE solver method (as per, for example, (D.1.3)) takes only an additional handful of operations. So the great bulk of computational time goes into evaluating \(f\) and we want to do it as few times as possible.
  2. This is the type of strategy used by the Richardson extrapolation of Section C.1.
  3. For simplicity, we are assuming that \(K\) takes the same value in every step. If, instead, there is a different \(K\) in each of the \(n=(t_f-t_0)/h\) steps, the final error would be \(K_1h^2\!+\!K_2h^2\!+\!\cdots\!+\!K_nh^2\!+\!n O(h^3)= \bar K nh^2\!+\!n O(h^3)= \bar K(t_f-t_0)\, h\!+\!O(h^2)\text{,}\) where \(\bar K\) is the average of \(K_1\text{,}\) \(K_2\text{,}\) \(\cdots\text{,}\) \(K_n\text{.}\)
  4. Note that this \(y(t)\) obeys both the differential equation \(y'=y-2t\) and the initial condition \(y(h)= 2+2h+(1+\epsilon)e^h\text{.}\)
  5. For example, if the solution is polynomial, then we might expect (by a similar argument) that the error also grows polynomially in \(t_f\text{.}\)
  6. We are assuming that the derivative \(\phi'(t)\) doesn't change too rapidly. This will be the case if \(f(t,y)\) is a reasonably smooth function.
  7. Again, this will be the case if \(f(t,y)\) is a reasonably smooth function.
  8. When \(x=Bh\) is large, it is not wise to bound the linear \(1+x\) by the much larger exponential \(e^x\text{.}\) However when \(x\) is small, \(1+x\) and \(e^x\) are almost the same.