Solving Linear Equations

1.1 Solving a non-singular system of $n$ equations in $n$ unknowns
Let's start with a system of equations where the number of equations is the same as the number of unknowns. Such a system can be written as a matrix equation $$A{\bf x} = {\bf b},$$ where $A$ is a square matrix, ${\bf b}$ is a given vector, and ${\bf x}$ is the vector of unknowns we are trying to find. When $A$ is non-singular (invertible) there is a unique solution, given by ${\bf x} = A^{-1}{\bf b}$, where $A^{-1}$ is the inverse matrix of $A$. Of course, computing $A^{-1}$ is not the most efficient way to solve a system of equations.
For our first introduction to MATLAB/Octave, let's consider an example: $$ A = \left[\matrix{1&1&1\cr 1&1&-1\cr 1&-1&1\cr}\right]\quad {\bf b} = \left[\matrix{3\cr 1\cr 1\cr}\right]. $$ First we define the matrix $A$ and the vector ${\bf b}$. Here is the input (after the prompt symbol >) together with the output.


>A=[1 1 1;1 1 -1;1 -1 1]

A =

   1   1   1
   1   1  -1
   1  -1   1

>b=[3;1;1]

b =

   3
   1
   1

Notice that entries on the same row are separated by spaces (or commas) while rows are separated by semicolons. In MATLAB/Octave, column vectors are n by 1 matrices and row vectors are 1 by n matrices. The semicolons in the definition of b make it a column vector. In MATLAB/Octave, X' denotes the transpose of X. Thus we get the same result if we define b as


>b=[3 1 1]'

b =

   3
   1
   1

The solution can be found by computing the inverse of $A$ and multiplying


>x = A^(-1)*b

x =

   1
   1
   1

However if $A$ is a large matrix we don't want to actually calculate the inverse. The syntax for solving a system of equations efficiently is


>x = A\b

x =

   1
   1
   1

If you try this with a singular matrix $A$, MATLAB/Octave will complain and print a warning message. If you see the warning, the answer is not reliable! You can always check to see that ${\bf x}$ really is a solution by computing $A{\bf x}$.


>A*x

ans =

   3
   1
   1

As expected, the result is ${\bf b}$.
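For readers following along in Python, the same computation can be sketched with NumPy (assuming NumPy is installed; `np.linalg.solve` plays the role of A\b):

```python
import numpy as np

# The same system as above: A is non-singular, so the solution is unique.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, -1.0],
              [1.0, -1.0, 1.0]])
b = np.array([3.0, 1.0, 1.0])

x_via_inverse = np.linalg.inv(A) @ b   # analogue of A^(-1)*b
x = np.linalg.solve(A, b)              # analogue of A\b (preferred)

print(x)      # [1. 1. 1.]
print(A @ x)  # recovers b: [3. 1. 1.]
```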

By the way, you can check how much faster A\b is than A^(-1)*b by using the functions tic() and toc(). The function tic() starts the clock, and toc() stops the clock and prints the elapsed time. To try this out, let's make A and b really big with random entries.


A=rand(1000,1000);
b=rand(1000,1);

Notice the semicolon ; at the end of the inputs. This suppresses the output. Without the semicolon, MATLAB/Octave would start writing the 1,000,000 random entries of $A$ to our screen! Now we are ready to time our calculations.

tic();A^(-1)*b;toc();
Elapsed time is 44 seconds.
tic();A\b;toc();
Elapsed time is 13.55 seconds.

So we see that A\b is quite a bit faster.
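The analogous timing experiment can be sketched in Python with NumPy and `time.perf_counter` (absolute times will of course vary by machine):

```python
import time
import numpy as np

rng = np.random.default_rng(0)   # random A and b, as with rand() above
A = rng.random((1000, 1000))
b = rng.random(1000)

t0 = time.perf_counter()
x_inv = np.linalg.inv(A) @ b     # analogue of A^(-1)*b
t_inv = time.perf_counter() - t0

t0 = time.perf_counter()
x_solve = np.linalg.solve(A, b)  # analogue of A\b
t_solve = time.perf_counter() - t0

# Both produce a solution of A x = b; solve() is usually the faster one.
print(t_inv, t_solve)
```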
1.2 Reduced row echelon form
How can we solve $A{\bf x}={\bf b}$ when $A$ is singular, or not a square matrix (that is, the number of equations is different from the number of unknowns)? In your previous linear algebra course you learned how to use elementary row operations to transform the original system of equations to an upper triangular system. The upper triangular system obtained this way has exactly the same solutions as the original system. However, it is much easier to solve. In practice, the row operations are performed on the augmented matrix $[A | {\bf b}]$. If efficiency is not an issue, then additional row operations can be used to bring the system into reduced row echelon form. In this form, the pivot columns have a $1$ in the pivot position and zeros elsewhere. For example, if $A$ is a square non-singular matrix then the reduced row echelon form of $[A | {\bf b}]$ is $[I | {\bf x}]$, where $I$ is the identity matrix and ${\bf x}$ is the solution. In MATLAB/Octave you can compute the reduced row echelon form in one step using the function rref(). For the system we considered above we do this as follows. First define A and b as before. This time I'll suppress the output.

>A=[1 1 1;1 1 -1;1 -1 1];
>b=[3 1 1]';

In MATLAB/Octave, the square brackets [ ... ] can be used to construct larger matrices from smaller building blocks, provided the sizes match correctly. So we can define the augmented matrix $C$ as


>C=[A b]

C =

   1   1   1   3
   1   1  -1   1
   1  -1   1   1
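The same block construction works in NumPy via `np.column_stack` (a sketch; `np.hstack` would also work once b is reshaped to a column):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, -1.0],
              [1.0, -1.0, 1.0]])
b = np.array([3.0, 1.0, 1.0])

C = np.column_stack([A, b])   # analogue of C = [A b]
print(C.shape)                # (3, 4)
```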

Now we compute the reduced row echelon form.


>rref(C)
ans =

   1   0   0   1
   0   1   0   1
   0   0   1   1

The solution appears on the right. Now let's try to solve $A{\bf x}={\bf b}$ with $$ A = \left[\matrix{1&2&3\cr 4&5&6\cr 7&8&9\cr}\right]\quad {\bf b} = \left[\matrix{1\cr 1\cr 1\cr}\right]. $$ This time the matrix $A$ is singular and doesn't have an inverse. Recall that the determinant of a singular matrix is zero, so we can check by computing it.


>A=[1 2 3; 4 5 6; 7 8 9];
>det(A)

ans = 0

However we can still try to solve the equation $A{\bf x} = {\bf b}$ using Gaussian elimination.


>b=[1 1 1]';
>rref([A b])

ans =

   1.00000   0.00000  -1.00000  -1.00000
   0.00000   1.00000   2.00000   1.00000
   0.00000   0.00000   0.00000   0.00000

Letting $x_3 = s$ be a parameter, and proceeding as you learned last year, we arrive at the general solution $$ {\bf x} = \left[\matrix{-1\cr 1\cr 0\cr}\right] + s \left[\matrix{1\cr -2\cr 1\cr}\right] $$ On the other hand, if $$ A = \left[\matrix{1&2&3\cr 4&5&6\cr 7&8&9\cr}\right]\quad {\bf b} = \left[\matrix{1\cr 1\cr 0\cr}\right], $$ then


>rref([1 2 3 1;4 5 6 1;7 8 9 0])

ans =

   1.00000   0.00000  -1.00000   0.00000
   0.00000   1.00000   2.00000   0.00000
   0.00000   0.00000   0.00000   1.00000

tells us that there is no solution.
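NumPy has no built-in rref(), but a short Gauss-Jordan sketch reproduces these results (the function name, the partial pivoting, and the `tol` threshold are our own choices, not MATLAB's):

```python
import numpy as np

def rref(M, tol=1e-10):
    """Reduced row echelon form by Gauss-Jordan elimination (a sketch)."""
    R = M.astype(float).copy()
    rows, cols = R.shape
    pivot_row = 0
    for col in range(cols):
        if pivot_row >= rows:
            break
        # choose the largest entry in this column as the pivot
        k = pivot_row + np.argmax(np.abs(R[pivot_row:, col]))
        if abs(R[k, col]) < tol:
            continue                              # no pivot in this column
        R[[pivot_row, k]] = R[[k, pivot_row]]     # swap rows
        R[pivot_row] /= R[pivot_row, col]         # scale the pivot to 1
        for r in range(rows):                     # clear the rest of the column
            if r != pivot_row:
                R[r] -= R[r, col] * R[pivot_row]
        pivot_row += 1
    return R

# the singular example above: the last row [0 0 0 1] signals "no solution"
C = np.array([[1.0, 2.0, 3.0, 1.0],
              [4.0, 5.0, 6.0, 1.0],
              [7.0, 8.0, 9.0, 0.0]])
print(rref(C))
```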
1.3 Gaussian elimination steps using MATLAB/Octave
If C is a matrix in MATLAB/Octave, then C(1,2) is the entry in the 1st row and 2nd column. The whole first row can be extracted using C(1,:) while C(:,2) yields the second column. Finally, we can pick out the submatrix of C consisting of rows 1-2 and columns 2-4 with the notation C(1:2,2:4). Let's illustrate this by performing a few steps of Gaussian elimination on the augmented matrix from our first example. Start with


C=[1 1 1 3; 1 1 -1 1; 1 -1 1 1];

The first step in Gaussian elimination is to subtract the first row from the second.


>C(2,:)=C(2,:)-C(1,:)

C =

   1   1   1   3
   0   0  -2  -2
   1  -1   1   1

Next, we subtract the first row from the third.


>C(3,:)=C(3,:)-C(1,:)

C =

   1   1   1   3
   0   0  -2  -2
   0  -2   0  -2

To bring the system into upper triangular form, we need to swap the second and third rows. Here is the MATLAB/Octave code.


>temp=C(3,:);C(3,:)=C(2,:);C(2,:)=temp

C =

   1   1   1   3
   0  -2   0  -2
   0   0  -2  -2
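The same elimination steps can be sketched in NumPy, using its slicing syntax (note that NumPy indexes rows from 0, so row 1 of the text is C[0] here):

```python
import numpy as np

C = np.array([[1.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, -1.0, 1.0],
              [1.0, -1.0, 1.0, 1.0]])

C[1] -= C[0]           # subtract the first row from the second
C[2] -= C[0]           # subtract the first row from the third
C[[1, 2]] = C[[2, 1]]  # swap the second and third rows; no temporary needed

print(C)
```

Fancy indexing (`C[[2, 1]]`) makes a copy of the selected rows, which is why the swap works without an explicit temporary variable.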

1.4 Matrix norm and condition number
There are many ways to measure the size of a matrix $A$. For example, we could consider $A$ to be a large vector whose entries happen to be written in a box rather than in a column or a row. From this point of view a natural norm to use is $$ \|A \|_{HS} = \sqrt{\sum_{i}\sum_{j} |a_{i,j}|^2}. $$ This norm is called the Hilbert-Schmidt norm. It has the advantage of being easy to compute.
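In NumPy the Hilbert-Schmidt norm goes by the name Frobenius norm; a quick sketch checking it against the definition (the example matrix is our own):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

hs = np.sqrt((np.abs(A) ** 2).sum())  # the definition above
fro = np.linalg.norm(A, 'fro')        # NumPy's name for the same norm

print(hs, fro)  # both equal sqrt(1 + 4 + 9 + 16) = sqrt(30)
```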
However, when $A$ is considered as a linear transformation or operator, acting on vectors, there is another norm that is natural to use. It is defined by $$ \|A \| = \max_{x:\|x\|\ne 0} {{\|A{\bf x}\|}\over{\|{\bf x}\|}} $$ This norm measures the maximum factor by which $A$ can stretch the length of a vector. An equivalent definition is $$ \|A \| = \max_{x:\|x\|=1} \|A{\bf x}\| $$ The reason why these definitions give the same answer is that the quantity $\|A{\bf x}\|/\|{\bf x}\|$ does not change if we multiply $\bf x$ by a scalar. So, when calculating the maximum in the first expression for $\|A\|$, we need only pick one vector in any given direction, and we might as well choose the unit vector.
In general, there is no easy formula that computes $\|A\|$ from its entries. However, if $A$ is a diagonal matrix there is. As an example let's consider a diagonal matrix $$ A = \left[\matrix{3&0&0\cr 0&2&0\cr 0&0&1\cr}\right] $$ If $$ {\bf x} = \left[\matrix{x_1\cr x_2\cr x_3\cr}\right] $$ then $$ {A\bf x} = \left[\matrix{3x_1\cr 2 x_2\cr x_3\cr}\right] $$ so that $$\eqalign{ \|A{\bf x}\|^2 &= |3x_1|^2 + |2 x_2|^2 + |x_3|^2\cr &= 3^2|x_1|^2 + 2^2 |x_2|^2 + |x_3|^2\cr &\le 3^2|x_1|^2 + 3^2 |x_2|^2 + 3^2|x_3|^2\cr &= 3^2\|{\bf x}\|^2\cr }$$ This implies that for any unit vector ${\bf x}$ $$ \|A{\bf x}\| \le 3 $$ and taking the maximum over all unit vectors $\bf x$ yields $\|A\| \le 3$. On the other hand, the maximum of $\|A{\bf x}\|$ over all unit vectors $\bf x$ is larger than the value of $\|A{\bf x}\|$ for any particular unit vector. In particular, if $$ {\bf e}_1 = \left[\matrix{1\cr 0\cr 0\cr}\right] $$ then $$ \|A\| \ge \|A{\bf e}_1\| = 3 $$ Thus we see that $$\|A\| = 3.$$ In general, the matrix norm of a diagonal matrix with diagonal entries $\lambda_1, \lambda_2, \cdots, \lambda_n$ is the largest value of $|\lambda_k|$.
The MATLAB/Octave code for a diagonal matrix with diagonal entries $3$, $2$ and $1$ is diag([3 2 1]) and the expression for the norm of $A$ is norm(A). So for example


>norm(diag([3 2 1]))

ans =  3
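The NumPy analogue uses `np.diag` and `np.linalg.norm` with `ord=2`, which requests the operator norm:

```python
import numpy as np

A = np.diag([3.0, 2.0, 1.0])
# the operator norm of a diagonal matrix is the largest |diagonal entry|
print(np.linalg.norm(A, 2))
```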



Let's return to the situation where $A$ is a square matrix and we are trying to solve $A{\bf x} = {\bf b}$. If $A$ is a matrix arising from a real world application (for example if $A$ contains values measured in an experiment) then it will almost never happen that $A$ is singular. After all, a tiny change in any of the entries of $A$ can change a singular matrix to a non-singular one. What is much more likely to happen is that $A$ is close to being singular. In this case $A^{-1}$ will still exist, but will have some enormous entries. This means that the solution ${\bf x} = A^{-1}{\bf b}$ will be very sensitive to the tiniest changes in ${\bf b}$, so that it might happen that round-off error in the computer completely destroys the accuracy of the answer.
To check whether a system of linear equations is well-conditioned, we might therefore think of using $\|A^{-1}\|$ as a measure. But this isn't quite right, since we actually don't care if $\|A^{-1}\|$ is large, provided it stretches each vector about the same amount. For example, if we simply multiply each entry of $A$ by $10^{-6}$ the size of $A^{-1}$ will go way up, by a factor of $10^6$, but our ability to solve the system accurately is unchanged. The new solution is simply $10^{6}$ times the old solution, that is, we have simply shifted the position of the decimal point. So to measure how well conditioned a system of equations is, we take the ratio of the largest stretching factor to the smallest shrinking factor. Another way of saying this is to let $\bf x$ range over all unit vectors, and then take the ratio of the largest possible value of $\|A{\bf x}\|$ to the smallest possible value. This leads to the following definition for the condition number of an invertible matrix: $$ {\rm cond}(A) = \|A\|\|A^{-1}\| $$ To see that this is the ratio of the largest stretching factor to the smallest shrinking factor note that $$\eqalign{ \min_{{\bf x}\ne 0} {{\|A{\bf x}\|}\over{\|{\bf x}\|}} &=\min_{{\bf x}\ne 0} {{\|A{\bf x}\|}\over{\|{A^{-1}A\bf x}\|}}\cr &=\min_{{\bf y}\ne 0} {{\|{\bf y}\|}\over{\|{A^{-1}\bf y}\|}}\cr &={{1}\over{\displaystyle{\max_{{\bf y}\ne 0}{{\|{A^{-1}\bf y}\|}\over{\|{\bf y}\|}}}}}\cr &={{1}\over{\|A^{-1}\|}}\cr }$$ Here we used the fact that if ${\bf x}$ ranges over all non-zero vectors so does ${\bf y} = A{\bf x}$, and that the minimum of a collection of positive numbers is one divided by the maximum of their reciprocals.
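NumPy computes the condition number directly with `np.linalg.cond`; a sketch verifying the definition on a small example matrix of our own choosing:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

c = np.linalg.cond(A, 2)  # condition number in the operator (2-)norm
product = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

print(c, product)  # the two agree: cond(A) = ||A|| ||A^{-1}||
```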

In our applications we will use the condition number as a measure of how accurately we can solve the equations that come up. To see why the condition number is a good measure of this, let's start with a linear system of equations $A{\bf x} = {\bf b}$ and change the right side to ${\bf b}' = {\bf b} + \Delta {\bf b}$. The new solution is $$ {\bf x}' = A^{-1}({\bf b} + \Delta {\bf b}) = {\bf x} + \Delta{\bf x} $$ where ${\bf x} = A^{-1}{\bf b}$ is the original solution and the change in the solution is $\Delta{\bf x} = A^{-1}\Delta {\bf b}$. Now the absolute errors $\|\Delta {\bf b}\|$ and $\|\Delta{\bf x}\|$ are not very meaningful, since an absolute error $\|\Delta {\bf b}\| = 100$ is not very large if $\|{\bf b}\| = 1,000,000$ but is large if $\|{\bf b}\| = 1$. What we really care about are the relative errors $\|\Delta {\bf b}\| / \|{\bf b}\|$ and $\|\Delta{\bf x}\| / \|{\bf x}\|$. Can we bound the relative error in the solution in terms of the relative error in the equation? The answer is yes, since $$\eqalign{ {{\|\Delta{\bf x}\| }\over{ \|{\bf x}\|}} &={{\|A^{-1}\Delta {\bf b}\| }\over{ \|A^{-1}{\bf b}\|}}\cr &={{\|A^{-1}\Delta {\bf b}\| }\over{ \|\Delta{\bf b}\|}} {{\|{\bf b}\| }\over{ \|A^{-1}{\bf b}\|}} {{\|\Delta {\bf b}\| }\over{ \|{\bf b}\|}}\cr &={{\|A^{-1}\Delta {\bf b}\| }\over{ \|\Delta{\bf b}\|}} {{\|AA^{-1}{\bf b}\| }\over{ \|A^{-1}{\bf b}\|}} {{\|\Delta {\bf b}\| }\over{ \|{\bf b}\|}}\cr &\le \|A^{-1}\|\|A\|{{\|\Delta {\bf b}\| }\over{ \|{\bf b}\|}}\cr &={\rm cond}(A){{\|\Delta {\bf b}\| }\over{ \|{\bf b}\|}}\cr }$$ This estimate gives the real meaning of the condition number. If the condition number is near $1$ then the relative error of the solution is about the same as the relative error in the equation. However, a large condition number means that a small relative error in the equation can lead to a large relative error in the solution.
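A small NumPy experiment (the matrix and perturbation are chosen for illustration) shows the bound in action: the relative error in ${\bf x}$ is far larger than the relative error in ${\bf b}$, yet stays below ${\rm cond}(A)$ times it.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])  # nearly singular, so badly conditioned
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)      # exact solution is [1, 1]

db = np.array([0.0, 1e-5])     # a tiny change in b
dx = np.linalg.solve(A, b + db) - x

rel_b = np.linalg.norm(db) / np.linalg.norm(b)
rel_x = np.linalg.norm(dx) / np.linalg.norm(x)

print(rel_b)                 # about 3.5e-6
print(rel_x)                 # about 0.1 -- hugely amplified
print(np.linalg.cond(A, 2))  # about 4e4, and rel_x <= cond(A) * rel_b
```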
Summary: Math Concepts
Review:
Matrix vector notation for linear equations.
Solving systems of equations using Gaussian elimination.
Deciding whether there is a unique solution, many solutions or no solution.
Matrix inverse.
New:
Hilbert-Schmidt norm
Matrix (operator) norm
Condition number
Summary: MATLAB/Octave Concepts
Entering matrices (and vectors).
Making larger matrices from smaller blocks.
Multiplying matrices.
Taking the inverse of a matrix.
Taking the transpose of a matrix.
Extracting elements, rows, columns, submatrices.
Solving equations using A\b.
Random matrices.
Timing using tic() and toc().
Computing the norm and condition number.
Homework problems
1. Use MATLAB/Octave to find the solution(s) to $A{\bf x} = {\bf b}$ where $$ A = \left[\matrix{1&1&1&1\cr 1&1&-1&-1\cr 1&-1&0&0\cr 0&0&1&-1\cr}\right]\quad {\bf b} = \left[\matrix{1\cr 1\cr 1\cr 1\cr}\right]. $$

2. Use MATLAB/Octave to find the solution(s) to $A{\bf x} = {\bf b}$ where $$ A = \left[\matrix{1&1&1&1\cr 1&1&-1&-1\cr 1&-1&0&0\cr 0&0&1&1\cr}\right]\quad {\bf b} = \left[\matrix{3\cr 1\cr 0\cr 1\cr}\right]. $$

3. Compute the time it takes to solve $A{\bf x} = {\bf b}$ using A\b on your computer for five or more random problems of increasing size. Make a plot of time vs. size. Repeat using the method A^(-1)*b and plot on the same graph. (If your calculation is taking too long, it can be interrupted by typing Ctrl-C.)

4. Show that for any square matrix $A$, $\|A\|_{HS}^2 = {\rm tr}(A^T A)$.

5. Guess whether each of the following statements about $n\times n$ matrices $A$ is true or false by testing them on a few random matrices. Bonus: prove that your guess is correct.
(i) $\|A^2\| =\|A\|^2$
(ii) $\|A^2\| \le \|A\|^2$
(iii) $\|A^T A\| = \|A\|^2$
(iv) $\|A\| \le \|A\|_{HS}$
(v) ${\rm cond}(A) = {\rm cond}(A^{-1})$
(vi) ${\rm cond}(A) \ge 1$