The following file will give supplemental notes for topics covered in class. When a topic is covered in the online notes, we may not say much about it. When a topic isn't covered in the online notes, we will write a lot more. --------------------------------------------------------- Jan 4: Motivation: PageRank PageRank is a method of ranking websites on the World Wide Web used by Google to (help) decide which sites to show you first. PageRank takes a network of websites, each website having a certain number of links (or outlinks) to other websites, and tries to establish a ranking of the websites. It does this just on the basis of who has how many links to whom. For example, say that a network has two websites, A and B, and A has: 3 links to A, 4 links to B, and B has: 4 links to A, 5 links to B. How should we rank A versus B? Of course, there are many ways to do this. One wants to choose a method that is computationally feasible (for the WWW, with billions to hundreds of billions of sites, depending on how you measure it) and that is a "fair" measure of WWW prominence (it is not clear what "fair" means, but if it is easy to cheat then it definitely is "not fair"). Google's algorithm is (or was) know as PageRank; they are now secretive about the exact nature of their algorithm, but here is the basic idea: in the above network: A has: 3/7 of its links to A, and 4/7 of its links to B B has: 4/9 of its links to A, and 5/9 of its links to B we introduce variables: xA = the PageRank of A, xB = the PageRank of B and solve for xA + xB = 1 (so that the sum of all PageRanks is 1) xA = (3/7) xA + (4/9) xB xB = (4/7) xA + (5/9) xB (these two equations are the main PageRank eqs) We find xA = 7/16, xB = 9/16. Of course, most ways of ranking A and B based on this network should give xB a higher prominence than xA, since both A and B have more links to B than to A. You may have many questions about PageRank; we will answer them as the course progresses. The answers will touch on almost every topic we cover. For starters, we will need systematic ways of solving linear equations, and some understanding of the underlying geometry. Questions: (1) Give some other ways to rank websites given link data. How do they compare to PageRank (in terms of computation, ability to "cheat," and other measures of "fairness")? (2) Try PageRank for A has: 1 link to A, and no links to B B has: 1 link to A, and no links to B Does the answer make sense? (3) Try PageRank for A has: 1 link to A, and no links to B B has: no links to A, and 1 link to B Does the answer make sense? (4) Try PageRank for A has: 2 links to A, and 1 link to B B has: 3 links to A, and 6 links to B Does the answer make sense? (5) Explain why, based on PageRank, you might want a systematic way of solving linear systems of equations. Do you think there is only one way to find the solution, or many methods? (6) If we do not impose xA + xB = 1 in the PageRank calculations, will there ever be a unique solution? Can you give a geometric interpretation of your answer? --------------------------------------------------------- --------------------------------------------------------- Jan 6: Start Chapter 2 of Froese-Wetton notes. This chapter should take roughly 1-2 weeks. The idea here is to illustrate the geometry and linear algebra first in 2 and 3 dimensions, and introduce some ideas useful in physics such as the cross product. Everything here is quite standard. The notes use both i,j,k and e1,e2,e3 for the standard "basis" vectors in three space. The notes write the vectors in row notation, which is more convenient for writing; often we will write them as columns. We should be able to visualize vector additions and scalar multiplication. MATLAB syntax will not be discussed in class (I'd probably mess it up, anyway...) The monumentally big idea in this section is the dot product. Many fields of mathematics live and breath on the basis of a "dot product" (this is also called an inner product or scalar product). I cannot stress how fundamental an idea this is. This has tons of direct applications; in class we will discuss "correlation of vectors." Here we scratch the surface of this idea, but the cosine interpretation ( a . b = ||a|| ||b|| cos(theta) where theta is the angle between a and b) gives one of the essential insights. The dot product gives us a norm for each vector and an angle or cosine between any two nonzero vectors. Dot products extend to n-dimensional space by: [ x1 x2 ... xn ] . [ y1 y2 ... yn ] = x1 y1 + x2 y2 + ... + xn yn Example: How does height compare with distance to an 8 foot ceiling? Image three people, whose heights are 4, 5, and 6 feet. Heights = [ 4 5 6 ], DistCeiling = [ 4 3 2 ], cos of angle = ( 4 * 4 + 5 * 3 + 6 * 2 )/( sqrt(16+25+36) * sqrt(16+9+4) ) = .909963 which is close to 1. Hence these two vectors have a pretty good correlation. Now let's centre these hectors take out the average: Average Height = 5; CenteredHeights = Heights - [5 5 5] = [ -1 0 1 ], similary CenteredDistCeilings = [4 3 2] - [3 3 3] = [ 1 0 -1 ] the correlation between the centered vectors is -1. Hence they are opposite each other! Which is right???? Answer: of course, both are right. It depends what you want to explain. If people are near 5 foot tall, then (3/5) of their height gives you some idea of their distance to the ceiling; however, if you measure how the height varies from the average, it will be opposite to how the distance to the ceiling varies. Notice that Heights + DistCeiling = [8 8 8] no matter what the heights are (unless someone is more than 8 foot tall and she bumps into the ceiling...). The dot product allows us to compute projections, which is used in tons of applications. Question: To balance an airplane you'd like compartment A to weigh as much as compartment B, and you'd like both compartments to weigh as much as compartment C. Your current weights are [ wA wB wC ] = [ 1200 1400 2000 ] . How can shift stuff around minimally? Answer: It really depends what you mean by "shift stuff around minimally". One possibility is the projection. Your ideal ratio is [ 1 1 2 ]. The projection of [ 1200 1400 2000 ] onto [ 1 1 2 ] is ( [ 1200 1400 2000 ] . [ 1 1 2 ] ) / ( [ 1 1 2 ] [ 1 1 2 ] ) [ 1 1 2 ] = 6600 / 6 [ 1 1 2 ] = 1100 [ 1 1 2 ] = [ 1100 1100 2200 ] . This means that you should discard 100 from A, 300 from B, and add 200 to C. This means that you'll have to leave 200 (pounds) of stuff behind. Is this a good thing????? Another important idea is that of determinants and the cross product in 3-space. This has many uses, including changing variables in multivariate integration. However, both the ideas of determinants and the cross product involve a number of subtleties that are difficult to understand at this point (this involves "exterior algebra," which is beyond the scope of this course). So, unfortunately, much like the field of statistics, one usually uses it applications long before we can understand its foundations. You should understand the way we use determinants and cross products, although they may seem fundamentally mysterious at this point. :( (The dot product shouldn't seem mysterious...) The basic story of the determinant: The Froese-Wetton notes defines the determinant of 2 x 2 and 3 x 3 matrices, but leaves the "linear equation" interpretation of the determinant to later in Chapter 2. I feel it is helpful to start with this. One linear equation in one unknown typically has a unique solution, such as 3 y = 12 has the unique solution y=4. However the equation 0 y = 3 has no solutions, and 0 y = 0 has infinitely many solutions. In other words An equation in x, b x = c , where b and c are constants, has a unique solution in x if and only if b is not 0. Note the latter condition, that b is not 0, does not involve c. Analogous statements involve the determinant: two equations in two unknowns, b_{11} x_1 + b_{12} x_2 = c_1 b_{21} x_1 + b_{22} x_2 = c_2 have a unique solution iff [ b_{11} b_{12} ] det [ ] = b_{11} b_{22} - b_{12}b_{21} [ b_{21} b_{22} ] is not zero. Again, this statement does not involve c_1 or c_2. We can interpret two equations in two unknowns as the intersection of two lines in the plane, or as a question of whether [c_1 c_2] is spanned by [b_{11} b_{21}] and [b_{12} b_{22}]. In class we shall do some simple examples, like x_1+x_2=5, x_1-x_2=3. or x_1+x_2=5, 2 x_1 + 2 x_2 = 10 or x_1+x_2=5, 2 x_1 + 2 x_2 = 13 . Specific points on determinants and cross products: It is common to say that the cross product exists only in three dimensions. This is not exactly true. One can define a "wedge product" of two vectors in n-space, but this gives a vector in [n(n-1)/2]-space. So the wedge product of two vectors in 2-space returns a number, in 3-space returns three numbers, in 4-space returns six numbers, in 5-space returns ten numbers, etc. For example, in 2-space, the wedge product of two vectors, [ a1 a2 ] and [ b1 b2 ] is | a1 a2 | | | = a1 b2 - a2 b1 = ||a|| ||b|| sin( angle from a to b ) | b1 b2 | Does this look familiar from the Froese-Wetton notes?... It is only in three dimensions that the "wedge product" can give a vector of the same dimension, and the "cross product" is a particularly useful version of this, used in physics. The determinant can be said to measure the area/volume of the parallelogram (or parallelepiped) of its rows. The reason for this is difficult to explain without exterior algebra, at least for n x n matrices; however, for 2 x 2 and 3 x 3 matrices this is doable from scratch. The 2 x 2 determinant has a sine interpretation in the row (or column) vectors: det [ a1 a2; b1 b2 ] (using MATLAB notation) = a1 b2 - a2 b1 = ||a|| ||b|| sin(theta), where theta is the angle from a to b. Notice that if we exchange a and b, the determinant's sign switches. The cross product of a and b is determined by the formal determinant | i j k | | a1 a2 a3 | = a x b | b1 b2 b3 | It is very useful in physics, but it only "works" in three dimensions; it is difficult to give a fundamental explanation or justification of this product without exterior algebra. :( Notice that a x b = - b x a. We also have that, like the 2 x 2 determinant, || a x b || = || a || || b || | sine of angle from a to b | , and that a x b is perpendicular to both a and b. In fact, a x b can be characterized as the vector that has length || a || || b || | sine of angle from a to b | and, if this number is not zero, is orthogonal to a and b is satisfies the right-hand-rule. From the above formula it is not hard to see that if c is a vector, and if we exchange c1 for i, c2 for j, c3 for k, we simply get the dot product | c1 c2 c3 | | a1 a2 a3 | = ( a x b ) . c | b1 b2 b3 | In particular, this formula shows that a x b is perpendicular to a and b. All the vector products and determinants generalize from 2 or 3 dimensions to n dimensions, at least in some form. Even the cross product generalizes, but it takes its values in the exterior product of two copies of n-space (this is beyond the scope of this course). The dot product (in all dimensions) doesn't care about the order of the vectors. By constrast, determinants, cross products (and all other operations in exterior algebra) does care about the order, and swapping the order (or swapping two rows or columns in a determinant) will reverse the sign. -------------------------------------------------------------------- Jan 16-20: Finish Chapter 2, Start Chapter 3. In chapter 2 we discussed linear equation but didn't give a systematic way of solving them. Now we do. Section 3.2: We introduce: - general form (page 63 of Froese-Wetton) - 3 elementary operations (multiply, add one to another, exchange) - augmented matrix notation and use Gaussian elimination to find an echelon form of the matrix. The variables associated to squares are called "fixed" or "pivot" or "basic" variables, and you can solve for them in terms of the variables associated to circles (called "free" or "nonbasic") variables. (See Figure 3.2 on page 73.) ----------------------------------- Jan 30: Chapter 3 of Froese-Wetton: Applications We will not cover resistor networks in this section. We will illustrate the ideas of Chapter 3 with other applications. Application 1: Polynomial Curve Fitting, In Particular Parabolas. (This is "Linear Algebra without Linear Algebra.") Lets say you are given three readings of blood sugar levels at times t=1, t=2, and t=4, and you wish to fit the data to a parabola. This means that you are given real numbers g1, g2, g4, and you wish to solve for a,b,c such that p(t) = a + b t + c t^2 satisfies p(1)=g1, p(2)=g2, p(4)=g4. This means that you want to solve for a,b,c such that a + b + c = g1 a + 2 b + 4 c = g2 a + 4 b + 16 c = g4 Does this have a unique solution? What if we replaced the readings at t=1, t=2, t=4 by readings at three other times? What if we tried to do this with five readings and a polynomial of degree four (i.e., p(t) = a + b t + c t^2 + d t^3 + e t^4 )? In class we identified many ways of checking whether the above 3 x 3 linear system has a unique solution (Gaussian elimination, computing a determinant, computing a triple product, seeing if [ 1 1 1 ], [ 1 2 4 ], [1 4 16 ] are linearly independent, etc.). Here is a way that involves no calculation: Consider the homogeneous system a + b + c = 0 a + 2 b + 4 c = 0 a + 4 b + 16 c = 0 This corresponds to a polynomial, q(t) = a + b t + c t^2 such that q(1)=q(2)=q(4)=0. But a number of people argued that any parabola with three roots ( t=1, t=2, t=4 ) must be identically zero ( either from the geometry of parabolas, or from the fact that q(t) must be divisible by (t-1)(t-2)(t-4), or by Rolle's theorem ). Hence the homogeneous system has a unique solution. This means that the structure of the pivots in Gaussian elimination looks like pivot blah blah 0 pivot blah 0 0 pivot Hence the original curve fitting problem has a unique solution. This teaches us a number of important general principles: (1) In an n x n system (i.e., n equations with n unknowns), there is a unique solution if and only if the homogeneous system has a unique solution (i.e., the all zero solution). (2) Sometimes one can reason about the homogeneous system without calculating anything. More questions about the parabola fitting: (1) How can we tell if the parabola is sloping down or up? (As discussed in class, sloping up is bad for glycemic index readings...) (2) What if you use a function p(t) = a + b t e^(-t) + c t^2 e^(-t) ? What about a function p(t) = a + b t e^(- c t) ? [Ans: The first gives you a linear system in a,b,c; the second doesn't...] How could you solve the second one? [Hint: Imagine that you start with a t=0 reading, then take logarithms.] Note: fitting with parabolas or, more generally, polynomials of a fixed degree has a simple, explicit solution know as Lagrange polynomials. For example, (t-1)(t-2) ---------- (4-1)(4-2) is a polynomial that vanishes at t=1 and t=2, and whose value at t=4 is one. We easily see that (t-2)(t-4) (t-1)(t-4) (t-1)(t-2) p(t) = ---------- g1 + ---------- g2 + ---------- g4 (1-2)(1-4) (2-1)(2-4) (4-1)(4-2) is the degree two polynomial that goes through the three data points given above, i.e., such that p(1)=g1, p(2)=g2, p(4)=g4. Application 2: PageRank again. What more do we know about PageRank? Consider the first problem we looked at: A has: 3/7 of its links to A, and 4/7 of its links to B B has: 4/9 of its links to A, and 5/9 of its links to B which gives us equation: xA + xB = 1 (so that the sum of all PageRanks is 1) xA = (3/7) xA + (4/9) xB ("link voting for xA") xB = (4/7) xA + (5/9) xB ("link voting for xB") We have 3 eqations with 2 unknows. Why might we expect a unique solution? (Note: PageRank does not always have a unique solution, and many issues around PageRank will remain a mystery for now...) First note that the last two equations are homogeneous, and their sum reads xA + xB = xA + xB . In general, in a PageRank system on n websites, the "link voting" PageRank equations, in standard form, sum to zero. For example, here the equations read (4/7) xA - (4/9) xB = 0 -(4/7) xA + (4/9) xB = 0 Hence the n "link voting" PageRank equations are really just n-1 equations. Hence there will be at least a one parameter family of solutions to this system. The idea is that the "sum of PageRanks is 1" equation, here xA + xB = 1, will usually give a unique solution. Let us see where things can go bad: A has: all of its links to A B has: all of its links to B xA + xB = 1 (so that the sum of all PageRanks is 1) xA = (1) xA + (0) xB ("link voting for xA") xB = (0) xA + (1) xB ("link voting for xB") Does it make sense that this is not unique?