Modern Theory of Markov Chains

This is the page for the course in 2013.
For information about this year's course see here.

Time and location

  • Tuesdays 11:00--13:00, from February 5 till June 18
  • Room 610 of the math building (aka Hans Freudenthalgebouw)


How many times do we need to shuffle a deck of $n$ cards in order to make them evenly mixed? Will a population of bacteria survive or eventually go extinct? Will a particle jumping randomly from one place to another eventually return to its original whereabouts? How quickly does a Monte Carlo simulation converge (if at all)? Motivated by modern applications and guided by examples, we develop the basic theory of Markov chains and familiarize ourselves with various mathematical techniques invented to analyze them.

This course is a part of the master's program Stochastics and Financial Mathematics.


Working knowledge of basic concepts of probability theory (at a level covered by a first course in probability) is required. Mathematical maturity and computer skills are desirable.


Other references: Some interesting reads:
  • Persi Diaconis, The Markov Chain Monte Carlo Revolution, Bulletin of the American Mathematical Society 46:179--205, 2009.
  • Persi Diaconis, The mathematics of mixing things up, Journal of Statistical Physics 144(3):445--459, 2011.
  • Olle Häggström, Problem solving is often a matter of cooking up an appropriate Markov chain, Scandinavian Journal of Statistics 34:768--780, 2007.
  • Mark Jerrum and Alistair Sinclair, The Markov chain Monte Carlo method: an approach to approximate counting and integration, Chapter 12 in Approximation Algorithms for NP-hard Problems, 1996.

Course webpage



Siamak Taati


  • Homework assignments during the course: 50%
  • Final project and/or presentation: 50%
If you are a student from Utrecht attending this course, please remember to register on OSIRIS by April 26.
The course code is WISS123.

Weekly outline

[June 18, 2013]
Project presentations
  • Nicos: Markov chains as dynamical systems
  • Jenny: Symmetric group and card shuffling
[June 11, 2013]
No class --- conference break
[June 7, 2013] Time: 14:30, Place: Buys Ballot Gebouw 069
Project presentations
  • Serieke: Path coupling technique
  • Lukáš: Mixing times and set hitting times
[June 4, 2013]
No new material.
Review of assignments and/or what the course was about.
Bring your questions, comments, food for discussion.
[May 28, 2013]
We continued the discussion of electric networks and their correspondence with random walks. The correspondence provides a general approach to study the recurrence/transience of random walks. We gave a proof of Pólya's theorem in the two-dimensional case (i.e., the simple random walk on $\mathbb{Z}^2$ is recurrent) and gave the idea of the proof in the three-dimensional case (i.e., the simple random walk on $\mathbb{Z}^3$ is transient).

Starting with a general setting of electric networks, we showed how the question of recurrence/transience is reduced to finding appropriate lower/upper bounds for the effective resistance between the origin (i.e., a designated vertex) and the boundary of a box of size $n$ around the origin. The random walk is recurrent if and only if this effective resistance diverges to infinity as $n\to\infty$.

Using Rayleigh's monotonicity principle, we were able to find a simple lower bound for the effective resistance in the two-dimensional lattice and show that it diverges to infinity. This settled the two-dimensional part of Pólya's theorm.

To prove the monotonicity principle, we used Thomson's variational principle, which characterizes the current of a network with a given current boundary condition as the divergence-free flow that minimizes the dissipated energy. The variational principle also suggests a method to find upper bounds for the effective resistance.

At the end, we gave a plausibility argument for the transience of the three-dimensional random walk. This was the idea behind Lyons's proof of the theorem, which you can find in chapter 21 of the textbook, or in chapter 1 of Grimmett's book. Other proofs can be found in the book of Doyle and Snell and in the paper by Levin and Peres.

Nerd Sniping
[May 21, 2013]
We discussed the analogy between random walks and electric networks. Our goal (to be achieved next week) is to use this analogy to prove Pólya's theorem on random walks:
The simple random walk on $\mathbb{Z}^d$ is recurrent for $d=1,2$ and transient for $d\geq 3$.
The material is covered in chapters 9 and 21 of the textbook, but we mostly follow chapter 1 of Grimmett's book.

After observing the analogy between the drunkard's walk and an electric circuit consisting of $n$ unit resistors connected in series to a unit battery, we formulated and proved a general correspondence between random walks on weighted graphs and electric networks. Namely, we introduced a game on a weighted graph $(V,E)$ as follows. A subset $A\subseteq V$ of vertices are designated as final states, and and a number $g_0(a)$ is associated to each final state. The player performs a random walk on the graph until it arrives at one of the final states, when the game stops. If arrived at vertex $a\in A$, the player gains/loses $g_0(a)$ points. We wish to know the expected gain $g(x)$ if the player starts the random walk at vertex $x$.

The corresponding electric network is obtained by placing a resistor (with appropriate resistance) in place of each edge, and connecting the final vertices to batteries in such a way that the voltage of $a\in A$ is $g_0(a)$. We claimed that the expected gain $g(x)$ of the random walk starting at vertex $x$ can be read from the electric network by measuring the voltage $W(x)$ of vertex $x$. This correspondence was proved by showing that both expected gain $g(\cdot)$ and the voltage $W(\cdot)$ are harmonic at every non-boundary vertex $x\in V\setminus A$, and that there is only one function that is harmonic at every vertex $x\in V\setminus A$ and satisfies a given boundary condition on $A$.

We then observed how the problem of recurrence/transience of a random walk on a graph (such as $\mathbb{Z}^d$) can be translated using this correspondence to a problem about electric networks.
[May 14, 2013]
[May 7, 2013]
We studied the mixing speed of the top-to-random card shuffling method.

The Markov chain that models the top-to-random shuffling has a unique stationary distribution (why?) which is the uniform distribution (why?). Starting with any arrangement of cards, the shuffling eventually mixes the cards uniformly (why?).

The distance from stationary distribution has a cut-off at time $t_\text{mix}\approx n\log n$, where $n$ is the number of cards in the deck. Namely, the (worst case) distance from stationarity remains close to its maximal value (i.e., $1$) until around time $n\log n$, when in a relatively short period (of length $O(n)$) it drops to almost $0$. We proved this by establishing matching upper and lower bounds for the distance from stationary distribution in the vicinity of $n\log n$.

To prove the upper bound, we used the fact that the Markov chain has a strong stationary time that has the same distribution as the time of acquiring all the coupons in the coupon collecting problem.
[April 30, 2013]
Queen's day
[April 23, 2013]
We went through two more examples in which coupling arguments could be used to estimate mixing times.

For the lazy version of the Ehrenfest model ($\equiv$ lazy random walk on the hypercube), we showed that the mixing time satisfies $t_\text{mix}(\mathrm{e}^{-r})\leq n\log n + rn$ (for all $r>0$). This, we proved by constructing a simultaneous construction (coupling) of the chain with all initial conditions. This reduced the problem to the so-called coupon collecting problem, which we took the chance to study in some detail.

We then studied the Gibbs sampler for the hard-core model on a finite graph. The hard-core Gibbs sampler has rapid convergence only for small values of the fugacity $\lambda$. Our aim was to show that for $\lambda$ sufficiently small, the mixing time is bounded as $t_\text{mix}(\mathrm{e}^{-r})\leq \alpha_\lambda (n\log n + r n)$, where $\alpha_\lambda$ is a constant depending on $\lambda$.

The proof (which we didn't find time to finish) is an example of the path coupling method. The basic idea is to couple two copies of the chain whose starting configurations disagree on exactly one position (a single particle). For any two configurations $x$ and $y$ we can then find a sequence $x=z_0,z_1,\ldots,z_n=y$ of configurations each disagreeing with the next one in exactly one position.

Check out this (rather inefficient) Mathematica simulation (requires Version 9) of the Gibbs sampler for the hard-core model on a toroidal square lattice $\mathbb{Z}_n\times\mathbb{Z}_n$. Compare the evolution for small fugacities (say, $\lambda=0.25$) and large fugacities (say, $\lambda=25$).
[April 16, 2013]
As an exercise in coupling arguments, we gave a proof of Proposition 5.6 from the textbook, which gives a bound $O(t^{-1/2})$ for the distance between the distributions at times $t$ and $t+1$ for the lazy version of any irreducible Markov chain. The coupling argument reduced the problem to the problem of estimating the tail of the distribution of the first hitting time of site $0$ for a simple random walk on $\mathbb{Z}$ starting at site $1$. We used the hitting time theorem and Stirling's approximation to obtain the claimed bound.

We then used this as an excuse to prove two curious theorems about random walks on $\mathbb{Z}$: the hitting time theorem and the ballot theorem. For the hitting time theorem, we followed the elegant proof given in the paper To prove the ballot theorem, we used a well-known combinatorial trick --- the reflection principle.
[April 9, 2013]
We continued with the study of the speed of mixing in finite-state irreducible and aperiodic Markov chains.

As another example of using couplings to estimate the mixing speed, we considered a simple random walk on a path of length $n$ with loops at each end (Example 5.1 from the textbook). We found that the coupling time can be estimated by the first hitting time of position $n$ if the walk starts at position $0$. The expected value of this hitting time is $n(n+1)$, and the Markov inequality gives the bound $t_\text{mix}(\varepsilon)\leq n(n+1)/\varepsilon$ for the mixing time with accuracy $\varepsilon$. Being a bit more careful, we then noticed that the Markov bound for the coupling time was independent of the initial positions of the two coupled walks, and the unfortunate pedestrian principle lead us to the better bound $t_\text{mix}(2^{-r})\leq 2rn(n+1)$.

We then studied the evolution of distance from stationarity $d(t)$ (and the related function $\bar{d}(t)$) for arbitrary finite-state irreducible and aperiodic Markov chains and verified that it decays exponentially. As a corollary, we found that in general, $t_\text{mix}(2^{-r})\leq r\, t_\text{mix}(1/4)$. Interpretation: the mixing time $t_\text{mix}(1/4)$ (or $t_\text{mix}(\varepsilon_0)$ for any fixed constant $\varepsilon_0<1/2$) can be seen as the time scale of the exponential decay of distance from stationarity.
[April 2, 2013]
We talked about the speed of convergence to equilibrium in (finite-state) irreducible, aperiodic chains.

The convergence theorem says that for a finite-state irreducible aperiodic Markov chain $(X_t)_{t\geq 0}$, the distribution of $X_t$ approaches (in total variation distance) to the stationary distribution. In most applications (say, in Monte Carlo algorithms, or for physical models), we want to know how long we should wait to be sure that the distribution of $X_t$ is within distance $\varepsilon>0$ from the stationary distribution. The smallest such time is called the mixing time (for accuracy $\varepsilon$).

One approach to estimate the mixing time is to devise thought experiments in which two copies of the chain are run together (a coupling). One such thought experiment we had already used to prove the convergence theorem (independent before meeting, moving together once met). More sophisticated couplings (thought experiments) tailor-made for the particular model at hand could provide good estimates for the mixing time.

For a lazy random walk on a cycle $\mathbb{Z}_n$, we devised a simple coupling that gave the upper bound $\frac{n^2}{4\varepsilon}$ for the mixing time. Similarly, for a lazy random walk on a $d$-dimensional torus $\mathbb{Z}_n^d$, we obtained the mixing time $\frac{n^2 d^2}{4\varepsilon}$. The way these bounds depend on $\varepsilon$ (inversely proportional) does not seem appealing, because we know that the distance from stationarity decreases exponentially. We shall improve this next time.
[March 26, 2013]
Three important examples where the question of recurrence vs. transience shows up:
  • (Random walk) Will the walk eventually hit a certain vertex?
  • (Branching process) Does a population of bacterial eventually go extinct?
  • (Convergence theorem) Will the two coupled chains eventually merge?
We proved four simple lemmas that clarified the notions of recurrence and transience.
  • state $a$ recurrent $\ \Rightarrow\ $ $\mathbb{P}_a(\text{infinitely many returns to $a$})=1$.
  • state $a$ transient $\ \Rightarrow\ $ $\mathbb{E}_a(\text{number of visits to $a$})<\infty$.
    (In fact, the "number of visits to $a$" has a geometric distribution.)
  • $a$ recurrent $\ \&\ $ $a\longrightarrow b$ $\ \Rightarrow\ $ $\mathbb{P}_b(T_a<\infty)=1$.
    (In particular, $b\longrightarrow a$.)
  • Recurrence is a class property for the communication relation.
    (i.e.,$\ $ $a$ recurrent $\ \&\ $ $a\longleftrightarrow b$ $\ \Rightarrow\ $ $b$ recurrent.)
We verified that for a simple random walk on $\mathbb{Z}$, every state is recurrent, although we knew that the expected return time for every state is infinite.
For the basic coupling used in the convergence theorem, the diagonal set is a recurrent communication class that is accessible from all the other states.
Assignment 6 --- due: April 16, 2013
[March 19, 2013]
We gave a (partial) proof of the convergence theorem.

We mentioned several possible interpretations of the convergence of a sequence of distributions to another distribution. Two examples:
  • The probability of each state converges to its target value.
  • The total variation distance from the target distribution goes to zero.
For distributions on a countable set, all the interpretations are equivalent.

For two simple examples (random walk on an odd cycle, and the lazy Ehrenfest model), we saw that simple arguments using coupling two copies of the chain can be used to show convergence to the unique stationary distribution. We then formulated a general coupling argument that proves convergence if we know that the two coupled chains eventually merge.
Assignment 5 (slightly updated) --- due: April 9, 2013
[March 12, 2013]
We talked about Monte Carlo simulations.

We introduced the Ising model of ferromagnetism as a guiding example. The macroscopic state of the system is described by a probability distribution (the Boltzmann distribution) on the configuration space. In order to study the model via simulations, one needs to be able to generate random samples from this distribution.

We mentioned few other examples (from statistical mechanics and computer science) where one needs to generate random samples from a prescribed distribution.

One approach is to design an ergodic Markov chain whose stationary distribution is the desired distribution, and run the chain for a long period. A Gibbs sampler chain does the trick. Another construction (which we did not find time to discuss about) is the Metropolis chain.

A Gibbs sampler for the Ising model [Mathematica notebook (requires Version 9)]
[March 5, 2013]
We mentioned without proof the convergence theorem (a.k.a. mixing theorem):
Any Markov chain (on a countable state space) that is irreducible and aperiodic and has a stationary distribution approaches that stationary distribution.
We discussed the problem of identifying the stationary distribution(s) of a Markov chain. In many cases, the symmetries of the chain allow us to find a stationary distribution without pain. We talked about two types of such symmetries:
  • [rotations] Random walk on a (finite) group: the uniform distribution is stationary
    (e.g., random walk on the cyclic group $\mathbb{Z}_n$; Ehrenfest urn)
  • [time-reversal] Any distribution satisfying the detailed balance (reversibility) condition is stationary. (e.g., random walk on an undirected graph; Ehrenfest urn)
We found a reason why more or less any card shuffling scheme (satisfying two rather weak natural conditions) mixes the order of the cards properly if repeated for sufficiently long.
Finally, we asked:
Do we get a Markov chain if we reverse the direction of time in a bi-infinite Markov chain?
We found that the time-reversal of any bi-infinite Markov chain in equilibrium is itself a Markov chain.
Assignment 4 (slightly updated) --- due: March 19, 2013
[February 26, 2013]
Playing with Markov chains in Mathematica [Notebook (requires Version 9) or CDF or PDF]
We discussed the problem of uniqueness of stationary distribution.
  • Irreducibility guarantees the uniqueness.
  • The unique stationary distribution is linked to the expected first return times
Notes on the uniqueness of stationary distribution. (See also the notes on existence below.)
[February 19, 2013]
We discussed the problem of existence of stationary distribution.
We first made the setting explicit:
  • What is a Markov chain mathematically. The key features:
    • Markov property
    • time-homogeneity
    • finite or countably infinite number of possible states at a given time
  • Transition matrix
  • Suitable sample space: space of all trajectories/infinite paths/sequences of states
In this setting,
  • We could formulate what is (statistical) equilibrium: an equilibrium state is identified by a stationary distribution.
  • Lemma: every (time-homogeneous) Markov chain on a finite state space has at least one stationary distribution.
Notes on the existence of stationary distribution.
Assignment 3 --- due: March 5, 2013 (preferably)
[February 12, 2013]
We considered two examples:
  • Ehrenfest urn model (two formulations: based on whether we keep track of individual balls or merely the number of balls in each urn), and
  • simple random walk on a cycle of length $n$,
and asked the following questions
  • What does it mean to be in (statistical) equilibrium?
  • Which distribution could be an equilibrium distribution (a.k.a. stationary distribution)?
  • Is the stationary distribution unique?
  • Does the model approach the equilibrium as time passes by?
For the Ehrenfest urn model (the formulation with distinguishable balls), we found an equilibrium distribution and showed that it is the only one. For the random walk on a cycle, we discussed the coupling approach to study approach towards equilibrium. Whether there is such convergence to equilibrium distribution depends on whether the cycle is even or odd.
Assignment 2 --- due: February 19, 2013 (extended till: preferably February 26, 2013)
Errata: an extra condition is missing in Problem 1(a). For now, you could assume that $N$ is independent of $Z_i$ and drop part (b). Later we shall prove a version of Wald's equation that covers part (b).
[February 5, 2013]
We discussed few examples:
  • A drunkard's walk on a street with $n$ blocks (a.k.a gambler's ruin: example 2.1 in the book): Which place is he going to end up (home or the pub) and what is the expected time of arrival?
  • Simple random walk on $\mathbb{Z}$: Will it eventually pass through every single site? What about the two-dimensional or three-dimensional variants?
  • Population of bacteria (Galton-Watson model): Is there a chance that the population survives forever, or will they sooner or later go extinct?
  • Shuffling a deck of cards, the riffle style: Does repeated shuffling mix the permutation of the cards evenly? If so, how many shuffles are needed to get the deck sufficiently mixed?
Assignment 1 --- due: February 19, 2013
Last Update: June 2, 2013