Markov chain

by Marco Taboga, PhD

Markov chains are sequences of random variables (or vectors) that possess the so-called Markov property: given one term in the chain (the present), the subsequent terms (the future) are conditionally independent of the previous terms (the past).

This lecture is a roadmap to Markov chains. Unlike most of the lectures in this textbook, it is not an exhaustive treatment of the subject with detailed proofs, derivations, examples and exercises (for which a separate textbook would be needed). Its aim is to provide a quick introduction to some concepts that are often used in applied statistics (e.g., in Markov Chain Monte Carlo methods). Readers who are interested in a more detailed treatment can have a look at the references reported at the end of this page.

Definition

Let us start with a formal definition.

Definition Let $\{X_n\}$ be a sequence of random vectors. The sequence $\{X_n\}$ is said to be a Markov chain if and only if any given term of the sequence $X_n$ is independent of all the terms preceding $X_{n-1}$ conditional on $X_{n-1}$: $$F(x_n \mid x_{n-1}, x_{n-2}, \ldots, x_1) = F(x_n \mid x_{n-1})$$ where the letter $F$ denotes a conditional distribution function.

State space

The state space $S$ of a Markov chain $\{X_n\}$ is the set of all possible realizations of the terms of the chain. In other words, for any given term $X_n$, the support of $X_n$ is included in $S$.

In what follows we present the main facts about Markov chains by tackling, in order of increasing difficulty, the cases of:

- Markov chains with a finite state space;
- Markov chains with a countable state space;
- Markov chains with an uncountable state space.

Markov chains with a finite state space

Let us start with the case of a finite state space. In particular, we assume that $$S = \{ s_1, \ldots, s_J \},$$ that is, the terms of the chain can take one of the $J$ values $s_1, \ldots, s_J$.

Time-homogeneity

We specify an initial distribution for the first value of the chain, that is, a $1\times J$ vector of initial probabilities $\pi_1$, and impose $$\pi_{1,j} = \mathrm{P}(X_1 = s_j) \quad\text{for } j = 1, \ldots, J.$$

Then, we can choose a $J\times J$ transition probability matrix $P$ (a matrix such that each of its rows is a vector of probabilities) and impose $$\mathrm{P}(X_{n+1} = s_j \mid X_n = s_i) = P_{ij}$$ for all $n$ and $i,j\leq J$. The assumption that $P$ is the same for all $n$ is called time-homogeneity (think of the index $n$ as a measure of time).

These two choices (of the initial distribution $\pi_1$ and the transition matrix $P$) completely determine the distribution of all the terms of the chain. As a matter of fact, for any $n$, we have $$\mathrm{P}(X_n = s_j) = \pi_{n,j}$$ where $$\pi_n = \pi_{n-1} P = \pi_1 P^{n-1}.$$

The latter equality holds because, by the law of total probability, $$\pi_{n,j} = \mathrm{P}(X_n = s_j) = \sum_{i=1}^{J} \mathrm{P}(X_n = s_j \mid X_{n-1} = s_i)\,\mathrm{P}(X_{n-1} = s_i) = \sum_{i=1}^{J} \pi_{n-1,i} P_{ij}.$$
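
As an illustration, the recursion above can be carried out numerically. Here is a minimal Python sketch (not from the lecture; the chain, the matrix $P$ and the helper name `marginal` are arbitrary examples introduced for illustration):

```python
import numpy as np

# An arbitrary 3-state example: initial distribution pi_1 and
# transition matrix P (each row sums to 1).
pi_1 = np.array([1.0, 0.0, 0.0])            # the chain starts in s_1
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def marginal(pi_1, P, n):
    """Distribution of X_n, computed as pi_n = pi_1 @ P^(n-1)."""
    return pi_1 @ np.linalg.matrix_power(P, n - 1)

print(marginal(pi_1, P, 1))                 # [1. 0. 0.]
print(marginal(pi_1, P, 5))                 # distribution of X_5
```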

Stationary distribution

If, for a given transition probability matrix $P$, there is an initial distribution $\pi$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\pi$ is called a stationary distribution (or invariant distribution) of the chain.

When $\pi$ is a stationary distribution, we have that $$\pi_n = \pi \quad\text{for all } n.$$

Together with the fact that $$\pi_{n+1} = \pi_n P,$$ this implies $$\pi = \pi P.$$
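
Since a stationary distribution solves $\pi = \pi P$, it can be computed as a left eigenvector of $P$ associated with the eigenvalue 1. A minimal Python sketch (the matrix $P$ is an arbitrary example, not from the lecture):

```python
import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# pi = pi P means pi (as a column) is an eigenvector of P transposed
# with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))        # locate the unit eigenvalue
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()                          # normalize to probabilities

print(pi)
print(np.allclose(pi @ P, pi))              # True: pi solves pi = pi P
```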

Detailed balance

A Markov chain with a finite state space is said to satisfy the detailed balance condition if and only if there exists a distribution $\pi_b$ such that $$\pi_{b,i} P_{ij} = \pi_{b,j} P_{ji}$$ for any $i,j\leq J$.

By summing both sides of the equation over $j$, we get $$\sum_{j=1}^{J} \pi_{b,i} P_{ij} = \sum_{j=1}^{J} \pi_{b,j} P_{ji}$$ or $$\pi_{b,i} \sum_{j=1}^{J} P_{ij} = \sum_{j=1}^{J} \pi_{b,j} P_{ji}.$$

But $$\sum_{j=1}^{J} P_{ij} = 1$$ because the rows of $P$ are vectors of probabilities.

Therefore, $$\pi_{b,i} = \sum_{j=1}^{J} \pi_{b,j} P_{ji}$$ for any $i\leq J$. But this can be written in matrix form as $$\pi_b = \pi_b P.$$

As a consequence, $\pi_b$ is a stationary distribution of the chain.

Note that detailed balance is a more stringent requirement than the existence of a stationary distribution: the former implies the latter, but the converse is not true.
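
The following Python sketch (the matrices and the helper `satisfies_detailed_balance` are arbitrary examples introduced here) checks the condition $\pi_{b,i} P_{ij} = \pi_{b,j} P_{ji}$ entry by entry, and also illustrates the last remark: a deterministic cycle has a stationary distribution but does not satisfy detailed balance.

```python
import numpy as np

def satisfies_detailed_balance(pi, P, tol=1e-12):
    """Check pi_i * P_ij == pi_j * P_ji for all pairs (i, j)."""
    F = pi[:, None] * P                     # F_ij = pi_i * P_ij
    return np.allclose(F, F.T, atol=tol)

# A symmetric random walk on three states: detailed balance holds
# with the uniform distribution.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])
pi = np.array([1/3, 1/3, 1/3])
print(satisfies_detailed_balance(pi, P))    # True

# A deterministic 3-cycle: the uniform distribution is stationary,
# yet detailed balance fails (stationarity does not imply it).
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(np.allclose(pi @ C, pi))              # True: uniform is stationary
print(satisfies_detailed_balance(pi, C))    # False
```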

Irreducible chain

We now introduce some properties of Markov chains that are frequently used to study their asymptotic behavior. The first such property is the so-called irreducibility.

Let $x,y\in S$. Define the hitting time $$T_{xy} = \min\left\{ n\geq 2 : X_n = y \right\}$$ where the subscript $x$ indicates that the chain is assumed to start from $X_1 = x$.

We say that $x$ leads to $y$ if and only if $$\mathrm{P}_x\left(T_{xy} < \infty\right) > 0.$$

Again, the notation $\mathrm{P}_x$ means that the probability is computed under the assumption that $X_1 = x$.

A Markov chain is said to be irreducible if and only if every state leads to itself and to every other state, that is, if and only if, for any starting state $x$, there is a positive probability that the chain will reach any state $y$ (including $x$ itself) in finite time.

Recurrent chain

A state $y\in S$ is called recurrent if and only if $$\mathrm{P}_y\left(T_{yy} < \infty\right) = 1.$$

Otherwise, it is called transient.

In other words, a state $y$ is recurrent if and only if the probability that the chain will return to $y$ in finite time (after having started from $y$ itself) is equal to 1.

A Markov chain is called recurrent if and only if all the elements of its state space are recurrent.

Aperiodic chain

Let $y\in S$. The period of $y$ is defined as $$d_y = \gcd\left\{ n\in\mathbb{N} : \mathrm{P}_y\left(X_{n+1} = y\right) > 0 \right\}$$ where $\gcd$ denotes the greatest common divisor. In other words, $d_y$ is the greatest common divisor of the lengths of all the possible return paths of the chain to $y$ (after starting from $y$ itself).

It is possible to prove that if the chain is irreducible, then all its states have the same period $d$, which is called the period of the chain.

A chain is called aperiodic if and only if the period of the chain is $d=1$.
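
The period of a state can be computed directly from its definition as a greatest common divisor. A Python sketch (the helper `period` and the truncation at `max_len` steps are assumptions introduced here for illustration):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(P, y, max_len=50):
    """gcd of the lengths n <= max_len of the possible returns of the
    chain to state y (an approximation if returns are very sparse)."""
    returns = []
    Q = np.eye(len(P))
    for n in range(1, max_len + 1):
        Q = Q @ P                           # Q now equals P^n
        if Q[y, y] > 0:
            returns.append(n)
    return reduce(gcd, returns) if returns else 0

# Deterministic 3-cycle: every state has period 3.
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(period(C, 0))                         # 3
```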

Existence and uniqueness of the stationary distribution

We have the following important result.

Proposition If a Markov chain with a finite state space is irreducible, then it has a unique stationary distribution.

Convergence to the stationary distribution

If we assume not only irreducibility, but also aperiodicity, we get the following result.

Proposition If a Markov chain with a finite state space is irreducible and aperiodic, then, irrespective of the initial distribution $\pi_1$, $$\lim_{n\rightarrow\infty} \pi_n = \pi$$ where $\pi$ is the unique stationary distribution of the chain.

So, even if the initial distribution of the chain is not the stationary distribution, the terms $X_n$ of the sequence become less and less dependent on the initial value $X_1$ as $n$ increases, and their distributions converge to the stationary distribution.
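
This can be verified numerically: starting the same chain from two very different initial distributions, the marginal distributions $\pi_n$ approach the same limit. A minimal Python sketch (the matrix is an arbitrary irreducible and aperiodic example):

```python
import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

pi_a = np.array([1.0, 0.0, 0.0])            # two very different
pi_b = np.array([0.0, 0.0, 1.0])            # initial distributions

for n in [1, 5, 20, 100]:
    Pn = np.linalg.matrix_power(P, n - 1)
    print(n, pi_a @ Pn, pi_b @ Pn)          # both rows converge to pi
```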

Strong law of large numbers

Irreducibility can also be used to prove a Strong Law of Large Numbers.

Proposition If a Markov chain $\{X_n\}$ with a finite state space $S$ is irreducible, then, for any bounded function $f:S\rightarrow\mathbb{R}$, $$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \overset{a.s.}{\longrightarrow} \mathrm{E}_{\pi}[f(X)] = \sum_{j=1}^{J} f(s_j)\,\pi_j$$ where $\pi$ is the unique stationary distribution of the chain and $\overset{a.s.}{\longrightarrow}$ denotes almost sure convergence as $n$ tends to infinity.

Thus, when the state space is finite, irreducibility guarantees that sample averages taken across the chain converge to population averages taken across the states.

This kind of proposition is often called an ergodic theorem.
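
The ergodic theorem can also be checked by simulation: the sample average of $f$ along a single long path approaches the expectation of $f$ under $\pi$. A Python sketch (the chain and the function $f$ are arbitrary examples introduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
f = np.array([1.0, 2.0, 3.0])               # a bounded function on S

# Simulate one long path of the chain and average f along it.
n = 200_000
x, total = 0, 0.0
for _ in range(n):
    total += f[x]
    x = rng.choice(3, p=P[x])

# Compare with the expectation of f under the stationary distribution.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
print(total / n, f @ pi)                    # the two values are close
```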

Markov chains with a countable state space

We now tackle the case in which the state space $S$ is infinite but countable.

In particular, we assume that $$S = \{ s_1, s_2, s_3, \ldots \},$$ that is, the elements of the state space can be arranged into a sequence $\{ s_i \}_{i\in\mathbb{N}}$.

Time-homogeneity

The initial distribution of the chain is a sequence of initial probabilities $\{\pi_{1,i}\}_{i\in\mathbb{N}}$ such that $$\pi_{1,i} = \mathrm{P}(X_1 = s_i) \quad\text{and}\quad \sum_{i=1}^{\infty} \pi_{1,i} = 1.$$

Then, we can choose a double sequence of transition probabilities $P_{ij}$ such that, for any index $i$, $$\sum_{j=1}^{\infty} P_{ij} = 1,$$ and we impose $$\mathrm{P}(X_{n+1} = s_j \mid X_n = s_i) = P_{ij}$$ for all $n$, $i$ and $j$.

Note that the transition probabilities $P_{ij}$ are independent of $n$. This property is called time-homogeneity.

These two choices (of the initial distribution $\{\pi_{1,i}\}$ and the transition probabilities $P_{ij}$) completely determine the distribution of all the terms of the chain.

The distribution of $X_2$ is a sequence $\{\pi_{2,j}\}$ whose elements can be derived as follows: $$\pi_{2,j} = \mathrm{P}(X_2 = s_j) = \sum_{i=1}^{\infty} \mathrm{P}(X_2 = s_j \mid X_1 = s_i)\,\mathrm{P}(X_1 = s_i) = \sum_{i=1}^{\infty} \pi_{1,i} P_{ij}.$$

The distributions of the other terms of the chain $X_n$ are sequences $\{\pi_{n,j}\}$ that can be derived recursively as follows: $$\pi_{n,j} = \sum_{i=1}^{\infty} \pi_{n-1,i} P_{ij}.$$

Stationary distribution

The concept of stationary distribution is almost identical to that introduced in the case of a finite state space.

If, for a given double sequence of transition probabilities $P_{ij}$, there is an initial distribution $\{\pi_i\}$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\{\pi_i\}$ is called a stationary distribution of the chain.

Furthermore, we have that $$\pi_j = \sum_{i=1}^{\infty} \pi_i P_{ij}$$ for any $j$.

Detailed balance

A Markov chain with a countable state space is said to satisfy the detailed balance condition if and only if there exists a distribution $\{\pi_{b,i}\}$ such that $$\pi_{b,i} P_{ij} = \pi_{b,j} P_{ji}$$ for any $i,j\in\mathbb{N}$.

This implies that $$\sum_{j=1}^{\infty} \pi_{b,i} P_{ij} = \sum_{j=1}^{\infty} \pi_{b,j} P_{ji}$$ or $$\pi_{b,i} \sum_{j=1}^{\infty} P_{ij} = \sum_{j=1}^{\infty} \pi_{b,j} P_{ji}.$$

But $$\sum_{j=1}^{\infty} P_{ij} = 1.$$

Therefore, $$\pi_{b,i} = \sum_{j=1}^{\infty} \pi_{b,j} P_{ji}$$ for any $i\in\mathbb{N}$. So, $\{\pi_{b,i}\}$ is a stationary distribution of the chain.

Irreducible chain

The concept of irreducibility is the same as that found in the case of a finite state space.

Positive recurrent chain

When dealing with countable state spaces, we usually strengthen the concept of recurrence introduced for the case of a finite state space.

Remember that a state $y\in S$ is called recurrent if and only if the probability that the chain will return to $y$ (after having started from $y$ itself) is equal to 1: $$\mathrm{P}_y\left(T_{yy} < \infty\right) = 1.$$

Even if a state is recurrent, it could happen that $$\mathrm{E}_y\left[T_{yy}\right] = \infty,$$ that is, the expected value of the return time is infinite.

A state is said to be positive recurrent if and only if the latter possibility is ruled out, that is, if and only if $$\mathrm{E}_y\left[T_{yy}\right] < \infty.$$

A Markov chain is called positive recurrent if and only if all the elements of its state space are positive recurrent.
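
As an example with a countable state space (not from the lecture), consider a random walk on the non-negative integers that moves up with probability $p$ and down with probability $1-p$, reflected at 0. For $p < 1/2$ the chain is positive recurrent, and the expected return time to 0 can be estimated by simulation; a minimal Python sketch (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3                                     # upward probability (< 1/2)

def return_time_to_zero():
    """Simulate the reflected walk from 0 until it comes back to 0."""
    x, t = 0, 0
    while True:
        x = x + 1 if rng.random() < p else max(x - 1, 0)
        t += 1
        if x == 0:
            return t

# By detailed balance, pi_0 = (1-2p)/(1-p), so the exact expected
# return time is E[T_00] = 1/pi_0 = (1-p)/(1-2p) = 1.75 for p = 0.3.
times = [return_time_to_zero() for _ in range(100_000)]
print(np.mean(times))                       # approximately 1.75
```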

Aperiodic chain

The concept of aperiodicity is identical to that found in the case of a finite state space.

Existence and uniqueness of the stationary distribution

The following important result holds.

Proposition If a Markov chain with a countable state space is irreducible and positive recurrent, then it has a unique stationary distribution.

Note that in the case of a finite state space irreducibility was sufficient to obtain the uniqueness of the stationary distribution. In the countable case, we need to add the requirement of positive recurrence.

Convergence to the stationary distribution

The following convergence result holds for chains having a countable state space.

Proposition If a Markov chain with a countable state space is irreducible, positive recurrent and aperiodic, then, irrespective of the initial distribution $\{\pi_{1,i}\}$, $$\lim_{n\rightarrow\infty} \pi_{n,j} = \pi_j$$ for any $j$, where $\{\pi_j\}$ is the unique stationary distribution of the chain.

Compare this proposition to the analogous one for finite state spaces: in the countable case, positive recurrence must be added to irreducibility and aperiodicity among the sufficient conditions for convergence.

Strong law of large numbers

Positive recurrence is also needed to prove a Strong Law of Large Numbers.

Proposition If a Markov chain $\{X_n\}$ with a countable state space $S$ is irreducible and positive recurrent, then $$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \overset{a.s.}{\longrightarrow} \mathrm{E}_{\pi}[f(X)] = \sum_{i=1}^{\infty} f(s_i)\,\pi_i$$ where $f:S\rightarrow\mathbb{R}$ is any function such that the expected value $\mathrm{E}_{\pi}[f(X)]$ exists and is finite, $\{\pi_i\}$ is the unique stationary distribution of the chain and $\overset{a.s.}{\longrightarrow}$ denotes almost sure convergence as $n$ tends to infinity.

There are cases in which positive recurrence is difficult to prove, but we know that the chain has a stationary distribution. In these cases, we can use the following proposition.

Proposition If a Markov chain $\{X_n\}$ with a countable state space $S$ is irreducible and has a unique stationary distribution $\{\pi_i\}$, then $$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \overset{a.s.}{\longrightarrow} \mathrm{E}_{\pi}[f(X)]$$ for any function $f:S\rightarrow\mathbb{R}$ such that the expected value $\mathrm{E}_{\pi}[f(X)]$ exists and is finite.

Markov chains with an uncountable state space

We now analyze the more difficult case in which the state space $S$ is infinite and uncountable.

Time-homogeneity

The initial distribution of the chain is a probability measure $\pi_1$ such that $$\pi_1(A) = \mathrm{P}(X_1 \in A)$$ for any event $A\subseteq S$.

Then, we can choose a function $P(x,A)$, called a transition kernel, and we impose $$\mathrm{P}(X_{n+1} \in A \mid X_n = x) = P(x,A)$$ for all $x\in S$ and all events $A\subseteq S$.

Note that the transition kernel $P(x,A)$ does not depend on $n$: as in the previous cases, this property is called time-homogeneity.

These two choices (of the initial distribution $\pi_1$ and the transition kernel $P(x,A)$) completely determine the distribution of all the terms of the chain.

The distribution of $X_2$ can be derived as follows: $$\pi_2(A) = \mathrm{P}(X_2 \in A) = \int_S P(x,A)\,d\pi_1(x)$$ where $A\subseteq S$ is any event and the integral is a Lebesgue integral with respect to the probability measure $\pi_1$.

The distributions of the other terms of the chain $X_n$ can be derived recursively as follows: $$\pi_n(A) = \int_S P(x,A)\,d\pi_{n-1}(x).$$

When the terms of the sequence are continuous variables (or vectors), then $X_1$ has a probability density $f_1(x)$ such that $$\pi_1(A) = \int_A f_1(x)\,dx$$ and the transition kernel can be expressed in terms of a conditional probability density $p(y \mid x)$: $$P(x,A) = \int_A p(y \mid x)\,dy.$$

As a consequence, the marginal densities $f_n$ of the terms of the chain can be expressed as $$f_n(y) = \int_S p(y \mid x)\,f_{n-1}(x)\,dx.$$

Stationary distribution

The concept of stationary distribution is similar to that found above for the cases of a finite and a countable state space.

If, for a given transition kernel $P(x,A)$, there is an initial distribution $\pi$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\pi$ is called a stationary distribution of the chain.

Furthermore, we have that $\pi$ satisfies $$\pi(A) = \int_S P(x,A)\,d\pi(x)$$ for any event $A$.

In the case in which the transition kernel and the stationary distribution have densities, we can write $$f(y) = \int_S p(y \mid x)\,f(x)\,dx.$$
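
A classical continuous-state example (not from the lecture) is the Gaussian AR(1) chain, whose transition kernel has density $p(y \mid x) = N(y;\, \rho x,\, \sigma^2)$ and whose stationary density is $N(0,\, \sigma^2/(1-\rho^2))$; plugging these densities into the equation above verifies stationarity. A minimal Python simulation (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma = 0.8, 1.0

# Gaussian AR(1) chain: X_{n+1} = rho * X_n + eps, eps ~ N(0, sigma^2).
n = 200_000
x = np.empty(n)
x[0] = 10.0                                 # start far from stationarity
for t in range(1, n):
    x[t] = rho * x[t - 1] + sigma * rng.standard_normal()

# The sample variance (after a burn-in) approaches the stationary
# variance sigma^2 / (1 - rho^2) = 1 / 0.36, roughly 2.78.
print(np.var(x[1000:]))
```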

Detailed balance

A Markov chain with an uncountable state space and transition kernel $P(x,A)$ is said to satisfy the detailed balance condition if and only if there exists a probability measure $\pi_b$ such that $$\int_A P(x,B)\,d\pi_b(x) = \int_B P(y,A)\,d\pi_b(y)$$ for any two events $A,B\subseteq S$.

If the measure $\pi_b$ and the transition kernel $P(x,A)$ can be written in terms of probability densities, then the detailed balance condition can be written as $$f_b(x)\,p(y \mid x) = f_b(y)\,p(x \mid y).$$

By integrating both sides of the equation with respect to $y$, we get $$\int_S f_b(x)\,p(y \mid x)\,dy = \int_S f_b(y)\,p(x \mid y)\,dy$$ or $$f_b(x) \int_S p(y \mid x)\,dy = \int_S f_b(y)\,p(x \mid y)\,dy.$$

Since $$\int_S p(y \mid x)\,dy = 1,$$ we get $$f_b(x) = \int_S f_b(y)\,p(x \mid y)\,dy.$$

Thus, $\pi_b$ is a stationary distribution.
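
Detailed balance is precisely the property that Markov Chain Monte Carlo methods are designed to enforce: the Metropolis algorithm, for example, builds a transition kernel that satisfies detailed balance with respect to a given target density, which is therefore stationary by the argument above. A minimal Python sketch (targeting a standard normal; the proposal scale and starting point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Unnormalized target density: a standard normal."""
    return np.exp(-0.5 * x * x)

# Random-walk Metropolis: accept a proposed move with probability
# min(1, f(proposal) / f(current)); the resulting kernel satisfies
# detailed balance with respect to f, so f is stationary.
n = 100_000
x = np.empty(n)
x[0] = 5.0                                  # start far from the target
for t in range(1, n):
    proposal = x[t - 1] + rng.normal(scale=1.0)
    if rng.random() < min(1.0, f(proposal) / f(x[t - 1])):
        x[t] = proposal
    else:
        x[t] = x[t - 1]

print(np.mean(x[1000:]), np.var(x[1000:]))  # close to 0 and 1
```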

Phi-irreducible chain

The concept of irreducibility can be generalized to the case of an uncountable state space.

Let $x\in S$ and let $A\subseteq S$ be an event. Define the hitting time $$T_{xA} = \min\left\{ n\geq 2 : X_n \in A \right\}$$ where the subscript $x$ indicates that the chain is assumed to start from $X_1 = x$.

We say that $x$ leads to $A$ if and only if $$\mathrm{P}_x\left(T_{xA} < \infty\right) > 0.$$

Again, the notation $\mathrm{P}_x$ means that the probability is computed under the assumption that $X_1 = x$.

A Markov chain is said to be $\varphi$-irreducible if and only if there is a measure $\varphi$ such that every state $x$ leads to every event $A$ satisfying $\varphi(A) > 0$.

In other words, a chain is said to be $\varphi$-irreducible if and only if, for any starting state $x$, there is a positive probability that the chain will reach in finite time any set $A$ having positive measure.

Harris recurrent chain

When dealing with uncountable state spaces, we often use a concept of recurrence, called Harris recurrence, that is even stronger than the concept of positive recurrence introduced in the case of a countable state space.

An event $Asubseteq S$ is called Harris recurrent if and only if the probability that the chain will return to A infinitely often (after having started from a point belonging to A) is equal to 1.

A Markov chain is called Harris recurrent if and only if it is $\varphi$-irreducible and all the events $A\subseteq S$ such that $\varphi(A) > 0$ are Harris recurrent.

Aperiodic chain

In the uncountable case, the definition of aperiodicity is slightly more complicated.

A Markov chain is said to have period $d$ if its state space can be partitioned into at most $d$ mutually exclusive events such that the chain always takes exactly $d$ periods to cycle through these events.

In symbols, if the events are $A_1,\ldots,A_d$, then we have $$\mathrm{P}\left(X_{n+1} \in A_{k+1} \mid X_n \in A_k\right) = 1 \quad\text{for } k = 1,\ldots,d,$$ where $A_{d+1} = A_1$.

A chain is said to be aperiodic if its period is $d=1$.

Existence and uniqueness of the stationary distribution

Unlike in the finite and countable cases, $\varphi$-irreducibility and Harris recurrence are not sufficient to guarantee the existence and uniqueness of the stationary distribution. Further technical conditions need to be met (see, e.g., Glynn 2013).

Convergence to the stationary distribution

Because $\varphi$-irreducibility and Harris recurrence are not sufficient to guarantee the existence and uniqueness of the stationary distribution, the latter is often directly added among the sufficient conditions for the convergence of the chain to a stationary distribution.

Proposition If a Markov chain with an uncountable state space is $\varphi$-irreducible, Harris recurrent, aperiodic and has a stationary distribution $\pi$, then $\pi_n$ converges to $\pi$.

The convergence of the sequence of probability distributions $\pi_n$ to the stationary distribution $\pi$ is in total variation norm (a technical detail that we can safely skip here).

Strong law of large numbers

In the uncountable case, the following ergodic theorem holds.

Proposition If a Markov chain $\{X_n\}$ with an uncountable state space $S$ is $\varphi$-irreducible, Harris recurrent and has a stationary distribution $\pi$, then $$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \overset{a.s.}{\longrightarrow} \mathrm{E}_{\pi}[f(X)] = \int_S f(x)\,d\pi(x)$$ for any function $f:S\rightarrow\mathbb{R}$ such that the expected value $\mathrm{E}_{\pi}[f(X)]$ exists and is finite.

References

Gilks, W. R., Richardson, S. and Spiegelhalter, D. (1995) Markov Chain Monte Carlo in Practice, CRC Press.

Glynn, P. W. (2013) Harris recurrence, Stochastic Systems lecture notes, Stanford University.

Hoel, P. G., Port, S. C., Stone, C. J. (1986) Introduction to Stochastic Processes, Waveland Press.

Norris, J. R. (1998) Markov Chains, Cambridge University Press.

Pishro-Nik, H. (2014) Introduction to Probability, Statistics, and Random Processes, Kappa Research, LLC.

Roberts, G. O., Rosenthal, J. S. (2006) Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov Chains, The Annals of Applied Probability, Vol. 16, No. 4, 2123-2139.

How to cite

Please cite as:

Taboga, Marco (2021). "Markov chain", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/Markov-chains.
