Markov chains are sequences of random variables (or vectors) that possess the so-called Markov property: given one term in the chain (the present), the subsequent terms (the future) are conditionally independent of the previous terms (the past).
This lecture is a roadmap to Markov chains. Unlike most of the lectures in this textbook, it is not an exhaustive treatment of the subject with detailed proofs, derivations, examples and exercises (for which a separate textbook would be needed). Its aim is to provide a quick introduction to some concepts that are often used in applied statistics (e.g., in Markov Chain Monte Carlo methods). Readers who are interested in a more detailed treatment can have a look at the references reported at the end of this page.
Let us start with a formal definition.
Definition
Let $\{X_n\}$ be a sequence of $K \times 1$ random vectors. The sequence $\{X_n\}$ is said to be a Markov chain if and only if any given term of the sequence $X_{n+1}$ is independent of all terms preceding $X_n$, conditional on $X_n$:
$$F(x_{n+1} \mid x_n, x_{n-1}, \ldots, x_1) = F(x_{n+1} \mid x_n)$$
where the letter $F$ denotes a conditional distribution function.
The state space $S$ of a Markov chain $\{X_n\}$ is the set of all possible realizations of the terms of the chain. In other words, for any given term $X_n$, the support of $X_n$ is included in $S$.
In what follows we present the main facts about Markov chains, by tackling, in order of increasing difficulty, the cases of:
a finite state space
an infinite but countable state space
an uncountable state space
Let us start with the case of a finite state space. In particular, we assume that
$$S = \{s_1, \ldots, s_J\}$$
that is, the terms of the chain can take one of the $J$ values $s_1, \ldots, s_J$.
We specify an initial distribution for the first value of the chain, that is, a $1 \times J$ vector of initial probabilities $\pi = [\pi_1 \ \cdots \ \pi_J]$ and impose
$$\Pr(X_1 = s_j) = \pi_j \quad \text{for } j = 1, \ldots, J$$
Then, we can choose a $J \times J$ transition probability matrix $P$ (a matrix such that each of its rows is a vector of probabilities) and impose
$$\Pr(X_{n+1} = s_j \mid X_n = s_i) = P_{ij}$$
for all $n$, $i$ and $j$.
The assumption that $\Pr(X_{n+1} = s_j \mid X_n = s_i)$ is equal for all $n$ is called time-homogeneity (think of the index $n$ as a measure of time).
These two choices (of the initial distribution $\pi$ and the transition probability matrix $P$) completely determine the distribution of all the terms of the chain. As a matter of fact, for any $n$, we have
$$\pi^{(n)} = \pi P^{n-1}$$
where $\pi^{(n)}$ is the $1 \times J$ vector whose $j$-th entry is
$$\pi^{(n)}_j = \Pr(X_n = s_j)$$
The latter equality holds because
$$\pi^{(n+1)}_j = \Pr(X_{n+1} = s_j) = \sum_{i=1}^J \Pr(X_{n+1} = s_j \mid X_n = s_i) \Pr(X_n = s_i) = \sum_{i=1}^J \pi^{(n)}_i P_{ij}$$
that is, $\pi^{(n+1)} = \pi^{(n)} P$.
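As a concrete sketch, the recursion $\pi^{(n+1)} = \pi^{(n)} P$ is straightforward to iterate numerically; the three-state initial distribution and transition matrix below are made-up examples, not taken from the lecture:

```python
import numpy as np

# Hypothetical 3-state chain: illustrative initial distribution and
# transition matrix (each row of P is a vector of probabilities).
pi1 = np.array([0.5, 0.3, 0.2])                 # pi^(1)
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Iterate pi^(n+1) = pi^(n) P to obtain the distribution of X_10.
pi_n = pi1.copy()
for _ in range(9):
    pi_n = pi_n @ P
print(pi_n)

# Equivalently, pi^(n) = pi P^(n-1).
print(pi1 @ np.linalg.matrix_power(P, 9))       # same vector
```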
If, for a given transition probability matrix $P$, there is an initial distribution $\pi$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\pi$ is called a stationary distribution (or invariant distribution) of the chain.
When $\pi$ is a stationary distribution, we have that
$$\pi^{(n)} = \pi \quad \text{for all } n$$
Together with the fact that $\pi^{(n+1)} = \pi^{(n)} P$, it implies
$$\pi = \pi P$$
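Numerically, a stationary distribution of a finite chain can be found as a normalized left eigenvector of $P$ associated with the eigenvalue $1$. A minimal sketch, reusing the made-up matrix from the previous example:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])

# Left eigenvectors of P are right eigenvectors of P transposed.
eigvals, eigvecs = np.linalg.eig(P.T)
v = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
pi = v / v.sum()                                # normalize to probabilities
print(pi)
print(np.allclose(pi @ P, pi))                  # True: pi = pi P
```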
A Markov chain with finite state space is said to satisfy the detailed balance condition if and only if there exists a distribution $\pi$ such that
$$\pi_i P_{ij} = \pi_j P_{ji}$$
for any $i$ and $j$.
By summing both sides of the equation over $i$, we get
$$\sum_{i=1}^J \pi_i P_{ij} = \sum_{i=1}^J \pi_j P_{ji}$$
or
$$\sum_{i=1}^J \pi_i P_{ij} = \pi_j \sum_{i=1}^J P_{ji}$$
But the rows of $P$ sum to one:
$$\sum_{i=1}^J P_{ji} = 1$$
Therefore,
$$\sum_{i=1}^J \pi_i P_{ij} = \pi_j$$
for any $j$. But this can be written in matrix form as
$$\pi P = \pi$$
As a consequence, $\pi$ is a stationary distribution of the chain.
Note that detailed balance is a more stringent requirement than the existence of a stationary distribution: the former implies the latter, but the converse is not true.
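A quick numerical check of this implication; the chain below is a hypothetical three-state birth-death (random-walk) chain, a standard example of a reversible chain:

```python
import numpy as np

# Hypothetical reversible chain with stationary distribution pi.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])

flows = pi[:, None] * P                 # entry (i, j) equals pi_i P_ij
print(np.allclose(flows, flows.T))      # True: detailed balance holds
print(np.allclose(pi @ P, pi))          # True: hence pi is stationary
```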
We now introduce some properties of Markov chains that are frequently used to study their asymptotic behavior. The first of these is the so-called irreducibility.
Let $s_i, s_j \in S$. Define the hitting time
$$T_{ij} = \inf\{n \geq 1 : X_n = s_j \text{ when } X_1 = s_i\}$$
where the subscript $i$ indicates that the chain is assumed to start from $X_1 = s_i$. We say that $s_i$ leads to $s_j$ if and only if
$$\Pr_i(T_{ij} < \infty) > 0$$
Again, the notation $\Pr_i$ means that the probability is computed under the assumption that $X_1 = s_i$.
A Markov chain is said to be irreducible if and only if every state leads to itself and to every other state, that is, if and only if there is a positive probability that for any starting state $s_i$ the chain will reach any other state $s_j$ (including $s_i$ itself) in finite time.
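For a finite chain, irreducibility can be checked mechanically: $s_i$ leads to $s_j$ if and only if some power $P^m$ with $1 \le m \le J$ has a positive $(i,j)$ entry, because any feasible path between states can be shortened to length at most $J$. A sketch with the same illustrative matrix as above:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
J = P.shape[0]

# Accumulate P + P^2 + ... + P^J; a positive (i, j) entry means s_i leads to s_j.
reach = sum(np.linalg.matrix_power(P, m) for m in range(1, J + 1))
print(np.all(reach > 0))                # True: the chain is irreducible
```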
A state $s_i$ is called recurrent if and only if
$$\Pr_i(T_{ii} < \infty) = 1$$
Otherwise, it is called transient. In other words, a state $s_i$ is recurrent if and only if the probability that the chain will return to $s_i$ in finite time (after having started from $s_i$ itself) is equal to $1$.
A Markov chain is called recurrent if and only if all the elements of its state space are recurrent.
Let $s_i \in S$. The period of $s_i$ is defined as
$$d(s_i) = \gcd\{n \geq 1 : \Pr_i(X_{n+1} = s_i) > 0\}$$
where $\gcd$ denotes the greatest common divisor. In other words, $d(s_i)$ is the greatest common divisor of the lengths of all the paths by which the chain can return to $s_i$ (after starting from $s_i$ itself).
It is possible to prove that if the chain is irreducible, then all its states have the same period $d$, which is called the period of the chain.
A chain is called aperiodic if and only if the period of the chain is $1$.
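The period can be computed directly from the definition by taking the greatest common divisor of the observed return times up to some horizon. A sketch, using a made-up deterministic two-state "flip-flop" chain whose period is 2:

```python
import math
from functools import reduce
import numpy as np

# Deterministic flip-flop chain: it returns to each state only at even times.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def period(P, i, horizon=50):
    """gcd of the return times {n >= 1 : (P^n)[i, i] > 0}, up to a horizon."""
    Pn = np.eye(P.shape[0])
    times = []
    for n in range(1, horizon + 1):
        Pn = Pn @ P                     # Pn now equals P^n
        if Pn[i, i] > 0:
            times.append(n)
    return reduce(math.gcd, times)

print(period(P, 0))                     # 2: the chain is periodic
```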
We have the following important result.
Proposition If a Markov chain with a finite state space is irreducible, then it has a unique stationary distribution.
If we assume not only irreducibility, but also aperiodicity, we get the following result.
Proposition
If a Markov chain with a finite state space is irreducible and aperiodic, then, irrespective of the initial distribution $\pi^{(1)}$,
$$\lim_{n \to \infty} \pi^{(n)} = \pi$$
where $\pi$ is the unique stationary distribution of the chain.
So, even if the initial distribution of the chain is not the stationary distribution, the terms $X_n$ of the sequence become less and less dependent on the initial value $X_1$ as $n$ increases, and their distributions converge to the stationary distribution.
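This convergence is easy to visualize numerically: starting the illustrative chain used above (which is irreducible and aperiodic) from two opposite initial distributions, both sequences of marginals settle on the same vector:

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])          # irreducible and aperiodic

for pi1 in (np.array([1.0, 0.0, 0.0]),   # chain surely starts in s_1
            np.array([0.0, 0.0, 1.0])):  # chain surely starts in s_3
    pi_n = pi1
    for _ in range(200):
        pi_n = pi_n @ P
    print(pi_n)                          # both runs print the same vector,
                                         # the unique stationary distribution
```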
Irreducibility can also be used to prove a Strong Law of Large Numbers.
Proposition
If a Markov chain $\{X_n\}$ with a finite state space $S$ is irreducible, then, for any bounded function $g: S \to \mathbb{R}$,
$$\frac{1}{n} \sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} \operatorname{E}_\pi[g(X)] = \sum_{j=1}^J g(s_j)\, \pi_j$$
where $\pi$ is the unique stationary distribution of the chain and $\xrightarrow{\text{a.s.}}$ denotes almost sure convergence as $n$ tends to infinity.
Thus, when the state space is finite, irreducibility guarantees that sample averages taken across the chain converge to population averages taken across the states.
This kind of proposition is often called an ergodic theorem.
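A simulation sketch of the ergodic theorem; the states, the transition matrix and the function $g$ are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

s = np.array([1.0, 2.0, 3.0])            # the states s_1, s_2, s_3
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
g = lambda x: x ** 2                      # any bounded function works here

# Simulate one long path of the chain and average g along it.
n, state, gsum = 100_000, 0, 0.0
for _ in range(n):
    gsum += g(s[state])
    state = rng.choice(3, p=P[state])     # draw the next state
print(gsum / n)                           # sample average across the chain

# Population average under the stationary distribution.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
pi /= pi.sum()
print(g(s) @ pi)                          # close to the sample average
```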
We now tackle the case in which the state space $S$ is infinite but countable. In particular, we assume that
$$S = \{s_1, s_2, s_3, \ldots\}$$
that is, the elements of the state space can be arranged into a sequence $\{s_j\}$.
The initial distribution of the chain is a sequence of initial probabilities $\{\pi_j\}$ such that
$$\Pr(X_1 = s_j) = \pi_j$$
Then, we can choose a double sequence of transition probabilities $\{P_{ij}\}$ such that, for any index $i$,
$$\sum_{j=1}^\infty P_{ij} = 1$$
and we impose
$$\Pr(X_{n+1} = s_j \mid X_n = s_i) = P_{ij}$$
for all $n$, $i$ and $j$.
Note that the transition probabilities $P_{ij}$ are independent of $n$. This property is called time-homogeneity.
These two choices (of the initial distribution $\{\pi_j\}$ and the transition probabilities $\{P_{ij}\}$) completely determine the distribution of all the terms of the chain.
The distribution of $X_1$ is a sequence $\{\pi^{(1)}_j\}$ whose elements can be derived as follows:
$$\pi^{(1)}_j = \Pr(X_1 = s_j) = \pi_j$$
The distributions of the other terms of the chain $X_2, X_3, \ldots$ are sequences $\{\pi^{(n)}_j\}$ that can be derived recursively as follows:
$$\pi^{(n+1)}_j = \Pr(X_{n+1} = s_j) = \sum_{i=1}^\infty \pi^{(n)}_i P_{ij}$$
The concept of stationary distribution is almost identical to that introduced in the case of a finite state space.
If, for a given double sequence of transition probabilities $\{P_{ij}\}$, there is an initial distribution $\{\pi_j\}$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\{\pi_j\}$ is called a stationary distribution of the chain.
Furthermore, we have that
$$\pi_j = \sum_{i=1}^\infty \pi_i P_{ij} \quad \text{for any } j$$
A Markov chain with countable state space is said to satisfy the detailed balance condition if and only if there exists a distribution $\{\pi_j\}$ such that
$$\pi_i P_{ij} = \pi_j P_{ji}$$
for any $i$ and $j$. This implies that
$$\sum_{i=1}^\infty \pi_i P_{ij} = \sum_{i=1}^\infty \pi_j P_{ji}$$
or
$$\sum_{i=1}^\infty \pi_i P_{ij} = \pi_j \sum_{i=1}^\infty P_{ji}$$
But
$$\sum_{i=1}^\infty P_{ji} = 1$$
Therefore,
$$\sum_{i=1}^\infty \pi_i P_{ij} = \pi_j$$
for any $j$. So, $\{\pi_j\}$ is a stationary distribution of the chain.
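Detailed balance is often the easiest way to find stationary distributions on countable spaces. As a made-up illustration, consider a birth-death chain on $\{s_1, s_2, \ldots\}$ that moves up with probability $p$ and down with probability $1-p$ (with a self-loop at the lowest state). Detailed balance forces $\pi_{i+1} = \pi_i \, p/(1-p)$, a geometric stationary distribution whenever $p < 1/2$:

```python
import numpy as np

p = 0.3                                   # made-up upward probability, p < 1/2
r = p / (1 - p)                           # detailed balance: pi_{i+1} = r pi_i

i = np.arange(200)                        # truncation of the countable space
pi = (1 - r) * r ** i                     # geometric stationary distribution

# Check pi_j = sum_i pi_i P_ij at the interior states of the truncation:
# the only moves into state j come from j-1 (up, prob p) and j+1 (down, 1-p).
lhs = pi[1:-1]
rhs = pi[:-2] * p + pi[2:] * (1 - p)
print(np.allclose(lhs, rhs))              # True (up to truncation error)
```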
The concept of irreducibility is the same found in the case of a finite state space.
When dealing with countable state spaces, we usually strengthen the concept of recurrence introduced for the case of a finite state space.
Remember that a state $s_i$ is called recurrent if and only if the probability that the chain will return to $s_i$ (after having started from $s_i$ itself) is equal to $1$.
Even if a state is recurrent, it could happen that
$$\operatorname{E}_i[T_{ii}] = \infty$$
that is, the expected value of the return time is infinite. A state is said to be positive recurrent if and only if the latter possibility is ruled out, that is, if and only if
$$\operatorname{E}_i[T_{ii}] < \infty$$
A Markov chain is called positive recurrent if and only if all the elements of its state space are positive recurrent.
The concept of aperiodicity is identical to that found in the case of a finite state space.
The following important result holds.
Proposition If a Markov chain with a countable state space is irreducible and positive recurrent, then it has a unique stationary distribution.
Note that in the case of a finite state space irreducibility was sufficient to obtain the uniqueness of the stationary distribution. In the countable case, we need to add the requirement of positive recurrence.
The following convergence result holds for chains having a countable state space.
Proposition
If a Markov chain with a countable state space is irreducible, positive recurrent and aperiodic, then, irrespective of the initial distribution,
$$\lim_{n \to \infty} \pi^{(n)}_j = \pi_j$$
for any $j$, where $\{\pi_j\}$ is the unique stationary distribution of the chain.
Compare this proposition to the one for finite state spaces:
finite state space + irreducibility + aperiodicity $\Rightarrow$ convergence to the stationary distribution;
countable state space + irreducibility + aperiodicity + positive recurrence $\Rightarrow$ convergence to the stationary distribution.
Positive recurrence is also needed to prove a Strong Law of Large Numbers.
Proposition
If a Markov chain $\{X_n\}$ with a countable state space $S$ is irreducible and positive recurrent, then
$$\frac{1}{n} \sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} \operatorname{E}_\pi[g(X)] = \sum_{j=1}^\infty g(s_j)\, \pi_j$$
where $g$ is any function such that the expected value $\operatorname{E}_\pi[g(X)]$ exists and is finite, $\{\pi_j\}$ is the unique stationary distribution of the chain, and $\xrightarrow{\text{a.s.}}$ denotes almost sure convergence as $n$ tends to infinity.
Often, there are cases in which positive recurrence is difficult to prove, but we know that the chain has a stationary distribution. In these cases, we can use the following proposition.
Proposition
If a Markov chain $\{X_n\}$ with a countable state space $S$ is irreducible and has a unique stationary distribution $\{\pi_j\}$, then
$$\frac{1}{n} \sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} \operatorname{E}_\pi[g(X)]$$
for any function $g$ such that the expected value $\operatorname{E}_\pi[g(X)]$ exists and is finite.
We now analyze the more difficult case in which the state space $S$ is infinite and uncountable.
The initial distribution of the chain is a probability measure $\mu$ such that
$$\Pr(X_1 \in A) = \mu(A)$$
for any event $A \subseteq S$.
Then, we can choose a function $K(x, A)$, called a transition kernel, and we impose
$$\Pr(X_{n+1} \in A \mid X_n = x) = K(x, A)$$
for all $n$, all $x \in S$ and all events $A \subseteq S$.
The transition kernel does not depend on $n$: it is time-homogeneous.
These two choices (of the initial distribution $\mu$ and the transition kernel $K$) completely determine the distribution of all the terms of the chain.
The distribution of $X_2$ can be derived as follows:
$$\Pr(X_2 \in A) = \int_S K(x, A)\, d\mu(x)$$
where $A$ is any event and the integral is a Lebesgue integral with respect to the probability measure $\mu$.
The distributions $\mu_n$ of the other terms of the chain $X_3, X_4, \ldots$ can be derived recursively as follows:
$$\mu_{n+1}(A) = \Pr(X_{n+1} \in A) = \int_S K(x, A)\, d\mu_n(x)$$
When the terms of the sequence are continuous variables (or vectors), then $\mu$ has a probability density $f$ such that
$$\mu(A) = \int_A f(x)\, dx$$
and the transition kernel can be expressed in terms of a conditional probability density $k(y \mid x)$:
$$K(x, A) = \int_A k(y \mid x)\, dy$$
As a consequence, the marginal densities $f_n$ of the terms of the chain can be expressed as
$$f_{n+1}(y) = \int_S k(y \mid x)\, f_n(x)\, dx$$
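As a concrete illustration (not from the lecture), take a Gaussian AR(1) chain $X_{n+1} = \rho X_n + \varepsilon_{n+1}$ with $\varepsilon_{n+1} \sim N(0, \sigma^2)$, whose kernel density is $k(y \mid x) = N(y;\, \rho x,\, \sigma^2)$. The sketch below iterates the density recursion on a grid and shows the marginals approaching the $N(0, \sigma^2/(1-\rho^2))$ density:

```python
import numpy as np

def npdf(z, loc=0.0, scale=1.0):
    """Normal probability density (numpy-only helper)."""
    return np.exp(-0.5 * ((z - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

rho, sigma = 0.8, 1.0

# Grid approximation of f_{n+1}(y) = integral of k(y|x) f_n(x) dx.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
k = npdf(x[:, None], loc=rho * x[None, :], scale=sigma)  # k[m, l] = k(x_m | x_l)

f = npdf(x, loc=3.0, scale=0.5)           # a made-up initial density f_1
for _ in range(50):
    f = k @ f * dx                        # one step of the density recursion

f_star = npdf(x, scale=sigma / np.sqrt(1 - rho ** 2))    # stationary density
print(np.max(np.abs(f - f_star)))         # close to zero
```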
The concept of stationary distribution is similar to that found above for the cases of a finite and a countable state space.
If, for a given transition kernel $K$, there is an initial distribution $\mu$ such that the distribution of all the terms of the chain is equal to the initial distribution, then $\mu$ is called a stationary distribution of the chain.
Furthermore, we have that $\mu$ satisfies
$$\mu(A) = \int_S K(x, A)\, d\mu(x)$$
for any event $A$.
In the case in which the transition kernel and the stationary distribution have densities, we can write
$$f(y) = \int_S k(y \mid x)\, f(x)\, dx$$
A Markov chain with uncountable state space and transition kernel $K$ is said to satisfy the detailed balance condition if and only if there exists a probability measure $\mu$ such that
$$\int_A K(x, B)\, d\mu(x) = \int_B K(x, A)\, d\mu(x)$$
for any two events $A$ and $B$.
If the measure $\mu$ and the transition kernel $K$ can be written in terms of probability densities, then the detailed balance condition can be written as
$$f(x)\, k(y \mid x) = f(y)\, k(x \mid y)$$
By integrating both sides of the equation with respect to $x$, we get
$$\int_S f(x)\, k(y \mid x)\, dx = \int_S f(y)\, k(x \mid y)\, dx$$
or
$$\int_S f(x)\, k(y \mid x)\, dx = f(y) \int_S k(x \mid y)\, dx$$
Since
$$\int_S k(x \mid y)\, dx = 1$$
we get
$$\int_S f(x)\, k(y \mid x)\, dx = f(y)$$
Thus, $\mu$ is a stationary distribution.
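The Gaussian AR(1) kernel of the previous sketch actually satisfies detailed balance with respect to its stationary density (this chain is reversible), which can be confirmed pointwise:

```python
import numpy as np

def npdf(z, loc=0.0, scale=1.0):
    return np.exp(-0.5 * ((z - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

rho, sigma = 0.8, 1.0
s = sigma / np.sqrt(1 - rho ** 2)          # stationary standard deviation

f = lambda x: npdf(x, scale=s)             # stationary density
k = lambda y, x: npdf(y, loc=rho * x, scale=sigma)   # transition density k(y|x)

# Check f(x) k(y|x) = f(y) k(x|y) at many random points.
rng = np.random.default_rng(0)
x, y = rng.normal(size=1000), rng.normal(size=1000)
print(np.allclose(f(x) * k(y, x), f(y) * k(x, y)))   # True: detailed balance
```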
The concept of irreducibility can be generalized to the case of an uncountable state space.
Let $x \in S$ and let $A \subseteq S$ be an event. Define the hitting time
$$T_{x,A} = \inf\{n \geq 1 : X_n \in A \text{ when } X_1 = x\}$$
where the subscript $x$ indicates that the chain is assumed to start from $X_1 = x$. We say that $x$ leads to $A$ if and only if
$$\Pr_x(T_{x,A} < \infty) > 0$$
Again, the notation $\Pr_x$ means that the probability is computed under the assumption that $X_1 = x$.
A Markov chain is said to be $\psi$-irreducible if and only if there is a measure $\psi$ such that every state $x \in S$ leads to $A$ whenever $\psi(A) > 0$. In other words, a chain is said to be $\psi$-irreducible if and only if there is a positive probability that for any starting state $x$ the chain will reach any set $A$ having positive measure in finite time.
When dealing with uncountable state spaces, we often use a concept of recurrence, called Harris recurrence, that is even stronger than the concept of positive recurrence introduced in the case of a countable state space.
An event $A$ is called Harris recurrent if and only if the probability that the chain will return to $A$ infinitely often (after having started from a point belonging to $A$) is equal to $1$.
A Markov chain is called Harris recurrent if and only if it is $\psi$-irreducible and all the events $A$ such that $\psi(A) > 0$ are Harris recurrent.
In the uncountable case, the definition of aperiodicity is slightly more complicated.
A Markov chain is said to have period $d$ if its state space can be partitioned into at most $d$ mutually exclusive events such that the chain always takes exactly $d$ periods to cycle through these events. In symbols, if the events are $A_1, \ldots, A_d$, then
$$\Pr(X_{n+1} \in A_{j+1} \mid X_n \in A_j) = 1 \quad \text{for } j = 1, \ldots, d-1$$
and
$$\Pr(X_{n+1} \in A_1 \mid X_n \in A_d) = 1$$
A chain is said to be aperiodic if its period is $1$.
Unlike in the finite and countable cases, $\psi$-irreducibility and Harris recurrence are not sufficient to guarantee the existence and uniqueness of the stationary distribution. Further technical conditions need to be met (see, e.g., Glynn 2013). For this reason, the existence of a stationary distribution is often directly added to the sufficient conditions for the convergence of the chain to that distribution.
Proposition
If a Markov chain with an uncountable state space is $\psi$-irreducible, Harris recurrent, aperiodic and has a stationary distribution $\mu$, then the distribution $\mu_n$ of $X_n$ converges to $\mu$.
The convergence of the sequence of probability distributions $\mu_n$ to the stationary distribution $\mu$ is in total variation norm (a technical detail that we can safely skip here).
In the uncountable case, the following ergodic theorem holds.
Proposition
If a Markov chain $\{X_n\}$ with an uncountable state space $S$ is $\psi$-irreducible, Harris recurrent and has a stationary distribution $\mu$, then
$$\frac{1}{n} \sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} \operatorname{E}_\mu[g(X)]$$
for any function $g$ such that the expected value $\operatorname{E}_\mu[g(X)]$ exists and is finite.
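A final simulation sketch for the AR(1) example: by the ergodic theorem, the sample average of $g(X_t) = X_t^2$ along one path should approach $\operatorname{E}_\mu[X^2] = \sigma^2/(1-\rho^2)$:

```python
import numpy as np

rho, sigma = 0.8, 1.0
rng = np.random.default_rng(0)

# Simulate one long path of the Gaussian AR(1) chain.
n = 500_000
eps = rng.normal(scale=sigma, size=n)
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

print(np.mean(x ** 2))                    # sample average of g(X_t) = X_t^2 ...
print(sigma ** 2 / (1 - rho ** 2))        # ... versus E[X^2] = 2.777...
```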
Gilks, W. R., Richardson, S., Spiegelhalter, D. (1995) Markov Chain Monte Carlo in Practice, CRC Press.
Glynn, P. W. (2013) Harris recurrence, Stochastic Systems lecture notes, Stanford University.
Hoel, P. G., Port, S. C., Stone, C. J. (1986) Introduction to Stochastic Processes, Waveland Press.
Norris, J. R. (1998) Markov Chains, Cambridge University Press.
Pishro-Nik, H. (2014) Introduction to Probability, Statistics, and Random Processes, Kappa Research, LLC.
Roberts, G. O., Rosenthal, J. S. (2006) Harris recurrence of Metropolis-within-Gibbs and trans-dimensional Markov Chains, The Annals of Applied Probability, Vol. 16, No. 4, 2123-2139.