The Monte Carlo method is a computational technique that uses a computer-generated sample from a given probability distribution to produce a plug-in estimate of some feature of that distribution (such as, for example, a moment or a quantile).
In what follows, we are often going to refer to the plug-in principle. If you are not familiar with the concept, you are advised to first read the lecture entitled Plug-in principle.
Let $X$ be a random variable with distribution function $F_X(x)$, and suppose that we can generate (through a computer) a sample $x_1, \ldots, x_n$ of $n$ realizations of $n$ random variables $X_1, \ldots, X_n$, all having distribution function $F_X(x)$.
Denote by $T(F_X)$ a feature of the distribution $F_X$ (e.g., its mean, its variance, or one of its quantiles).

Denote by $F_n$ the empirical distribution of the sample (i.e., the probability distribution that assigns probability $1/n$ to each one of the values $x_1, \ldots, x_n$).

Then, the plug-in estimate $$T(F_n)$$ is a Monte Carlo approximation of $T(F_X)$.

In other words, the feature $T$ of the original distribution is approximated by the same feature $T$ of the empirical distribution of the computer-generated sample.
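As a concrete illustration, here is a minimal Python sketch of the plug-in principle at work, assuming (our choice, purely for illustration) that $X$ is exponential with rate 1, whose mean is $1$ and whose median is $\ln 2$:

```python
import numpy as np

# Assumption for illustration: X is exponential with rate 1,
# so its true mean is 1 and its true median is ln(2).
rng = np.random.default_rng(seed=0)
n = 100_000
sample = rng.exponential(scale=1.0, size=n)  # x_1, ..., x_n

# Plug-in estimates: each feature of the empirical distribution
# replaces the same feature of the true distribution F_X.
print("mean:   estimate =", sample.mean(), " true =", 1.0)
print("median: estimate =", np.median(sample), " true =", np.log(2.0))
```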
Example
Let $X$ be a random variable having distribution function $F_X(x)$, and let $g: \mathbb{R} \rightarrow \mathbb{R}$ be a function. Suppose we want to approximate $$E[g(X)] = \int g(x)\,dF_X(x)$$ where we have written the expected value of $g(X)$ as a Riemann-Stieltjes integral (see the lecture entitled Expected value) in order to highlight the fact that it depends on the distribution function $F_X(x)$.

Now, suppose we have a computer-generated sample of $n$ draws $x_1, \ldots, x_n$ from the distribution $F_X(x)$. Denote their empirical distribution function by $F_n(x)$. Then, the Monte Carlo approximation of $E[g(X)]$ is $$\int g(x)\,dF_n(x) = \frac{1}{n}\sum_{i=1}^{n} g(x_i)$$ where we have used the facts that the empirical distribution $F_n(x)$ is discrete and that the expected value of a discrete random variable is the weighted sum of the values it can take, with weights equal to their respective probabilities (in this case the weights are all equal to $1/n$). In other words, we can approximate the expected value $E[g(X)]$ with the sample mean of the transformed computer-generated draws $g(x_1), \ldots, g(x_n)$.
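A minimal Python sketch of this example, assuming (our choices, not dictated by the lecture) that $g(x) = x^2$ and that $X$ is standard normal, so that $E[g(X)] = 1$ is known exactly for comparison:

```python
import numpy as np

# Assumptions for illustration: g(x) = x**2 and X standard normal,
# so E[g(X)] = E[X^2] = Var[X] = 1.
rng = np.random.default_rng(seed=0)
n = 1_000_000
x = rng.standard_normal(n)   # draws x_1, ..., x_n from F_X
approx = np.mean(x ** 2)     # (1/n) * sum of g(x_i)
print(approx)                # close to the exact value 1
```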
When the Monte Carlo method is used to approximate an expected value (as in the previous example), the method is called Monte Carlo integration. It consists in approximating the expected value of a random variable $X$ with the mean of a computer-generated sample of observations drawn at random from the distribution of $X$: $$E[X] \approx \frac{1}{n}\sum_{i=1}^{n} x_i$$ where $x_1, \ldots, x_n$ are the computer-generated draws from the distribution of $X$.
If the draws are IID, then Kolmogorov's Strong Law of Large Numbers applies, and the approximation converges almost surely to $E[X]$ as the number of draws $n$ becomes large. It is also possible, however, that the draws are not IID (as, for instance, in the so-called Markov Chain Monte Carlo methods), but the approximation nevertheless converges to $E[X]$ because a Law of Large Numbers for correlated sequences applies (see the lecture entitled Laws of Large Numbers for more details).
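To see this convergence at work, here is a minimal Python sketch; the uniform target distribution and the sample sizes are our illustrative choices:

```python
import numpy as np

# Assumption for illustration: X uniform on [0, 1], so E[X] = 0.5.
# The sample mean approaches 0.5 as n grows, as the Strong LLN predicts.
rng = np.random.default_rng(seed=0)
for n in (10, 1_000, 100_000, 10_000_000):
    x = rng.uniform(size=n)
    print(f"n = {n:>10,d}   approximation = {x.mean():.6f}")
```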
This approximation method is called Monte Carlo integration because the expected value being approximated is in fact an integral: $$E[X] = \int x\,dF_X(x)$$ where $F_X(x)$ is the distribution function of $X$, and the integral is a Riemann-Stieltjes integral. Moreover, if $X$ is a continuous variable, the integral can be written as an ordinary Riemann integral: $$E[X] = \int x\,f_X(x)\,dx$$ where $f_X(x)$ is the probability density function of $X$.
It is very important to note that the method of Monte Carlo integration is often used when the distribution function of $X$ is not known analytically, but $X$ can be written as a function $$X = g(Z)$$ of a random vector $Z$, and it is easy to produce computer-generated draws of $Z$. In such a case, $n$ draws $z_1, \ldots, z_n$ are generated from the distribution of $Z$, which are then transformed into $n$ draws from the distribution of $X$: $$x_i = g(z_i), \quad i = 1, \ldots, n$$ and the expected value of $X$ is approximated by $$E[X] \approx \frac{1}{n}\sum_{i=1}^{n} g(z_i).$$
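As a minimal Python sketch of this approach, assume (our choice) that $X = g(Z) = e^Z$ with $Z$ standard normal; then $X$ is lognormal and $E[X] = e^{1/2}$ is known exactly for comparison:

```python
import numpy as np

# Assumptions for illustration: Z standard normal, g(z) = exp(z),
# so X = g(Z) is lognormal with E[X] = exp(1/2).
rng = np.random.default_rng(seed=0)
n = 1_000_000
z = rng.standard_normal(n)    # draws z_1, ..., z_n from Z
x = np.exp(z)                 # transformed draws x_i = g(z_i)
print(x.mean(), np.exp(0.5))  # Monte Carlo estimate vs exact value
```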
Monte Carlo integration is also used to compute integrals that have no particular probabilistic interpretation but can be written as expected values. Suppose the integral to be computed is $$I = \int_a^b h(x)\,dx$$ where $h(x)$ is any integrable function. Now, take any legitimate probability density function $f_X(x)$ such that $f_X(x) > 0$ for $a \leq x \leq b$ and $f_X(x) = 0$ for $x < a$ or $x > b$. Then we can write $$I = \int_a^b \frac{h(x)}{f_X(x)}\,f_X(x)\,dx = E\left[\frac{h(X)}{f_X(X)}\right]$$ where $X$ is a random variable having probability density function $f_X(x)$. If we are able to produce a computer-generated sample of draws $x_1, \ldots, x_n$ from the distribution of $X$, then the integral can be approximated as follows: $$I \approx \frac{1}{n}\sum_{i=1}^{n} \frac{h(x_i)}{f_X(x_i)}.$$
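Here is a minimal Python sketch of this technique, assuming (our choices) that $h(x) = e^{-x^2}$, that the interval is $[0, 2]$, and that $f_X$ is the uniform density on $[0, 2]$, so that $f_X(x) = 1/2$ on the interval:

```python
import numpy as np

# Assumptions for illustration: h(x) = exp(-x**2) on [0, 2], and
# f_X the uniform density on [0, 2], i.e. f_X(x) = 1/(b - a) = 1/2.
rng = np.random.default_rng(seed=0)
a, b = 0.0, 2.0
n = 1_000_000
x = rng.uniform(a, b, size=n)          # draws from the uniform density
h = np.exp(-x ** 2)
approx = np.mean(h / (1.0 / (b - a)))  # (1/n) * sum of h(x_i) / f_X(x_i)
print(approx)                          # about 0.8821 = (sqrt(pi)/2) * erf(2)
```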
Up to this point, we have not clarified what we mean by a computer-generated sample of draws from a given distribution. While we will not go into details, we would like to highlight some facts.
First of all, there exist several algorithms that allow us to use a computer to produce sequences of numbers, called pseudo-random numbers, which are not truly random but approximate the properties of a truly random sequence fairly well. If a statistician observed sequences of numbers produced by these algorithms without knowing how they were produced, she would conclude that they are indeed random (even after carrying out rigorous statistical tests to check their randomness). See Wikipedia's article on pseudo-random number generators for more details.
Second, most available algorithms for producing sequences of pseudo-random numbers produce sequences of independent draws from a uniform distribution on the interval $[0,1]$. These sequences of uniform random numbers are then transformed in some way in order to obtain sequences drawn from other distributions. For example, if $U$ is a pseudo-random number having a uniform distribution on $[0,1]$ and $F$ is an invertible distribution function, then the number $$X = F^{-1}(U)$$ has distribution function $F$. We have $$P(X \leq x) = P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x).$$
This method of generating pseudo-random numbers from non-uniform distributions is called the inverse transformation method. Many other methods exist (see, e.g., Wikipedia's article on non-uniform pseudo-random variate generation).
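As an illustration, here is a minimal Python sketch of the inverse transformation method, assuming (our choice) that the target distribution is exponential with rate $\lambda = 2$, so that $F(x) = 1 - e^{-2x}$ and $F^{-1}(u) = -\ln(1-u)/2$:

```python
import numpy as np

# Assumption for illustration: target is exponential with rate 2,
# whose distribution function F(x) = 1 - exp(-2x) inverts to
# F^{-1}(u) = -log(1 - u) / 2.
rng = np.random.default_rng(seed=0)
lam = 2.0
u = rng.uniform(size=1_000_000)  # uniform pseudo-random numbers
x = -np.log(1.0 - u) / lam       # x = F^{-1}(u)
print(x.mean(), 1.0 / lam)       # sample mean vs true mean 1/lambda
```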
Third, all commonly used statistical software packages include efficient and thoroughly tested pseudo-random number generators for extracting random samples from the most important probability distributions. So, if you need samples of observations from these distributions to compute a Monte Carlo approximation, all you need to do is to find out how to use the pseudo-random number generator that comes with your favorite statistical software.
By approximating a quantity of interest $T(F_X)$ with its Monte Carlo approximation $T(F_n)$, we commit an approximation error $$e_n = T(F_n) - T(F_X).$$ In general, the properties of this approximation error (mean, variance, asymptotic convergence) depend on the properties of the plug-in estimator $T(F_n)$.
But, as noted in the lecture on the
plug-in principle, these
properties can be difficult to study in general terms.
However, things are relatively simple in the particular case of Monte Carlo integration, that is, when $T(F_X)$ is an integral (an expected value). In this case, $$T(F_X) = E[X]$$ and $$T(F_n) = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}_n.$$ When $x_1, \ldots, x_n$ are independent and have the same distribution as $X$, then $\bar{x}_n$ converges almost surely to $E[X]$ by Kolmogorov's Strong Law of Large Numbers (provided $E[|X|] < \infty$). But this implies that the approximation error converges to zero as the sample size $n$ gets large. Furthermore, the expected value and the variance of the approximation error can be easily computed. The expected value is $$E[e_n] = E[\bar{x}_n] - E[X] = E[X] - E[X] = 0$$ and the variance is $$\mathrm{Var}[e_n] = \mathrm{Var}[\bar{x}_n] = \frac{\mathrm{Var}[X]}{n}$$ where the last equality holds because the draws are independent.
From these formulae it is clear that the variance of the approximation error can be made as small as desired by increasing the size $n$ of the computer-generated sample (at the cost of increasing the computational intensity of the approximation, that is, the computer time and memory required to compute it). Note also that, since $\mathrm{Var}[X]$ is usually not known, it can be estimated by the sample variance $$s_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x}_n)^2$$ and the variance of the approximation error is then estimated by $$\widehat{\mathrm{Var}}[e_n] = \frac{s_n^2}{n}.$$
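Putting these pieces together, here is a minimal Python sketch that reports a Monte Carlo approximation along with the estimated standard deviation of its error, assuming (our choice) that $X$ is standard normal:

```python
import numpy as np

# Assumption for illustration: X standard normal, so E[X] = 0.
rng = np.random.default_rng(seed=0)
n = 100_000
x = rng.standard_normal(n)
approx = x.mean()               # Monte Carlo approximation of E[X]
err_var = x.var(ddof=1) / n     # estimated variance of the error: s_n^2 / n
print(approx, np.sqrt(err_var)) # estimate and its standard error
```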
Finally, let us mention that there exist several techniques that can be used to reduce the variance of a Monte Carlo approximation. These techniques are called variance reduction techniques. One of the most popular of these techniques is introduced in the lecture entitled Importance sampling.