Mean estimation is a statistical inference problem in which a sample is used to produce a point estimate of the mean of an unknown distribution.
The problem is typically solved by using the sample mean as an estimator of the population mean.
In this lecture, we present two examples, concerning:
normal IID samples;
IID samples that are not necessarily normal.
For each of these two cases, we derive the expected value, the variance and the asymptotic properties of the mean estimator.
In this example of mean estimation, which is probably the most important in the history of statistics, the sample is drawn from a normal distribution.
Specifically, we observe the realizations of $n$ independent random variables $X_1$, ..., $X_n$, all having a normal distribution with unknown mean $\mu$ and variance $\sigma^2$.
As an estimator of the mean $\mu$, we use the sample mean
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
The expected value of the estimator $\bar{X}_n$ is equal to the true mean $\mu$. This can be proved by using the linearity of the expected value:
$$\mathrm{E}\left[\bar{X}_n\right] = \mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left[X_i\right] = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu.$$
Therefore, the estimator $\bar{X}_n$ is unbiased.
The variance of the estimator $\bar{X}_n$ is equal to $\sigma^2/n$. This can be proved by using the formula for the variance of an independent sum:
$$\mathrm{Var}\left[\bar{X}_n\right] = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}\left[X_i\right] = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.$$
Therefore, the variance of the estimator tends to zero as the sample size $n$ tends to infinity.
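As a quick numerical check of the last two results, the following sketch estimates $\mathrm{E}[\bar{X}_n]$ and $\mathrm{Var}[\bar{X}_n]$ by Monte Carlo (assuming NumPy is available; the values $\mu = 2$, $\sigma = 3$, $n = 50$ are illustrative choices, not part of the lecture):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0        # illustrative values for the unknown parameters
n, n_reps = 50, 100_000     # sample size and number of Monte Carlo replications

# Each row is one sample of size n; each row mean is one draw of the estimator.
xbar = rng.normal(mu, sigma, size=(n_reps, n)).mean(axis=1)

print("Monte Carlo E[xbar]  :", xbar.mean())       # close to mu = 2
print("Monte Carlo Var[xbar]:", xbar.var())        # close to sigma^2 / n
print("sigma^2 / n          :", sigma**2 / n)      # = 0.18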
The estimator $\bar{X}_n$ has a normal distribution:
$$\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
Note that the sample mean $\bar{X}_n$ is a linear combination of the normal and independent random variables $X_1$, ..., $X_n$ (all the coefficients of the linear combination are equal to $1/n$). Therefore, $\bar{X}_n$ is normal because a linear combination of independent normal random variables is normal. The mean and the variance of the distribution have already been derived above.
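This distributional claim can also be checked numerically, for instance by comparing the empirical quantiles of simulated sample means with the quantiles of $N(\mu, \sigma^2/n)$ (a sketch assuming NumPy and SciPy; parameter values are again illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 50
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

# Empirical quantiles of the simulated sample means versus N(mu, sigma^2/n).
for q in (0.05, 0.5, 0.95):
    emp = np.quantile(xbar, q)
    theo = stats.norm.ppf(q, loc=mu, scale=sigma / np.sqrt(n))
    print(f"q={q}: empirical {emp:.4f}   theoretical {theo:.4f}")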
The mean squared error of the estimator $\bar{X}_n$ is
$$\mathrm{MSE}\left(\bar{X}_n\right) = \mathrm{E}\left[\left(\bar{X}_n - \mu\right)^2\right] = \mathrm{Var}\left[\bar{X}_n\right] = \frac{\sigma^2}{n},$$
where the second equality holds because the estimator is unbiased.
The sequence $\{X_n\}$ is an IID sequence with finite mean. Therefore, it satisfies the conditions of Kolmogorov's Strong Law of Large Numbers. Hence, the sample mean $\bar{X}_n$ converges almost surely to the true mean $\mu$:
$$\bar{X}_n \xrightarrow{\text{a.s.}} \mu,$$
that is, the estimator $\bar{X}_n$ is strongly consistent.
The estimator is also weakly consistent because almost sure convergence implies convergence in probability:
$$\bar{X}_n \xrightarrow{P} \mu.$$
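One way to visualize consistency (a minimal sketch with the same illustrative parameters as above, not part of the lecture) is to track the running sample mean along a single simulated path and watch it settle near $\mu$:

import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 2.0, 3.0
draws = rng.normal(mu, sigma, size=1_000_000)

# Running sample mean xbar_n along a single simulated path.
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: xbar = {running_mean[n - 1]:.4f}")   # settles near mu = 2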
In this example of mean estimation, we relax the previously made assumption of normality.
The sample is made of the realizations of $n$ independent random variables $X_1$, ..., $X_n$, all having the same distribution with mean $\mu$ and variance $\sigma^2$.
Again, the estimator of the mean $\mu$ is the sample mean:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
The expected value of the estimator $\bar{X}_n$ is equal to the true mean:
$$\mathrm{E}\left[\bar{X}_n\right] = \mu.$$
Therefore, the estimator is unbiased. The proof is the same as in the previous example.
The variance of the estimator $\bar{X}_n$ is
$$\mathrm{Var}\left[\bar{X}_n\right] = \frac{\sigma^2}{n}.$$
Also in this case, the proof is the same as in the previous example.
Unlike in the previous example, the estimator $\bar{X}_n$ does not necessarily have a normal distribution: its distribution depends on those of the terms of the sequence $\{X_n\}$. However, we will see below that $\bar{X}_n$ has a normal distribution asymptotically, that is, it converges to a normal random variable when the sample size $n$ becomes large.
The mean squared error of the estimator $\bar{X}_n$ is
$$\mathrm{MSE}\left(\bar{X}_n\right) = \frac{\sigma^2}{n}.$$
The proof is the same as in the previous example.
Since the sequence $\{X_n\}$ is an IID sequence whose terms have finite mean, it satisfies the conditions of Kolmogorov's Strong Law of Large Numbers. Therefore, the estimator $\bar{X}_n$ is both strongly consistent and weakly consistent (see the example above).
The sequence $\{X_n\}$ is an IID sequence with finite mean and variance. Therefore, it satisfies the conditions of the Lindeberg-Lévy Central Limit Theorem. Hence, the sample mean $\bar{X}_n$ is asymptotically normal:
$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} Z,$$
where $Z$ is a standard normal random variable and $\xrightarrow{d}$ denotes convergence in distribution.
In other words, for large $n$, the distribution of the sample mean $\bar{X}_n$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$.
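The following sketch illustrates the theorem with clearly non-normal draws (an illustrative choice: Exponential(1) variables, which have mean $\mu = 1$ and standard deviation $\sigma = 1$); the standardized sample means are compared with standard normal quantiles:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_reps = 200, 100_000
mu = sigma = 1.0    # Exponential(1) draws have mean 1 and standard deviation 1

# Standardized sample means, one per replication, as in the CLT statement.
xbar = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma

for q in (0.05, 0.5, 0.95):
    print(f"q={q}: empirical {np.quantile(z, q):+.3f}   N(0,1) {stats.norm.ppf(q):+.3f}")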
Below you can find some exercises with explained solutions.
Consider an experiment that can have only two outcomes: either success, with probability $p$, or failure, with probability $1-p$.
The probability of success is unknown, but, $p$ being a probability, we know that
$$0 \le p \le 1.$$
Suppose that we can independently repeat the experiment as many times as we wish and use the ratio
$$\widehat{p} = \frac{\text{number of successes}}{\text{number of experiments}}$$
as an estimator of $p$.
What is the minimum number of experiments needed in order to be sure that the standard deviation of the estimator is less than 1/100?
Denote by $\widehat{p}$ the estimator of $p$. It can be written as
$$\widehat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
where $n$ is the number of repetitions of the experiment and $X_1$, ..., $X_n$ are independent random variables having a Bernoulli distribution with parameter $p$. Therefore, $\widehat{p}$ is the sample mean of $n$ independent Bernoulli random variables with expected value
$$\mathrm{E}\left[X_i\right] = p$$
and variance
$$\mathrm{Var}\left[X_i\right] = p(1-p).$$
Thus,
$$\mathrm{Var}\left[\widehat{p}\right] = \frac{p(1-p)}{n}.$$
We need to ensure that
$$\sqrt{\frac{p(1-p)}{n}} < \frac{1}{100},$$
or
$$n > 10^4\, p(1-p),$$
which is certainly verified if
$$n > \frac{10^4}{4} = 2500,$$
since $p(1-p) \le 1/4$ for every $p \in [0,1]$, or, equivalently, if
$$n \ge 2501.$$
Therefore, the minimum number of experiments needed is 2501.
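As a quick numerical check (a throwaway sketch, not part of the original solution), the worst case is $p = 1/2$, and the standard deviation drops below the $1/100$ threshold exactly between $n = 2500$ and $n = 2501$:

import math

p = 0.5  # worst case: p * (1 - p) is maximized at p = 1/2
for n in (2500, 2501):
    sd = math.sqrt(p * (1 - p) / n)
    print(f"n = {n}: sd = {sd:.6f}   (< 0.01? {sd < 0.01})")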
Suppose that you observe a sample of 100 independent draws from a distribution having unknown mean $\mu$ and known variance $\sigma^2$. How can you approximate the distribution of their sample mean?
We can approximate the distribution of the sample mean with its asymptotic distribution. So, the distribution of the sample mean can be approximated by a normal distribution with mean $\mu$ and variance
$$\frac{\sigma^2}{n} = \frac{\sigma^2}{100}.$$
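For instance, under assumed illustrative values $\mu = 0$ and $\sigma^2 = 1$ (the exercise leaves these generic), a minimal sketch using SciPy's normal distribution gives the approximate central 95% interval implied by this approximation:

from scipy import stats

mu, sigma2, n = 0.0, 1.0, 100    # mu and sigma2 are assumed illustrative values
approx = stats.norm(loc=mu, scale=(sigma2 / n) ** 0.5)

# Approximate central 95% interval for the sample mean of the 100 draws.
lo, hi = approx.ppf(0.025), approx.ppf(0.975)
print(f"approximate 95% interval: [{lo:.3f}, {hi:.3f}]")   # about [-0.196, 0.196]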
Please cite as:
Taboga, Marco (2021). "Estimation of the mean", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/mean-estimation.