This lecture shows how to derive confidence intervals for the mean of a normal distribution.
We tackle two different cases:
when the variance of the distribution is known;
when the variance is unknown.
In each case we derive the level of confidence and we discuss how it is set.
We conclude with two solved exercises.
The theory needed to fully understand the derivations can be found in the lecture on interval estimation.
We start from the simpler case in which the variance is known.
We observe the realizations of $n$ independent random variables $X_1$, ..., $X_n$, all having a normal distribution with
unknown mean $\mu$;
known variance $\sigma^2$.
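To construct a confidence interval for the mean $\mu$, we use the sample mean
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i .$$
The confidence interval is
$$T_n = \left[ \bar{X}_n - z \sqrt{\frac{\sigma^2}{n}},\ \bar{X}_n + z \sqrt{\frac{\sigma^2}{n}} \right]$$
where $z$ is a strictly positive constant.
We explain below how $z$ is chosen.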
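As a minimal sketch of the construction above, the following Python function computes the two endpoints of the interval from a sample, a known variance and a given constant $z$. The function name and the data are hypothetical and serve only as an illustration.

```python
import math
from statistics import fmean


def known_variance_interval(sample, variance, z):
    """Return (lower, upper) = sample mean -/+ z * sqrt(variance / n)."""
    n = len(sample)
    centre = fmean(sample)                    # sample mean
    half_width = z * math.sqrt(variance / n)  # z times the std. deviation of the sample mean
    return centre - half_width, centre + half_width


# Hypothetical draws and known variance, chosen only to make the arithmetic easy.
draws = [4.0, 6.0, 5.0, 5.0]
lower, upper = known_variance_interval(draws, variance=4.0, z=2.0)
print(lower, upper)
```

With these inputs the sample mean is 5 and the square root of the variance over $n$ is 1, so the half-width equals $z$.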
The coverage probability is the probability that the confidence interval will include the true mean $\mu$.
The coverage probability of $T_n$ is
$$C(T_n; \mu) = \mathrm{P}(\mu \in T_n) = \mathrm{P}(-z \le Z \le z)$$
where $Z$ is a standard normal random variable.
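The coverage probability can be written as
$$\begin{aligned}
C(T_n; \mu) &= \mathrm{P}(\mu \in T_n) \\
&= \mathrm{P}\left( \bar{X}_n - z \sqrt{\frac{\sigma^2}{n}} \le \mu \le \bar{X}_n + z \sqrt{\frac{\sigma^2}{n}} \right) \\
&= \mathrm{P}\left( -z \le \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} \le z \right) \\
&= \mathrm{P}(-z \le Z \le z)
\end{aligned}$$
where we have defined
$$Z = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} .$$
Given the assumptions made above, the sample mean $\bar{X}_n$ has a normal distribution with mean $\mu$ and variance $\sigma^2 / n$, as demonstrated in the lecture on Point estimation of the mean. If we de-mean a normal random variable and divide it by the square root of its variance, we obtain a standard normal random variable. Therefore, the variable $Z$ has a standard normal distribution.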
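The result can be checked numerically. The sketch below is a Monte Carlo illustration, not part of the proof: it draws many samples, counts how often the interval with half-width $z\sqrt{\sigma^2/n}$ contains the true mean, and compares the frequency with $\mathrm{P}(-z \le Z \le z)$. The function name, seeds, sample size and parameter values are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_coverage(mu, variance, n, z, trials=10000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    half_width = z * math.sqrt(variance / n)
    hits = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(mu, sd) for _ in range(n)])
        if abs(sample_mean - mu) <= half_width:
            hits += 1
    return hits / trials


z = 1.5
# Theoretical coverage P(-z <= Z <= z) for a standard normal Z.
theoretical = NormalDist().cdf(z) - NormalDist().cdf(-z)

# Different true means (and seeds) should all give roughly the same frequency.
estimates = {
    mu: simulated_coverage(mu, variance=4.0, n=10, z=z, seed=seed)
    for seed, mu in enumerate((-3.0, 0.0, 10.0))
}
for mu, estimate in estimates.items():
    print(mu, estimate, theoretical)
```

The simulated frequencies should be close to the theoretical value whatever the true mean, which is the numerical counterpart of the statement that the coverage probability depends only on $z$.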
The coverage probability does not depend on the unknown parameter $\mu$.
Therefore, the level of confidence $c(T_n)$ coincides with the coverage probability:
$$c(T_n) = \inf_{\mu} C(T_n; \mu) = \mathrm{P}(-z \le Z \le z)$$
where $Z$ is a standard normal random variable.
The level of confidence is chosen by the statistician, who adjusts the constant $z$ accordingly.
If the level of confidence is set equal to $1 - \alpha$, then
$$z = F^{-1}\left( 1 - \frac{\alpha}{2} \right)$$
where $F$ is the cumulative distribution function of a standard normal random variable.
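The level of confidence can be written as
$$\mathrm{P}(-z \le Z \le z) = F(z) - F(-z) = F(z) - \left[ 1 - F(z) \right] = 2 F(z) - 1$$
where $F$ is the cumulative distribution function of a standard normal random variable $Z$ and we have used the fact that $F(-z) = 1 - F(z)$ by the symmetry of the standard normal distribution around $0$.
Therefore, when the level of confidence is equal to $1 - \alpha$, we have $2 F(z) - 1 = 1 - \alpha$, that is,
$$z = F^{-1}\left( 1 - \frac{\alpha}{2} \right) .$$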
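The inversion can be illustrated with Python's standard library, whose `statistics.NormalDist` class provides the standard normal cdf and its inverse. This is only a sketch: the helper name `normal_critical_value` is hypothetical, and the 95% level is an arbitrary example.

```python
from statistics import NormalDist


def normal_critical_value(confidence):
    """Return z such that P(-z <= Z <= z) equals the requested confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z95 = normal_critical_value(0.95)

# Round trip: 2 F(z) - 1 should give back the level of confidence.
recovered = 2.0 * NormalDist().cdf(z95) - 1.0
print(z95, recovered)
```

For a 95% level of confidence this gives the familiar value of about 1.96.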
We now relax the assumption that the variance of the distribution is known.
We observe the realizations of $n$ independent random variables $X_1$, ..., $X_n$, all having a normal distribution with
unknown mean $\mu$;
unknown variance $\sigma^2$.
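To construct a confidence interval for the mean $\mu$, we use the sample mean
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$$
and the adjusted sample variance
$$s_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X}_n \right)^2 .$$
The confidence interval for the mean is:
$$T_n = \left[ \bar{X}_n - z \sqrt{\frac{s_n^2}{n}},\ \bar{X}_n + z \sqrt{\frac{s_n^2}{n}} \right]$$
where $z$ is a strictly positive constant.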
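A minimal sketch of this second construction follows. It relies on `statistics.variance`, which divides by $n - 1$ and therefore computes the adjusted sample variance. The function name and the data are hypothetical, and $z$ is again taken as given.

```python
import math
from statistics import fmean, variance


def unknown_variance_interval(sample, z):
    """Return (lower, upper) = sample mean -/+ z * sqrt(adjusted variance / n)."""
    n = len(sample)
    centre = fmean(sample)
    adjusted_variance = variance(sample)  # sum of squared deviations divided by n - 1
    half_width = z * math.sqrt(adjusted_variance / n)
    return centre - half_width, centre + half_width


# Hypothetical draws: mean 4, squared deviations 1 + 1 + 4 = 6, adjusted variance 6 / 2 = 3.
draws = [3.0, 3.0, 6.0]
lower, upper = unknown_variance_interval(draws, z=2.0)
print(lower, upper)
```

Here the adjusted variance divided by $n$ equals 1, so the interval is the sample mean plus or minus $z$.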
The coverage probability of the confidence interval is
$$C(T_n; \mu, \sigma^2) = \mathrm{P}(-z \le Z_{n-1} \le z)$$
where $Z_{n-1}$ is a standard Student's t random variable with $n - 1$ degrees of freedom.
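The coverage probability can be written as
$$\begin{aligned}
C(T_n; \mu, \sigma^2) &= \mathrm{P}(\mu \in T_n) \\
&= \mathrm{P}\left( \bar{X}_n - z \sqrt{\frac{s_n^2}{n}} \le \mu \le \bar{X}_n + z \sqrt{\frac{s_n^2}{n}} \right) \\
&= \mathrm{P}\left( -z \le \frac{\bar{X}_n - \mu}{\sqrt{s_n^2 / n}} \le z \right) \\
&= \mathrm{P}(-z \le Z_{n-1} \le z)
\end{aligned}$$
where we have defined
$$Z_{n-1} = \frac{\bar{X}_n - \mu}{\sqrt{s_n^2 / n}} .$$
Now, rewrite $Z_{n-1}$ as
$$Z_{n-1} = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} \cdot \frac{1}{\sqrt{s_n^2 / \sigma^2}} = \frac{Z}{\sqrt{s_n^2 / \sigma^2}}$$
where we have defined
$$Z = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2 / n}} .$$
Given the assumptions made above, the adjusted sample variance $s_n^2$ has a Gamma distribution with parameters $n - 1$ and $\sigma^2$, as demonstrated in the lecture on Point estimation of variance (in the parametrization used there, the first parameter is the number of degrees of freedom and the second is the mean). Therefore, the random variable $s_n^2 / \sigma^2$ has a Gamma distribution with parameters $n - 1$ and $1$.
Moreover, the random variable $Z$ has a standard normal distribution (see the previous section). Hence, $Z_{n-1}$ is the ratio between a standard normal random variable and the square root of a Gamma random variable with parameters $n - 1$ and $1$.
As a consequence, $Z_{n-1}$ has a standard Student's t distribution with $n - 1$ degrees of freedom (see the lecture on the Student's t distribution for a proof of this fact).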
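The coverage probability does not depend on the unknown parameters $\mu$ and $\sigma^2$.
Therefore, the level of confidence coincides with the coverage probability:
$$c(T_n) = \inf_{\mu, \sigma^2} C(T_n; \mu, \sigma^2) = \mathrm{P}(-z \le Z_{n-1} \le z)$$
where $Z_{n-1}$ has a standard Student's t distribution with $n - 1$ degrees of freedom.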
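The claim that the coverage probability is the same for every mean and variance can be illustrated by simulation. The sketch below is only a numerical check under arbitrary assumptions (function name, seeds, sample size $n = 5$, $z = 2$ and the three parameter pairs are all hypothetical choices). It also compares the result with the standard normal probability of the same interval, which is larger because the t distribution has heavier tails.

```python
import math
import random
from statistics import NormalDist, fmean


def simulated_t_coverage(mu, sigma2, n, z, trials=10000, seed=0):
    """Fraction of samples whose interval mean -/+ z*sqrt(s2/n) contains mu."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sd) for _ in range(n)]
        m = fmean(sample)
        s2 = sum((x - m) ** 2 for x in sample) / (n - 1)  # adjusted sample variance
        if abs(m - mu) <= z * math.sqrt(s2 / n):
            hits += 1
    return hits / trials


z, n = 2.0, 5
settings = [(0.0, 1.0), (5.0, 0.25), (-2.0, 9.0)]  # (mean, variance) pairs
estimates = [
    simulated_t_coverage(mu, sigma2, n, z, seed=seed)
    for seed, (mu, sigma2) in enumerate(settings)
]
normal_probability = NormalDist().cdf(z) - NormalDist().cdf(-z)
print(estimates, normal_probability)
```

The three frequencies should agree up to simulation noise, and all of them should lie below the normal probability.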
As before, the constant $z$ is adjusted so as to achieve the desired level of confidence.
If the latter is equal to $1 - \alpha$, then
$$z = F_{n-1}^{-1}\left( 1 - \frac{\alpha}{2} \right)$$
where $F_{n-1}$ is the cumulative distribution function of a standard Student's t random variable with $n - 1$ degrees of freedom.
The proof is identical to the one shown above for the case of known variance, because the t distribution is also symmetric around $0$.
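Python's standard library has no Student's t quantile function, so the sketch below approximates one by integrating the t density with Simpson's rule and inverting the cdf by bisection. This is a plain numerical stand-in for a library routine, not the method used by any particular package. All function names and the step and iteration counts are assumptions.

```python
import math


def t_pdf(x, dof):
    """Density of a standard Student's t random variable."""
    log_const = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                 - 0.5 * math.log(dof * math.pi))
    return math.exp(log_const - (dof + 1) / 2 * math.log1p(x * x / dof))


def t_cdf(x, dof, steps=2000):
    """CDF via composite Simpson's rule on [0, |x|], using symmetry around 0."""
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical_value(confidence, dof):
    """Return z such that P(-z <= Z_dof <= z) equals the requested confidence."""
    target = 1.0 - (1.0 - confidence) / 2.0
    low, high = 0.0, 1.0
    while t_cdf(high, dof) < target:  # widen the bracket until it contains z
        high *= 2.0
    for _ in range(60):               # bisection on the increasing cdf
        mid = (low + high) / 2
        if t_cdf(mid, dof) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


print(t_critical_value(0.95, 10))
```

With few degrees of freedom the constant is noticeably larger than the corresponding normal one; the difference shrinks as the number of degrees of freedom grows.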
Below you can find some exercises with explained solutions.
Suppose that you observe a sample of 100 independent draws from a normal distribution having unknown mean $\mu$ and known variance $\sigma^2$.
Denote the draws by $x_1$, ..., $x_{100}$ and their sample mean by
$$\bar{x}_{100} = \frac{1}{100} \sum_{i=1}^{100} x_i .$$
Find a confidence interval for $\mu$ having coverage probability $1 - \alpha$.
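For a given sample size $n$, the interval estimator
$$T_n = \left[ \bar{X}_n - z \sqrt{\frac{\sigma^2}{n}},\ \bar{X}_n + z \sqrt{\frac{\sigma^2}{n}} \right]$$
has coverage probability
$$C(T_n; \mu) = \mathrm{P}(-z \le Z \le z)$$
where $Z$ is a standard normal random variable and $z$ is a strictly positive constant. Thus, denoting the required coverage probability by $1 - \alpha$, we need to find $z$ such that
$$\mathrm{P}(-z \le Z \le z) = 1 - \alpha .$$
But
$$\mathrm{P}(-z \le Z \le z) = \mathrm{P}(Z \le z) - \mathrm{P}(Z < -z) = \mathrm{P}(Z \le z) - \left[ 1 - \mathrm{P}(Z \le z) \right] = 2 \mathrm{P}(Z \le z) - 1$$
where the second equality stems from the fact that the standard normal distribution is symmetric around zero. Therefore $z$ must be such that
$$2 \mathrm{P}(Z \le z) - 1 = 1 - \alpha$$
or
$$\mathrm{P}(Z \le z) = 1 - \frac{\alpha}{2} .$$
Using normal distribution tables or a computer program to find the value of $z$ (see the lecture entitled Normal distribution - Values), we obtain
$$z = F^{-1}\left( 1 - \frac{\alpha}{2} \right)$$
where $F$ is the cumulative distribution function of a standard normal random variable.
Thus, the confidence interval for $\mu$ is
$$\left[ \bar{x}_{100} - z \sqrt{\frac{\sigma^2}{100}},\ \bar{x}_{100} + z \sqrt{\frac{\sigma^2}{100}} \right]$$
where $\bar{x}_{100}$ is the sample mean of the 100 draws.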
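The steps of this solution can be sketched in Python using `statistics.NormalDist` in place of a normal table. The function name is hypothetical. Only the sample size of 100 comes from the exercise; the sample mean, variance and coverage probability passed in below are placeholder inputs chosen for illustration.

```python
import math
from statistics import NormalDist


def normal_mean_interval(sample_mean, variance, n, confidence):
    """Return z and the interval sample_mean -/+ z * sqrt(variance / n)."""
    # Solve P(Z <= z) = 1 - alpha / 2 for z.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * math.sqrt(variance / n)
    return z, (sample_mean - half_width, sample_mean + half_width)


# Placeholder inputs (NOT the exercise's data), with n = 100 as in the exercise.
z, (lower, upper) = normal_mean_interval(
    sample_mean=2.0, variance=4.0, n=100, confidence=0.95
)
print(z, lower, upper)
```

With these placeholder inputs the standard deviation of the sample mean is 0.2, so the half-width is about 1.96 times 0.2.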
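Suppose you observe a sample of 100 independent draws from a normal distribution having unknown mean $\mu$ and unknown variance $\sigma^2$.
Denote the draws by $x_1$, ..., $x_{100}$, their sample mean by
$$\bar{x}_{100} = \frac{1}{100} \sum_{i=1}^{100} x_i$$
and their adjusted sample variance by
$$s_{100}^2 = \frac{1}{99} \sum_{i=1}^{100} \left( x_i - \bar{x}_{100} \right)^2 .$$
Set the level of confidence at 99% and find a confidence interval for the mean $\mu$.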
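For a given sample size $n$, the interval estimator
$$T_n = \left[ \bar{X}_n - z \sqrt{\frac{s_n^2}{n}},\ \bar{X}_n + z \sqrt{\frac{s_n^2}{n}} \right]$$
has coverage probability
$$C(T_n; \mu, \sigma^2) = \mathrm{P}(-z \le Z_{n-1} \le z)$$
where $Z_{n-1}$ is a standard Student's t random variable with $n - 1$ degrees of freedom and $z$ is a strictly positive constant. Thus, we need to find $z$ such that
$$\mathrm{P}(-z \le Z_{99} \le z) = 0.99 .$$
But
$$\mathrm{P}(-z \le Z_{99} \le z) = \mathrm{P}(Z_{99} \le z) - \mathrm{P}(Z_{99} < -z) = \mathrm{P}(Z_{99} \le z) - \left[ 1 - \mathrm{P}(Z_{99} \le z) \right] = 2 \mathrm{P}(Z_{99} \le z) - 1$$
where the second equality stems from the fact that the standard Student's t distribution is symmetric around zero. Therefore $z$ must be such that
$$2 \mathrm{P}(Z_{99} \le z) - 1 = 0.99$$
or:
$$\mathrm{P}(Z_{99} \le z) = 0.995 .$$
Using a computer program to find the value of $z$ (for example, with the MATLAB command tinv(0.995,99)), we obtain
$$z \approx 2.6264 .$$
Thus, the confidence interval for $\mu$ is
$$\left[ \bar{x}_{100} - 2.6264 \sqrt{\frac{s_{100}^2}{100}},\ \bar{x}_{100} + 2.6264 \sqrt{\frac{s_{100}^2}{100}} \right]$$
where $\bar{x}_{100}$ and $s_{100}^2$ are the sample mean and the adjusted sample variance of the 100 draws.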
Please cite as:
Taboga, Marco (2021). "Confidence interval for the mean", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/set-estimation-mean.