The Gauss-Markov theorem says that, under certain conditions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE), that is, the estimator that has the smallest variance among those that are unbiased and linear in the observed output variables.
The regression model is
$$y = X\beta + \varepsilon$$
where:
$y$ is an $N\times 1$ vector of observations of the output variable ($N$ is the sample size);
$X$ is an $N\times K$ matrix of inputs ($K$ is the number of inputs for each observation);
$\beta$ is a $K\times 1$ vector of regression coefficients;
$\varepsilon$ is an $N\times 1$ vector of errors.
The OLS estimator of $\beta$ is
$$\widehat{\beta} = (X^\top X)^{-1} X^\top y.$$
We assume that:
$X$ has full rank (as a consequence, $X^\top X$ is invertible, and $\widehat{\beta}$ is well-defined);
$\mathrm{E}[\varepsilon \mid X] = 0$;
$\mathrm{Var}[\varepsilon \mid X] = \sigma^2 I$, where $I$ is the $N\times N$ identity matrix and $\sigma^2$ is a positive constant.
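To make the setup concrete, here is a minimal numerical sketch in Python/NumPy under the assumptions above; the sample size, number of inputs, true coefficients, and error standard deviation are illustrative choices for this sketch, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (assumptions of this sketch).
N, K = 500, 3                          # sample size, number of inputs
beta = np.array([1.0, -2.0, 0.5])      # true regression coefficients
sigma = 1.5                            # error standard deviation

X = rng.normal(size=(N, K))            # full-rank design (with probability 1)
eps = rng.normal(scale=sigma, size=N)  # errors: mean 0, variance sigma^2 I
y = X @ beta + eps                     # the regression model y = X beta + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'y (solve is preferred to inv).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```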
First of all, note that $\widehat{\beta}$ is linear in $y$. In fact, $\widehat{\beta}$ is the product between the $K\times N$ matrix $(X^\top X)^{-1}X^\top$ and $y$, and matrix multiplication is a linear operation.
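Continuing the sketch above (same $X$ and random generator), the following lines check this linearity numerically: with $X$ fixed, the map $y \mapsto \widehat{\beta}$ respects linear combinations.

```python
# With X fixed, beta_hat equals M @ y for the K x N matrix M below,
# so the map y -> beta_hat respects linear combinations.
M = np.linalg.solve(X.T @ X, X.T)      # M = (X'X)^{-1} X'
y1 = rng.normal(size=N)
y2 = rng.normal(size=N)
c = 3.0
assert np.allclose(M @ (y1 + c * y2), M @ y1 + c * (M @ y2))
```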
It can easily be proved that $\widehat{\beta}$ is unbiased, both conditional on $X$ and unconditionally, that is,
$$\mathrm{E}[\widehat{\beta} \mid X] = \beta, \qquad \mathrm{E}[\widehat{\beta}] = \beta.$$
We can use the definition of $y$ to rewrite the OLS estimator as follows:
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y = (X^\top X)^{-1}X^\top (X\beta + \varepsilon) = \beta + (X^\top X)^{-1}X^\top \varepsilon.$$
When we condition on $X$, we can treat $X$ as a constant matrix. Therefore, the conditional expectation of $\widehat{\beta}$ is
$$\mathrm{E}[\widehat{\beta} \mid X] = \beta + (X^\top X)^{-1}X^\top \mathrm{E}[\varepsilon \mid X] = \beta.$$
The Law of Iterated Expectations implies that
$$\mathrm{E}[\widehat{\beta}] = \mathrm{E}\big[\mathrm{E}[\widehat{\beta} \mid X]\big] = \mathrm{E}[\beta] = \beta.$$
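Continuing the running sketch, a quick Monte Carlo check of conditional unbiasedness: holding $X$ fixed and redrawing the errors many times, the average of $\widehat{\beta}$ should be close to $\beta$. The number of draws is an arbitrary choice.

```python
# Monte Carlo check of E[beta_hat | X] = beta: keep X fixed, redraw errors.
draws = 10_000
estimates = np.array([
    M @ (X @ beta + rng.normal(scale=sigma, size=N))
    for _ in range(draws)
])
print(estimates.mean(axis=0))   # approximately [1.0, -2.0, 0.5]
```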
Now that we have shown that the OLS estimator is linear and unbiased, we need to prove that it is also the best linear unbiased estimator.
What exactly do we mean by best?
When $\beta$ is a scalar (i.e., there is only one regressor), we consider $\widehat{\beta}$ to be the best among those we are considering (i.e., among all the linear unbiased estimators) if and only if it has the smallest possible variance, that is, if its deviations from the true value $\beta$ tend to be the smallest on average. Thus, $\widehat{\beta}$ is the best linear unbiased estimator (BLUE) if and only if
$$\mathrm{Var}[\widehat{\beta} \mid X] \leq \mathrm{Var}[\widetilde{\beta} \mid X]$$
for any other linear unbiased estimator $\widetilde{\beta}$.
Since we often deal with more than one regressor, we have to extend this definition to a multivariate context. We do this by requiring that
$$\mathrm{Var}[a^\top \widehat{\beta} \mid X] \leq \mathrm{Var}[a^\top \widetilde{\beta} \mid X] \tag{1}$$
for any constant $K\times 1$ vector $a$ and any other linear unbiased estimator $\widetilde{\beta}$.
In other words, OLS is BLUE if and only if any linear combination of the regression coefficients is estimated more precisely by OLS than by any other linear unbiased estimator.
Condition (1) is satisfied if and only if
$$\mathrm{Var}[\widetilde{\beta} \mid X] - \mathrm{Var}[\widehat{\beta} \mid X] \tag{2}$$
is a positive semi-definite matrix. We can write condition (1) as
$$\mathrm{Var}[a^\top \widetilde{\beta} \mid X] - \mathrm{Var}[a^\top \widehat{\beta} \mid X] \geq 0$$
or, since the variance of a linear combination is a quadratic form in the covariance matrix ($\mathrm{Var}[a^\top b \mid X] = a^\top \mathrm{Var}[b \mid X]\, a$),
$$a^\top \big(\mathrm{Var}[\widetilde{\beta} \mid X] - \mathrm{Var}[\widehat{\beta} \mid X]\big)\, a \geq 0.$$
But the latter inequality holds for every vector $a$ if and only if the matrix in (2) is positive semi-definite (by the very definition of a positive semi-definite matrix).
In the next two sections we will derive $\mathrm{Var}[\widehat{\beta} \mid X]$ (the covariance matrix of the OLS estimator), and then we will prove that the difference (2) is positive semi-definite, so that OLS is BLUE.
The conditional covariance matrix of the OLS estimator is
$$\mathrm{Var}[\widehat{\beta} \mid X] = \sigma^2 (X^\top X)^{-1}.$$
We have already proved (see above) that the OLS estimator can be written as
$$\widehat{\beta} = \beta + (X^\top X)^{-1}X^\top \varepsilon.$$
Therefore, its conditional variance is
$$\begin{aligned}
\mathrm{Var}[\widehat{\beta} \mid X] &= \mathrm{Var}\big[(X^\top X)^{-1}X^\top \varepsilon \mid X\big] \\
&= (X^\top X)^{-1}X^\top \,\mathrm{Var}[\varepsilon \mid X]\, X (X^\top X)^{-1} \\
&= (X^\top X)^{-1}X^\top (\sigma^2 I) X (X^\top X)^{-1} \\
&= \sigma^2 (X^\top X)^{-1} X^\top X (X^\top X)^{-1} \\
&= \sigma^2 (X^\top X)^{-1}.
\end{aligned}$$
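The simulated draws from the sketch above can be used to check this formula: the empirical covariance matrix of the estimates should be close to $\sigma^2 (X^\top X)^{-1}$ (a Monte Carlo illustration on the running example, not a proof).

```python
# Empirical covariance of beta_hat across error draws vs. the formula.
emp_cov = np.cov(estimates, rowvar=False)
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)
print(np.abs(emp_cov - theory_cov).max())   # small (Monte Carlo error only)
```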
Since we are considering the set of linear estimators, we can write any estimator in this set as
$$\widetilde{\beta} = A y$$
where $A$ is a $K\times N$ matrix. Furthermore, if we define
$$D = A - (X^\top X)^{-1}X^\top,$$
then we can write
$$\widetilde{\beta} = A y = \big((X^\top X)^{-1}X^\top + D\big)\, y = \widehat{\beta} + D y.$$
It is possible to prove that
$$D X = 0$$
if $\widetilde{\beta}$ is unbiased. We have that
$$\mathrm{E}[\widetilde{\beta} \mid X] = \mathrm{E}[\widehat{\beta} + D y \mid X] = \beta + D\, \mathrm{E}[X\beta + \varepsilon \mid X] = \beta + D X \beta.$$
As a consequence, $\mathrm{E}[\widetilde{\beta} \mid X]$ is always equal to $\beta$ (i.e., for every value of $\beta$) only if $D X = 0$.
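To see this decomposition in action, here is one concrete alternative linear unbiased estimator for the running example: a weighted least squares estimator $\widetilde{\beta} = (X^\top W X)^{-1} X^\top W y$, where the particular positive definite weight matrix $W$ is just an assumption of this sketch. Since $A X = I$ for this estimator, $D X = 0$ as the argument requires.

```python
# A concrete alternative linear unbiased estimator: weighted least squares
# with an arbitrary positive definite weight matrix W (illustrative choice).
G = rng.normal(size=(N, N))
W = G @ G.T + N * np.eye(N)                    # positive definite by construction
A = np.linalg.solve(X.T @ W @ X, X.T @ W)      # tilde_beta = A @ y
D = A - M                                      # D = A - (X'X)^{-1} X'

# Unbiasedness of tilde_beta is equivalent to A X = I, i.e. D X = 0.
assert np.allclose(A @ X, np.eye(K), atol=1e-8)
assert np.allclose(D @ X, 0.0, atol=1e-8)
```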
By using this result, we can also prove that
$$\mathrm{Var}[\widetilde{\beta} \mid X] = \mathrm{Var}[\widehat{\beta} \mid X] + \sigma^2 D D^\top.$$
The proof is as follows:
$$\begin{aligned}
\mathrm{Var}[\widetilde{\beta} \mid X] &= \mathrm{Var}\big[\big((X^\top X)^{-1}X^\top + D\big)\, y \mid X\big] \\
&= \big((X^\top X)^{-1}X^\top + D\big)\, \mathrm{Var}[y \mid X]\, \big((X^\top X)^{-1}X^\top + D\big)^\top \\
&= \sigma^2 \big((X^\top X)^{-1}X^\top + D\big) \big(X(X^\top X)^{-1} + D^\top\big) \\
&= \sigma^2 \big((X^\top X)^{-1} + (X^\top X)^{-1}X^\top D^\top + D X (X^\top X)^{-1} + D D^\top\big) \\
&= \sigma^2 (X^\top X)^{-1} + \sigma^2 D D^\top \\
&= \mathrm{Var}[\widehat{\beta} \mid X] + \sigma^2 D D^\top,
\end{aligned}$$
where we have used the fact that $D X = 0$ (and, by transposition, $X^\top D^\top = 0$) to eliminate the cross terms.
As a consequence,
$$\mathrm{Var}[\widetilde{\beta} \mid X] - \mathrm{Var}[\widehat{\beta} \mid X] = \sigma^2 D D^\top$$
is positive semi-definite because $D D^\top$ is positive semi-definite (for any vector $a$, we have $a^\top D D^\top a = \|D^\top a\|^2 \geq 0$). This is true for any unbiased linear estimator $\widetilde{\beta}$.
Therefore, the OLS estimator is BLUE.
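As a final numerical illustration on the running example, the covariance gap for the weighted least squares estimator sketched above equals $\sigma^2 D D^\top$ and has only nonnegative eigenvalues, as the theorem predicts.

```python
# The covariance gap Var[tilde_beta|X] - Var[beta_hat|X] equals
# sigma^2 D D', a positive semi-definite matrix.
var_tilde = sigma**2 * (A @ A.T)               # Var[A y | X] = sigma^2 A A'
var_hat = sigma**2 * np.linalg.inv(X.T @ X)
gap = var_tilde - var_hat
assert np.allclose(gap, sigma**2 * (D @ D.T), atol=1e-8)
print(np.linalg.eigvalsh(gap))                 # all >= 0 (up to round-off)
```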