The Gauss-Markov theorem states that, under certain conditions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE), that is, the estimator that has the smallest variance among all estimators that are unbiased and linear in the observed output variables.
The regression model is $y = X\beta + \varepsilon$, where:
$y$ is an $N \times 1$ vector of observations of the output variable ($N$ is the sample size);
$X$ is an $N \times K$ matrix of inputs ($K$ is the number of inputs for each observation);
$\beta$ is a $K \times 1$ vector of regression coefficients;
$\varepsilon$ is an $N \times 1$ vector of errors.
The OLS estimator of $\beta$ is $$\widehat{\beta} = (X^\top X)^{-1} X^\top y.$$
We assume that:
$X$ has full rank (as a consequence, $X^\top X$ is invertible, and $\widehat{\beta}$ is well-defined);
$\mathbb{E}[\varepsilon \mid X] = 0$;
$\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$, where $I$ is the $N \times N$ identity matrix and $\sigma^2$ is a positive constant.
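To make the setup concrete, here is a minimal simulation sketch (not part of the original lecture; the variable names, dimensions, and parameter values are illustrative) that generates data satisfying the assumptions above and computes the OLS estimator:

```python
# Illustrative sketch: simulate y = X beta + eps under the stated
# assumptions and compute beta_hat = (X'X)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(0)

N, K = 500, 3                # sample size and number of inputs (assumed values)
sigma = 2.0                  # error standard deviation (assumed value)
beta = np.array([1.0, -0.5, 0.25])

X = rng.normal(size=(N, K))              # random design; full rank with prob. 1
eps = rng.normal(scale=sigma, size=N)    # E[eps|X] = 0, Var[eps|X] = sigma^2 I
y = X @ beta + eps

# OLS: solve (X'X) b = X'y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)              # close to beta for moderately large N
```

Solving the normal equations with np.linalg.solve, instead of forming the inverse of $X^\top X$, is the numerically preferable way to apply the formula.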
First of all, note that $\widehat{\beta}$ is linear in $y$. In fact, $\widehat{\beta}$ is the product between the matrix $(X^\top X)^{-1} X^\top$ and $y$, and matrix multiplication is a linear operation.
It can easily be proved that $\widehat{\beta}$ is unbiased, both conditional on $X$ and unconditionally, that is, $\mathbb{E}[\widehat{\beta} \mid X] = \beta$ and $\mathbb{E}[\widehat{\beta}] = \beta$.
We can use the definition of $y$ to re-write the OLS estimator as follows: $$\widehat{\beta} = (X^\top X)^{-1} X^\top y = (X^\top X)^{-1} X^\top (X\beta + \varepsilon) = \beta + (X^\top X)^{-1} X^\top \varepsilon.$$ When we condition on $X$, we can treat $X$ as a constant matrix. Therefore, the conditional expectation of $\widehat{\beta}$ is $$\mathbb{E}[\widehat{\beta} \mid X] = \beta + (X^\top X)^{-1} X^\top \mathbb{E}[\varepsilon \mid X] = \beta.$$ The Law of Iterated Expectations implies that $$\mathbb{E}[\widehat{\beta}] = \mathbb{E}\left[\mathbb{E}[\widehat{\beta} \mid X]\right] = \beta.$$
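As a quick sanity check (my addition, not from the lecture), conditional unbiasedness can be verified numerically: holding $X$ fixed and redrawing the errors many times, the sample mean of $\widehat{\beta}$ across replications should approach $\beta$.

```python
# Monte Carlo check of E[beta_hat | X] = beta: X is held fixed, eps is redrawn.
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma = 200, 3, 2.0               # illustrative values
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(N, K))             # fixed design: we condition on X
A = np.linalg.solve(X.T @ X, X.T)       # the matrix (X'X)^{-1} X'

draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=N))
                  for _ in range(20_000)])
print(draws.mean(axis=0))               # approximately [1.0, -0.5, 0.25]
```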
Now that we have shown that the OLS estimator is linear and unbiased, we need to prove that it is also the best linear unbiased estimator.
What exactly do we mean by best?
When $\beta$ is a scalar (i.e., there is only one regressor), we consider $\widehat{\beta}$ to be the best among those we are considering (i.e., among all the linear unbiased estimators) if and only if it has the smallest possible variance, that is, if its deviations from the true value tend to be the smallest on average. Thus, $\widehat{\beta}$ is the best linear unbiased estimator (BLUE) if and only if $$\operatorname{Var}[\widehat{\beta} \mid X] \le \operatorname{Var}[\widetilde{\beta} \mid X]$$ for any other linear unbiased estimator $\widetilde{\beta}$.
Since we often deal with more than one regressor, we have to extend this definition to a multivariate context. We do this by requiring that $$\operatorname{Var}[a^\top \widehat{\beta} \mid X] \le \operatorname{Var}[a^\top \widetilde{\beta} \mid X] \quad (1)$$ for any constant $K \times 1$ vector $a$ and any other linear unbiased estimator $\widetilde{\beta}$.
In other words, OLS is BLUE if and only if any linear combination of the regression coefficients is estimated more precisely by OLS than by any other linear unbiased estimator.
Condition (1) is satisfied if and only if $\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]$ is a positive semi-definite matrix.
Since $\operatorname{Var}[a^\top \widehat{\beta} \mid X] = a^\top \operatorname{Var}[\widehat{\beta} \mid X]\, a$, we can write condition (1) as $$a^\top \operatorname{Var}[\widehat{\beta} \mid X]\, a \le a^\top \operatorname{Var}[\widetilde{\beta} \mid X]\, a$$ or $$a^\top \left(\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]\right) a \ge 0.$$ But the latter inequality holds for every $a$ if and only if the difference $$\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X] \quad (2)$$ is positive semi-definite (by the very definition of a positive semi-definite matrix).
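The two criteria just invoked, non-negativity of the quadratic form $a^\top M a$ for every $a$ and non-negativity of all eigenvalues of the symmetric matrix $M$, can be illustrated numerically. The following sketch (my addition, with arbitrary example values) checks both for a matrix of the form $BB^\top$, which is always positive semi-definite:

```python
# Numerical illustration of positive semi-definiteness: M = B B' satisfies
# a'Ma = ||B'a||^2 >= 0 for every a, and all eigenvalues of M are >= 0.
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
M = B @ B.T                                      # symmetric PSD by construction

print(np.all(np.linalg.eigvalsh(M) >= -1e-12))   # True: eigenvalue criterion
a = rng.normal(size=3)
print(a @ M @ a >= 0)                            # True: quadratic-form criterion
```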
In the next two sections we will derive $\operatorname{Var}[\widehat{\beta} \mid X]$ (the covariance matrix of the OLS estimator), and then we will prove that the difference (2) is positive semi-definite, so that OLS is BLUE.
The conditional covariance matrix of the OLS estimator is $$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 (X^\top X)^{-1}.$$
We have already proved (see above) that the OLS estimator can be written as $$\widehat{\beta} = \beta + (X^\top X)^{-1} X^\top \varepsilon.$$ Therefore, its conditional variance is $$\operatorname{Var}[\widehat{\beta} \mid X] = (X^\top X)^{-1} X^\top \operatorname{Var}[\varepsilon \mid X]\, X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1} X^\top X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}.$$
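A simulation sketch (my addition, illustrative values) can confirm the formula: with $X$ held fixed, the empirical covariance matrix of $\widehat{\beta}$ across many redraws of the errors should match $\sigma^2 (X^\top X)^{-1}$.

```python
# Compare the empirical covariance of beta_hat (X fixed, eps redrawn)
# with the theoretical formula sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(3)
N, K, sigma = 200, 3, 2.0
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(N, K))
A = np.linalg.solve(X.T @ X, X.T)               # (X'X)^{-1} X'

draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=N))
                  for _ in range(50_000)])
emp = np.cov(draws, rowvar=False)               # empirical K x K covariance
theo = sigma**2 * np.linalg.inv(X.T @ X)        # sigma^2 (X'X)^{-1}
print(np.max(np.abs(emp - theo)))               # small discrepancy
```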
Since we are considering the set of linear estimators, we can write any estimator in this set as $$\widetilde{\beta} = C y,$$ where $C$ is a $K \times N$ matrix.
Furthermore, if we define $$D = C - (X^\top X)^{-1} X^\top,$$ then we can write $$\widetilde{\beta} = C y = \widehat{\beta} + D y.$$
It is possible to prove that $DX = 0$ if $\widetilde{\beta}$ is unbiased.
We have that $$\mathbb{E}[\widetilde{\beta} \mid X] = \mathbb{E}[\widehat{\beta} + D y \mid X] = \beta + D\, \mathbb{E}[y \mid X] = \beta + D X \beta.$$ As a consequence, $\mathbb{E}[\widetilde{\beta} \mid X]$ is equal to $\beta$ for every possible value of $\beta$ only if $DX = 0$.
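For a concrete instance (my example, not from the lecture): a weighted least squares estimator $\widetilde{\beta} = (X^\top W X)^{-1} X^\top W y$ with arbitrary positive weights $W$ is linear and unbiased, so its $D$ matrix should satisfy $DX = 0$:

```python
# A concrete alternative linear unbiased estimator: weighted least squares
# with arbitrary positive diagonal weights W. Here C = (X'WX)^{-1} X'W,
# so CX = I (unbiasedness) and D = C - (X'X)^{-1}X' satisfies DX = 0.
import numpy as np

rng = np.random.default_rng(4)
N, K = 200, 3
X = rng.normal(size=(N, K))

W = np.diag(rng.uniform(0.5, 2.0, size=N))     # arbitrary positive weights
C = np.linalg.solve(X.T @ W @ X, X.T @ W)      # beta_tilde = C y
D = C - np.linalg.solve(X.T @ X, X.T)          # D = C - (X'X)^{-1} X'

print(np.max(np.abs(C @ X - np.eye(K))))       # ~0: CX = I, hence unbiased
print(np.max(np.abs(D @ X)))                   # ~0: DX = 0, as claimed
```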
By using this result, we can also prove that $$\operatorname{Var}[\widetilde{\beta} \mid X] = \operatorname{Var}[\widehat{\beta} \mid X] + \sigma^2 D D^\top.$$
The proof is as follows: $$\begin{aligned} \operatorname{Var}[\widetilde{\beta} \mid X] &= \operatorname{Var}[\widehat{\beta} + D y \mid X] \\ &= \operatorname{Var}\!\left[\beta + \left((X^\top X)^{-1} X^\top + D\right)\varepsilon \,\middle|\, X\right] && (A) \\ &= \left((X^\top X)^{-1} X^\top + D\right)\operatorname{Var}[\varepsilon \mid X]\left((X^\top X)^{-1} X^\top + D\right)^\top \\ &= \sigma^2 \left((X^\top X)^{-1} X^\top X (X^\top X)^{-1} + D X (X^\top X)^{-1} + (X^\top X)^{-1} X^\top D^\top + D D^\top\right) \\ &= \sigma^2 (X^\top X)^{-1} + \sigma^2 D D^\top && (B) \end{aligned}$$ where in steps $(A)$ and $(B)$ we have used the fact that $DX = 0$ (and hence also $X^\top D^\top = (DX)^\top = 0$).
As a consequence, $$\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 D D^\top$$ is positive semi-definite because $D D^\top$ is positive semi-definite (for any vector $a$, $a^\top D D^\top a = \lVert D^\top a \rVert^2 \ge 0$). This is true for any unbiased linear estimator $\widetilde{\beta}$. Therefore, the OLS estimator is BLUE.
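Closing the loop numerically (again an illustrative sketch of my own, building on the weighted estimator above): the gap $\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]$ equals $\sigma^2 D D^\top$ exactly and has non-negative eigenvalues:

```python
# Verify Var[beta_tilde|X] - Var[beta_hat|X] = sigma^2 D D' and that this
# gap is positive semi-definite, for the weighted estimator used above.
import numpy as np

rng = np.random.default_rng(5)
N, K, sigma = 200, 3, 2.0
X = rng.normal(size=(N, K))
W = np.diag(rng.uniform(0.5, 2.0, size=N))

C = np.linalg.solve(X.T @ W @ X, X.T @ W)   # beta_tilde = C y
A = np.linalg.solve(X.T @ X, X.T)           # beta_hat = A y
D = C - A

var_tilde = sigma**2 * C @ C.T              # Var[Cy|X] = sigma^2 C C'
var_hat = sigma**2 * A @ A.T                # equals sigma^2 (X'X)^{-1}
gap = var_tilde - var_hat

print(np.max(np.abs(gap - sigma**2 * D @ D.T)))    # ~0: identity holds
print(np.all(np.linalg.eigvalsh(gap) >= -1e-10))   # True: gap is PSD
```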