The Gauss-Markov theorem states that, under certain conditions, the ordinary least squares (OLS) estimator of the coefficients of a linear regression model is the best linear unbiased estimator (BLUE), that is, the estimator that has the smallest variance among all estimators that are unbiased and linear in the observed output variables.
The regression model is $y = X\beta + \varepsilon$, where:
$y$ is an $N \times 1$ vector of observations of the output variable ($N$ is the sample size);
$X$ is an $N \times K$ matrix of inputs ($K$ is the number of inputs for each observation);
$\beta$ is a $K \times 1$ vector of regression coefficients;
$\varepsilon$ is an $N \times 1$ vector of errors.
The OLS estimator of $\beta$ is $$\widehat{\beta} = (X^\top X)^{-1} X^\top y.$$
We assume that:
$X$ has full rank (as a consequence, $X^\top X$ is invertible, and $\widehat{\beta}$ is well-defined);
$\mathbb{E}[\varepsilon \mid X] = 0$;
$\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$, where $I$ is the $N \times N$ identity matrix and $\sigma^2$ is a positive constant.
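To make the setup concrete, here is a minimal simulation sketch (not part of the original lecture; the variable names, dimensions, and parameter values are illustrative) that generates data satisfying the assumptions above and computes the OLS estimator:

```python
# Illustrative sketch: simulate y = X beta + eps under the stated
# assumptions and compute beta_hat = (X'X)^{-1} X'y.
import numpy as np

rng = np.random.default_rng(0)

N, K = 500, 3                # sample size and number of inputs (assumed values)
sigma = 2.0                  # error standard deviation (assumed value)
beta = np.array([1.0, -0.5, 0.25])

X = rng.normal(size=(N, K))              # random design; full rank with prob. 1
eps = rng.normal(scale=sigma, size=N)    # E[eps|X] = 0, Var[eps|X] = sigma^2 I
y = X @ beta + eps

# OLS: solve (X'X) b = X'y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)              # close to beta for moderately large N
```

Solving the normal equations with np.linalg.solve, instead of forming the inverse of $X^\top X$, is the numerically preferable way to apply the formula.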
First of all, note that $\widehat{\beta}$ is linear in $y$. In fact, $\widehat{\beta}$ is the product between the matrix $(X^\top X)^{-1} X^\top$ and $y$, and matrix multiplication is a linear operation.
It can easily be proved that $\widehat{\beta}$ is unbiased, both conditional on $X$ and unconditionally, that is, $\mathbb{E}[\widehat{\beta} \mid X] = \beta$ and $\mathbb{E}[\widehat{\beta}] = \beta$.
We can use the definition of $y$ to re-write the OLS estimator as follows: $$\widehat{\beta} = (X^\top X)^{-1} X^\top y = (X^\top X)^{-1} X^\top (X\beta + \varepsilon) = \beta + (X^\top X)^{-1} X^\top \varepsilon.$$ When we condition on $X$, we can treat $X$ as a constant matrix. Therefore, the conditional expectation of $\widehat{\beta}$ is $$\mathbb{E}[\widehat{\beta} \mid X] = \beta + (X^\top X)^{-1} X^\top \mathbb{E}[\varepsilon \mid X] = \beta.$$ The Law of Iterated Expectations implies that $$\mathbb{E}[\widehat{\beta}] = \mathbb{E}\left[\mathbb{E}[\widehat{\beta} \mid X]\right] = \beta.$$
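As a quick sanity check (my addition, not from the lecture), conditional unbiasedness can be verified numerically: holding $X$ fixed and redrawing the errors many times, the sample mean of $\widehat{\beta}$ across replications should approach $\beta$.

```python
# Monte Carlo check of E[beta_hat | X] = beta: X is held fixed, eps is redrawn.
import numpy as np

rng = np.random.default_rng(1)
N, K, sigma = 200, 3, 2.0               # illustrative values
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(N, K))             # fixed design: we condition on X
A = np.linalg.solve(X.T @ X, X.T)       # the matrix (X'X)^{-1} X'

draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=N))
                  for _ in range(20_000)])
print(draws.mean(axis=0))               # approximately [1.0, -0.5, 0.25]
```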
Now that we have shown that the OLS estimator is linear and unbiased, we need to prove that it is also the best linear unbiased estimator.
What exactly do we mean by best?
When $\beta$ is a scalar (i.e., there is only one regressor), we consider $\widehat{\beta}$ to be the best among those we are considering (i.e., among all the linear unbiased estimators) if and only if it has the smallest possible variance, that is, if its deviations from the true value tend to be the smallest on average. Thus, $\widehat{\beta}$ is the best linear unbiased estimator (BLUE) if and only if $$\operatorname{Var}[\widehat{\beta} \mid X] \le \operatorname{Var}[\widetilde{\beta} \mid X]$$ for any other linear unbiased estimator $\widetilde{\beta}$.
Since we often deal with more than one regressor, we have to extend this definition to a multivariate context. We do this by requiring that $$\operatorname{Var}[a^\top \widehat{\beta} \mid X] \le \operatorname{Var}[a^\top \widetilde{\beta} \mid X] \quad (1)$$ for any constant $K \times 1$ vector $a$ and any other linear unbiased estimator $\widetilde{\beta}$.
In other words, OLS is BLUE if and only if any linear combination of the regression coefficients is estimated more precisely by OLS than by any other linear unbiased estimator.
Condition (1) is satisfied if and only if $\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]$ is a positive semi-definite matrix.
Since $\operatorname{Var}[a^\top \widehat{\beta} \mid X] = a^\top \operatorname{Var}[\widehat{\beta} \mid X]\, a$, we can write condition (1) as $$a^\top \operatorname{Var}[\widehat{\beta} \mid X]\, a \le a^\top \operatorname{Var}[\widetilde{\beta} \mid X]\, a$$ or $$a^\top \left(\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]\right) a \ge 0.$$ But the latter inequality holds for every $a$ if and only if the difference $$\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X] \quad (2)$$ is positive semi-definite (by the very definition of a positive semi-definite matrix).
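The two criteria just invoked, non-negativity of the quadratic form $a^\top M a$ for every $a$ and non-negativity of all eigenvalues of the symmetric matrix $M$, can be illustrated numerically. The following sketch (my addition, with arbitrary example values) checks both for a matrix of the form $BB^\top$, which is always positive semi-definite:

```python
# Numerical illustration of positive semi-definiteness: M = B B' satisfies
# a'Ma = ||B'a||^2 >= 0 for every a, and all eigenvalues of M are >= 0.
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
M = B @ B.T                                      # symmetric PSD by construction

print(np.all(np.linalg.eigvalsh(M) >= -1e-12))   # True: eigenvalue criterion
a = rng.normal(size=3)
print(a @ M @ a >= 0)                            # True: quadratic-form criterion
```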
In the next two sections we will derive $\operatorname{Var}[\widehat{\beta} \mid X]$ (the covariance matrix of the OLS estimator), and then we will prove that the difference (2) is positive semi-definite, so that OLS is BLUE.
The conditional covariance matrix of the OLS estimator is $$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 (X^\top X)^{-1}.$$
We have already proved (see above) that the OLS estimator can be written as $$\widehat{\beta} = \beta + (X^\top X)^{-1} X^\top \varepsilon.$$ Therefore, its conditional variance is $$\operatorname{Var}[\widehat{\beta} \mid X] = (X^\top X)^{-1} X^\top \operatorname{Var}[\varepsilon \mid X]\, X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1} X^\top X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}.$$
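A simulation sketch (my addition, illustrative values) can confirm the formula: with $X$ held fixed, the empirical covariance matrix of $\widehat{\beta}$ across many redraws of the errors should match $\sigma^2 (X^\top X)^{-1}$.

```python
# Compare the empirical covariance of beta_hat (X fixed, eps redrawn)
# with the theoretical formula sigma^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(3)
N, K, sigma = 200, 3, 2.0
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(N, K))
A = np.linalg.solve(X.T @ X, X.T)               # (X'X)^{-1} X'

draws = np.array([A @ (X @ beta + rng.normal(scale=sigma, size=N))
                  for _ in range(50_000)])
emp = np.cov(draws, rowvar=False)               # empirical K x K covariance
theo = sigma**2 * np.linalg.inv(X.T @ X)        # sigma^2 (X'X)^{-1}
print(np.max(np.abs(emp - theo)))               # small discrepancy
```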
Since we are considering the set of linear estimators, we can write any estimator in this set as $$\widetilde{\beta} = C y,$$ where $C$ is a $K \times N$ matrix.
Furthermore, if we define $$D = C - (X^\top X)^{-1} X^\top,$$ then we can write $$\widetilde{\beta} = C y = \widehat{\beta} + D y.$$
It is possible to prove that $DX = 0$ if $\widetilde{\beta}$ is unbiased.
We have that $$\mathbb{E}[\widetilde{\beta} \mid X] = \mathbb{E}[\widehat{\beta} + D y \mid X] = \beta + D\, \mathbb{E}[y \mid X] = \beta + D X \beta.$$ As a consequence, $\mathbb{E}[\widetilde{\beta} \mid X]$ is equal to $\beta$ for every possible value of $\beta$ only if $DX = 0$.
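For a concrete instance (my example, not from the lecture): a weighted least squares estimator $\widetilde{\beta} = (X^\top W X)^{-1} X^\top W y$ with arbitrary positive weights $W$ is linear and unbiased, so its $D$ matrix should satisfy $DX = 0$:

```python
# A concrete alternative linear unbiased estimator: weighted least squares
# with arbitrary positive diagonal weights W. Here C = (X'WX)^{-1} X'W,
# so CX = I (unbiasedness) and D = C - (X'X)^{-1}X' satisfies DX = 0.
import numpy as np

rng = np.random.default_rng(4)
N, K = 200, 3
X = rng.normal(size=(N, K))

W = np.diag(rng.uniform(0.5, 2.0, size=N))     # arbitrary positive weights
C = np.linalg.solve(X.T @ W @ X, X.T @ W)      # beta_tilde = C y
D = C - np.linalg.solve(X.T @ X, X.T)          # D = C - (X'X)^{-1} X'

print(np.max(np.abs(C @ X - np.eye(K))))       # ~0: CX = I, hence unbiased
print(np.max(np.abs(D @ X)))                   # ~0: DX = 0, as claimed
```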
By using this result, we can also prove that $$\operatorname{Var}[\widetilde{\beta} \mid X] = \operatorname{Var}[\widehat{\beta} \mid X] + \sigma^2 D D^\top.$$
The proof is as follows: $$\begin{aligned} \operatorname{Var}[\widetilde{\beta} \mid X] &= \operatorname{Var}[\widehat{\beta} + D y \mid X] \\ &= \operatorname{Var}\!\left[\beta + \left((X^\top X)^{-1} X^\top + D\right)\varepsilon \,\middle|\, X\right] && (A) \\ &= \left((X^\top X)^{-1} X^\top + D\right)\operatorname{Var}[\varepsilon \mid X]\left((X^\top X)^{-1} X^\top + D\right)^\top \\ &= \sigma^2 \left((X^\top X)^{-1} X^\top X (X^\top X)^{-1} + D X (X^\top X)^{-1} + (X^\top X)^{-1} X^\top D^\top + D D^\top\right) \\ &= \sigma^2 (X^\top X)^{-1} + \sigma^2 D D^\top && (B) \end{aligned}$$ where in steps $(A)$ and $(B)$ we have used the fact that $DX = 0$ (and hence also $X^\top D^\top = (DX)^\top = 0$).
As a consequence, $$\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 D D^\top$$ is positive semi-definite because $D D^\top$ is positive semi-definite (for any vector $a$, $a^\top D D^\top a = \lVert D^\top a \rVert^2 \ge 0$). This is true for any unbiased linear estimator $\widetilde{\beta}$. Therefore, the OLS estimator is BLUE.
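Closing the loop numerically (again an illustrative sketch of my own, building on the weighted estimator above): the gap $\operatorname{Var}[\widetilde{\beta} \mid X] - \operatorname{Var}[\widehat{\beta} \mid X]$ equals $\sigma^2 D D^\top$ exactly and has non-negative eigenvalues:

```python
# Verify Var[beta_tilde|X] - Var[beta_hat|X] = sigma^2 D D' and that this
# gap is positive semi-definite, for the weighted estimator used above.
import numpy as np

rng = np.random.default_rng(5)
N, K, sigma = 200, 3, 2.0
X = rng.normal(size=(N, K))
W = np.diag(rng.uniform(0.5, 2.0, size=N))

C = np.linalg.solve(X.T @ W @ X, X.T @ W)   # beta_tilde = C y
A = np.linalg.solve(X.T @ X, X.T)           # beta_hat = A y
D = C - A

var_tilde = sigma**2 * C @ C.T              # Var[Cy|X] = sigma^2 C C'
var_hat = sigma**2 * A @ A.T                # equals sigma^2 (X'X)^{-1}
gap = var_tilde - var_hat

print(np.max(np.abs(gap - sigma**2 * D @ D.T)))    # ~0: identity holds
print(np.all(np.linalg.eigvalsh(gap) >= -1e-10))   # True: gap is PSD
```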