
Properties of the OLS estimator

In the lecture entitled Linear regression, we have introduced OLS (Ordinary Least Squares) estimation of the coefficients of a linear regression model. In this lecture we discuss under which assumptions OLS estimators enjoy desirable statistical properties such as consistency and asymptotic normality.

Setting

Consider the linear regression model
$$y_i = x_i \beta + \varepsilon_i,$$

where the outputs are denoted by $y_i$, the associated $1 \times K$ vectors of inputs are denoted by $x_i$, the $K \times 1$ vector of regression coefficients is denoted by $\beta$, and the $\varepsilon_i$ are unobservable error terms.

We assume that a sample of $N$ realizations is observed, so that the vector of all outputs
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}$$
is an $N \times 1$ vector, the design matrix
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix}$$
is an $N \times K$ matrix, and the vector of error terms
$$\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{bmatrix}$$
is an $N \times 1$ vector.

The OLS estimator $\widehat{\beta}$ is the vector of regression coefficients that minimizes the sum of squared residuals:
$$\widehat{\beta} = \operatorname*{argmin}_{b} \sum_{i=1}^{N} \left( y_i - x_i b \right)^2.$$

As proved in the lecture entitled Linear regression, if the design matrix $X$ has full rank, then the OLS estimator is computed as follows:
$$\widehat{\beta} = \left( X^\top X \right)^{-1} X^\top y.$$
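The closed form above is straightforward to translate into code. The following sketch (not part of the original lecture; the simulated design and all variable names are illustrative assumptions) computes the OLS estimate on artificial data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 1_000, 3

X = rng.normal(size=(N, K))          # design matrix, one row x_i per observation
beta = np.array([1.0, -2.0, 0.5])    # true regression coefficients
eps = rng.normal(size=N)             # unobservable error terms
y = X @ beta + eps                   # regression equation y_i = x_i beta + eps_i

# OLS estimator: beta_hat = (X'X)^{-1} X'y.
# Solving the normal equations is numerically preferable to forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                      # close to [1.0, -2.0, 0.5]
```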

Consistency

In this section we are going to propose a set of conditions that are sufficient for the consistency of OLS estimators.

Note that the OLS estimator can be written as
$$\widehat{\beta} = \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \frac{1}{N} \sum_{i=1}^{N} x_i^\top y_i,$$
where
$$\frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i$$
is the sample mean of the $K \times K$ matrices $x_i^\top x_i$ and
$$\frac{1}{N} \sum_{i=1}^{N} x_i^\top y_i$$
is the sample mean of the $K \times 1$ vectors $x_i^\top y_i$.
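Continuing the numerical sketch above (same hypothetical variables), one can verify that this sample-mean form reproduces the matrix formula exactly:

```python
# Sample means of the K x K matrices x_i'x_i and the K x 1 vectors x_i'y_i
Sxx = (X.T @ X) / N
Sxy = (X.T @ y) / N

beta_hat_means = np.linalg.solve(Sxx, Sxy)
print(np.allclose(beta_hat, beta_hat_means))  # True: the two forms coincide
```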

The first assumption we make is that these sample means converge to their population counterparts, which is formalized as follows.

Assumption 1 (convergence): both the sequence $\{x_i^\top x_i\}$ and the sequence $\{x_i^\top y_i\}$ satisfy sets of conditions that are sufficient for the convergence in probability of their sample means to the population means $\operatorname{E}[x_i^\top x_i]$ and $\operatorname{E}[x_i^\top y_i]$, which do not depend on $i$.

For example, the sequences $\{x_i^\top x_i\}$ and $\{x_i^\top y_i\}$ could be assumed to satisfy the conditions of Chebyshev's Weak Law of Large Numbers for correlated sequences, which are quite mild (basically, it is only required that the sequences are covariance stationary and that their auto-covariances are zero on average).

The second assumption we make is a rank assumption (sometimes also called identification assumption).

Assumption 2 (rank): the square matrix $\operatorname{E}[x_i^\top x_i]$ has full rank (as a consequence, it is invertible).

The third assumption we make is that the regressors $x_i$ are orthogonal to the error terms $\varepsilon_i$.

Assumption 3 (orthogonality): for each $i$, $x_i$ and $\varepsilon_i$ are orthogonal, that is,
$$\operatorname{E}[x_i^\top \varepsilon_i] = 0.$$

It is then straightforward to prove the following proposition.

Proposition If Assumptions 1, 2 and 3 are satisfied, then the OLS estimator $\widehat{\beta}$ is a consistent estimator of $\beta$.

Proof

Let us make explicit the dependence of the estimator on the sample size and denote by $\widehat{\beta}_N$ the OLS estimator obtained when the sample size is equal to $N$. By Assumption 1 and by the Continuous Mapping theorem, we have that the probability limit of $\widehat{\beta}_N$ is
$$\operatorname*{plim}_{N \to \infty} \widehat{\beta}_N = \left( \operatorname{E}[x_i^\top x_i] \right)^{-1} \operatorname{E}[x_i^\top y_i].$$
Now, if we pre-multiply the regression equation
$$y_i = x_i \beta + \varepsilon_i$$
by $x_i^\top$ and we take expected values, we get
$$\operatorname{E}[x_i^\top y_i] = \operatorname{E}[x_i^\top x_i] \beta + \operatorname{E}[x_i^\top \varepsilon_i].$$
But, by Assumption 3, this becomes
$$\operatorname{E}[x_i^\top y_i] = \operatorname{E}[x_i^\top x_i] \beta,$$
or
$$\beta = \left( \operatorname{E}[x_i^\top x_i] \right)^{-1} \operatorname{E}[x_i^\top y_i],$$
which implies that
$$\operatorname*{plim}_{N \to \infty} \widehat{\beta}_N = \beta.$$
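The proposition can be illustrated numerically. The following sketch (again hypothetical, with an i.i.d. simulated design) shows the estimation error shrinking as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.5])

for N in (100, 10_000, 1_000_000):
    X = rng.normal(size=(N, len(beta)))
    y = X @ beta + rng.normal(size=N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    # The worst-case coefficient error shrinks roughly like 1/sqrt(N)
    print(N, np.abs(beta_hat - beta).max())
```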

Asymptotic normality

In this section we are going to discuss a condition that, together with Assumptions 1-3 above, is sufficient for the asymptotic normality of OLS estimators.

The condition is as follows.

Assumption 4 (Central Limit Theorem): the sequence $\{x_i^\top \varepsilon_i\}$ satisfies a set of conditions that are sufficient to guarantee that a Central Limit Theorem applies to its sample mean
$$\frac{1}{N} \sum_{i=1}^{N} x_i^\top \varepsilon_i.$$

For a review of some of the conditions that can be imposed on a sequence to guarantee that a Central Limit Theorem applies to its sample mean, you can go to the lecture entitled Central Limit Theorem. In any case, remember that if a Central Limit Theorem applies to $\{x_i^\top \varepsilon_i\}$, then, as $N$ tends to infinity,
$$\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top \varepsilon_i \right)$$
converges in distribution to a multivariate normal distribution with mean equal to $0$ and covariance matrix equal to
$$V = \lim_{N \to \infty} \operatorname{Var}\left[ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\top \varepsilon_i \right].$$

With Assumption 4 in place, we are now able to prove the asymptotic normality of the OLS estimators.

Proposition If Assumptions 1, 2, 3 and 4 are satisfied, then the OLS estimator $\widehat{\beta}$ is asymptotically multivariate normal with mean equal to $\beta$ and asymptotic covariance matrix equal to
$$\operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1},$$
that is,
$$\sqrt{N} \left( \widehat{\beta}_N - \beta \right) \xrightarrow{d} N\left( 0, \; \operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1} \right),$$
where $V$ has been defined above.

Proof

As in the proof of consistency, the dependence of the estimator on the sample size is made explicit, so that the OLS estimator is denoted by $\widehat{\beta}_N$. First of all, we have
$$\sqrt{N} \left( \widehat{\beta}_N - \beta \right)
= \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\top \varepsilon_i
= \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \sqrt{N} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top \varepsilon_i - \operatorname{E}[x_i^\top \varepsilon_i] \right),$$
where, in the last step, we have used the fact that, by Assumption 3, $\operatorname{E}[x_i^\top \varepsilon_i] = 0$. Note that, by Assumption 1 and the Continuous Mapping theorem, we have
$$\operatorname*{plim}_{N \to \infty} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} = \operatorname{E}[x_i^\top x_i]^{-1}.$$
Furthermore, by Assumption 4, we have that
$$\sqrt{N} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top \varepsilon_i - \operatorname{E}[x_i^\top \varepsilon_i] \right)$$
converges in distribution to a multivariate normal random vector having mean equal to $0$ and covariance matrix equal to $V$. Thus, by Slutsky's theorem, we have that
$$\sqrt{N} \left( \widehat{\beta}_N - \beta \right)$$
converges in distribution to a multivariate normal vector with mean equal to $0$ and covariance matrix equal to
$$\operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1}.$$
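A short Monte Carlo sketch (hypothetical, with a design deliberately chosen so that $\operatorname{E}[x_i^\top x_i] = I$ and $V = I$, making the asymptotic covariance the identity) makes the proposition concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, -2.0])
N, reps = 500, 2_000

draws = np.empty((reps, len(beta)))
for r in range(reps):
    X = rng.normal(size=(N, len(beta)))        # E[x_i'x_i] = I
    y = X @ beta + rng.normal(size=N)          # unit-variance errors, so V = I
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    draws[r] = np.sqrt(N) * (beta_hat - beta)  # rescaled estimation error

print(draws.mean(axis=0))  # approximately (0, 0)
print(np.cov(draws.T))     # approximately the 2 x 2 identity matrix
```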

Estimation of the variance of the error terms

We now make a further assumption.

Assumption 5: the sequence $\{\varepsilon_i^2\}$ satisfies a set of conditions that are sufficient for the convergence in probability of its sample mean
$$\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2$$
to the population mean
$$\sigma^2 = \operatorname{E}[\varepsilon_i^2],$$
which does not depend on $i$.

If this assumption is satisfied, then the variance of the error terms $\sigma^2$ can be estimated by the sample variance of the residuals
$$\widehat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \widehat{\varepsilon}_i^{\,2},$$
where
$$\widehat{\varepsilon}_i = y_i - x_i \widehat{\beta}.$$
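In code, continuing the first sketch above (same hypothetical X, y and beta_hat):

```python
residuals = y - X @ beta_hat        # eps_hat_i = y_i - x_i beta_hat
sigma2_hat = np.mean(residuals**2)  # sample variance of the residuals
print(sigma2_hat)                   # close to 1, the simulated error variance
```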

Proposition Under Assumptions 1, 2, 3, and 5, it can be proved that $\widehat{\sigma}^2$ is a consistent estimator of $\sigma^2$.

Proof

Let us make explicit the dependence of the estimators on the sample size and denote by $\widehat{\beta}_N$ and $\widehat{\sigma}^2_N$ the estimators obtained when the sample size is equal to $N$. By Assumption 1 and by the Continuous Mapping theorem, we have that the probability limit of $\widehat{\sigma}^2_N$ is
$$\begin{aligned}
\operatorname*{plim}_{N \to \infty} \widehat{\sigma}^2_N
&= \operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \bigl( \varepsilon_i - x_i ( \widehat{\beta}_N - \beta ) \bigr)^2 \\
&\overset{A}{=} \operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2
- 2 \operatorname*{plim}_{N \to \infty} \left[ \left( \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i x_i \right) ( \widehat{\beta}_N - \beta ) \right]
+ \operatorname*{plim}_{N \to \infty} \left[ ( \widehat{\beta}_N - \beta )^\top \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right) ( \widehat{\beta}_N - \beta ) \right] \\
&\overset{B}{=} \sigma^2
- 2 \operatorname*{plim}_{N \to \infty} \left[ \left( \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i x_i \right) ( \widehat{\beta}_N - \beta ) \right]
+ \operatorname*{plim}_{N \to \infty} \left[ ( \widehat{\beta}_N - \beta )^\top \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right) ( \widehat{\beta}_N - \beta ) \right] \\
&\overset{C}{=} \sigma^2
- 2 \operatorname*{plim}_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i x_i \right] \operatorname*{plim}_{N \to \infty} ( \widehat{\beta}_N - \beta )
+ \operatorname*{plim}_{N \to \infty} ( \widehat{\beta}_N - \beta )^\top \operatorname*{plim}_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right] \operatorname*{plim}_{N \to \infty} ( \widehat{\beta}_N - \beta ) \\
&\overset{D}{=} \sigma^2,
\end{aligned}$$
where: in steps $A$ and $C$ we have used the Continuous Mapping theorem; in step $B$ we have used Assumption 5; in step $D$ we have used the fact that
$$\operatorname*{plim}_{N \to \infty} \left( \widehat{\beta}_N - \beta \right) = 0,$$
because $\widehat{\beta}_N$ is a consistent estimator of $\beta$, as proved above.

Estimation of the asymptotic covariance matrix

We have proved that the asymptotic covariance matrix of the OLS estimator is
$$\operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1},$$
where the long-run covariance matrix $V$ is defined by
$$V = \lim_{N \to \infty} \operatorname{Var}\left[ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\top \varepsilon_i \right].$$

Usually, the matrix $\operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1}$ needs to be estimated because it depends on quantities ($V$ and $\operatorname{E}[x_i^\top x_i]$) that are not known. The next proposition characterizes consistent estimators of it.

Proposition If Assumptions 1, 2, 3, 4 and 5 are satisfied, and a consistent estimator $\widehat{V}$ of the long-run covariance matrix $V$ is available, then the asymptotic variance of the OLS estimator is consistently estimated by
$$\left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \widehat{V} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1}.$$

Proof

This is proved as follows:
$$\operatorname*{plim}_{N \to \infty} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \widehat{V} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1}
\overset{A}{=} \operatorname*{plim}_{N \to \infty} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} \operatorname*{plim}_{N \to \infty} \widehat{V} \operatorname*{plim}_{N \to \infty} \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1}
\overset{B}{=} \operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1},$$
where: in step $A$ we have used the Continuous Mapping theorem; in step $B$ we have used the hypothesis that $\widehat{V}$ is a consistent estimator of the long-run covariance matrix $V$ and the fact that, by Assumption 1, the sample mean of the matrices $x_i^\top x_i$ is a consistent estimator of $\operatorname{E}[x_i^\top x_i]$, that is,
$$\operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i = \operatorname{E}[x_i^\top x_i].$$
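In code, the proposition amounts to a small helper (a sketch with hypothetical names; it presupposes that some consistent estimate of $V$ has already been computed):

```python
import numpy as np

def sandwich_avar(X, V_hat):
    """Estimate E[x'x]^{-1} V E[x'x]^{-1} from the design matrix X
    and a consistent estimate V_hat of the long-run covariance matrix."""
    Sxx = (X.T @ X) / len(X)     # sample mean of the matrices x_i'x_i
    Sxx_inv = np.linalg.inv(Sxx)
    return Sxx_inv @ V_hat @ Sxx_inv
```

The covariance of $\widehat{\beta}$ itself is then approximated by this quantity divided by $N$.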

Thus, in order to derive a consistent estimator of the covariance matrix of the OLS estimator, we need to find a consistent estimator of the long-run covariance matrix V. How to do this is discussed in the next section.

Estimation of the long-run covariance matrix

The estimation of $V$ requires some assumptions on the covariances between the terms of the sequence $\{x_i^\top \varepsilon_i\}$.

Before providing some examples of such assumptions, we need the following fact.

Proposition Under Assumptions 3 and 4, the long-run covariance matrix $V$ satisfies
$$V = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{E}\bigl[ \varepsilon_i \varepsilon_j \, x_i^\top x_j \bigr].$$

Proof

This is proved as follows:
$$V = \lim_{N \to \infty} \operatorname{Var}\left[ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i^\top \varepsilon_i \right]
= \lim_{N \to \infty} \frac{1}{N} \operatorname{E}\left[ \left( \sum_{i=1}^{N} x_i^\top \varepsilon_i \right) \left( \sum_{j=1}^{N} \varepsilon_j x_j \right) \right]
= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{E}\bigl[ \varepsilon_i \varepsilon_j \, x_i^\top x_j \bigr],$$
where the second equality holds because, by Assumption 3, the terms $x_i^\top \varepsilon_i$ have mean zero.

We start with a restrictive assumption.

Assumption 6: $\varepsilon_i$ is orthogonal to $\varepsilon_j$ for any $i \neq j$, and $\varepsilon_i \varepsilon_j$ is uncorrelated with $x_i^\top x_j$ for any $i$ and $j$.

This assumption has the following implication.

Proposition If Assumptions 1, 2, 3, 4, 5 and 6 are satisfied, then the long-run covariance matrix $V$ is consistently estimated by
$$\widehat{V} = \widehat{\sigma}^2 \, \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i.$$

Proof

First of all, we have that
$$V = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{E}[\varepsilon_i \varepsilon_j] \operatorname{E}[x_i^\top x_j]
= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \operatorname{E}[\varepsilon_i^2] \operatorname{E}[x_i^\top x_i]
= \sigma^2 \operatorname{E}[x_i^\top x_i],$$
because, by Assumption 6, the expectations factor and the terms with $i \neq j$ vanish. But we know that, by Assumption 1, $\operatorname{E}[x_i^\top x_i]$ is consistently estimated by
$$\frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i,$$
and, by Assumptions 1, 2, 3 and 5, $\sigma^2$ is consistently estimated by
$$\widehat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \widehat{\varepsilon}_i^{\,2}.$$
Therefore, by the Continuous Mapping theorem, the long-run covariance matrix $V$ is consistently estimated by
$$\widehat{V} = \widehat{\sigma}^2 \, \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i.$$

Note that in this case the asymptotic covariance matrix of the OLS estimator is
$$\operatorname{E}[x_i^\top x_i]^{-1} \, V \, \operatorname{E}[x_i^\top x_i]^{-1} = \sigma^2 \operatorname{E}[x_i^\top x_i]^{-1}.$$

As a consequence, the covariance of the OLS estimator can be approximated by
$$\frac{1}{N} \, \widehat{\sigma}^2 \left( \frac{1}{N} \sum_{i=1}^{N} x_i^\top x_i \right)^{-1} = \widehat{\sigma}^2 \left( X^\top X \right)^{-1},$$
which is the same estimator derived in the normal linear regression model.
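Continuing the numerical sketch (same hypothetical variables as before):

```python
# Homoskedastic-case covariance estimate: sigma2_hat * (X'X)^{-1}
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)
std_errors = np.sqrt(np.diag(cov_hat))  # standard errors of the coefficients
print(std_errors)
```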

We now consider an assumption which is weaker than Assumption 6.

Assumption 6b: $x_i^\top \varepsilon_i$ is uncorrelated with $x_j^\top \varepsilon_j$ for any $i \neq j$. Furthermore, $\operatorname{E}\bigl[ \varepsilon_i^2 \, x_i^\top x_i \bigr]$ does not depend on $i$ and is consistently estimated by its sample mean
$$\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2 \, x_i^\top x_i.$$

This assumption has the following implication.

Proposition If Assumptions 1, 2, 3, 4, 5 and 6b are satisfied, then the long-run covariance matrix $V$ is consistently estimated by
$$\widehat{V} = \frac{1}{N} \sum_{i=1}^{N} \widehat{\varepsilon}_i^{\,2} \, x_i^\top x_i.$$

Proof

First of all, we have that
$$V = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \operatorname{E}\bigl[ \varepsilon_i \varepsilon_j \, x_i^\top x_j \bigr]
= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \operatorname{E}\bigl[ \varepsilon_i^2 \, x_i^\top x_i \bigr]
= \operatorname{E}\bigl[ \varepsilon_i^2 \, x_i^\top x_i \bigr],$$
because, by Assumption 6b, the terms with $i \neq j$ vanish. Furthermore,
$$\operatorname*{plim}_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} \widehat{\varepsilon}_i^{\,2} \, x_i^\top x_i \right]
= \operatorname*{plim}_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2 \, x_i^\top x_i \right]
= \operatorname{E}\bigl[ \varepsilon_i^2 \, x_i^\top x_i \bigr],$$
where in the first step we have applied the Continuous Mapping theorem separately to each entry of the matrices in square brackets, together with the fact that
$$\operatorname*{plim}_{N \to \infty} \widehat{\beta}_N = \beta,$$
and in the last step we have used Assumption 6b. To see how this is done, consider, for example, the matrix
$$\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2 \, x_i^\top x_i.$$
Then, the entry at the intersection of its $k$-th row and $l$-th column is
$$\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2 \, x_{ik} \, x_{il},$$
and
$$\operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2 \, x_{ik} \, x_{il} = \operatorname{E}\bigl[ \varepsilon_i^2 \, x_{ik} \, x_{il} \bigr].$$
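In code (continuing the hypothetical sketch, with the residuals computed earlier), this heteroskedasticity-robust estimate of $V$ and the resulting covariance of $\widehat{\beta}$ are:

```python
# White-type estimate of V: (1/N) * sum_i eps_hat_i^2 * x_i'x_i.
# Scaling each row of X by eps_hat_i^2 before the product forms the weighted sum.
V_hat = (X * residuals[:, None] ** 2).T @ X / N

# Sandwich estimate of the asymptotic variance, then of Cov(beta_hat)
Sxx_inv = np.linalg.inv((X.T @ X) / N)
cov_hat_robust = Sxx_inv @ V_hat @ Sxx_inv / N
print(np.sqrt(np.diag(cov_hat_robust)))  # robust standard errors
```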

The assumptions above can be made even weaker (for example, by relaxing the hypothesis that $x_i^\top \varepsilon_i$ is uncorrelated with $x_j^\top \varepsilon_j$), at the cost of facing more difficulties in estimating the long-run covariance matrix $V$. For a review of the methods that can be used to estimate $V$, see, for example, Den Haan and Levin (1996).

References

Den Haan, Wouter J., and Andrew T. Levin (1996). "Inferences from parametric and non-parametric covariance matrix estimation procedures." Technical Working Paper Series, NBER.
