
The normal linear regression model

This lecture discusses the main properties of the Normal Linear Regression Model (NLRM), a linear regression model in which the vector of errors is assumed to have a multivariate normal distribution conditional on the matrix of regressors. The assumption of multivariate normality, together with other assumptions (mainly concerning the covariance matrix of the errors), makes it possible to derive analytically the distributions of the Ordinary Least Squares (OLS) estimator of the regression coefficients and of several other statistics.

Setting

We use the same notation used in the lecture entitled Properties of the OLS estimator (to which you can refer for more details): the $N \times 1$ vector of observations of the dependent variable is denoted by $y$, the $N \times K$ matrix of regressors (called the design matrix) is denoted by $X$, the $N \times 1$ vector of errors is denoted by $\varepsilon$ and the $K \times 1$ vector of regression coefficients is denoted by $\beta$, so that the regression equation can be written in matrix form as $y = X\beta + \varepsilon$. The OLS estimator $\widehat{\beta}$ is the vector that minimizes the sum of squared residuals $(y - X\beta)^{\top}(y - X\beta)$ and, if the design matrix $X$ has full rank, it can be computed as $\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y$.
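The closed-form formula above is easy to verify numerically. The following sketch (Python with NumPy; the sample size, number of regressors and coefficient values are illustrative assumptions, not part of the lecture) simulates data from the regression equation and computes the OLS estimate:

    import numpy as np

    rng = np.random.default_rng(0)
    N, K = 100, 3                      # illustrative sample size and number of regressors
    # Design matrix with an intercept column (a choice made for the example)
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
    eps = rng.normal(size=N)           # error vector
    y = X @ beta + eps                 # regression equation y = X beta + eps

    # OLS estimator (X'X)^(-1) X'y; solving the normal equations X'X b = X'y
    # is numerically preferable to forming the inverse explicitly
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)                    # close to beta for moderate N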

Assumptions

The assumptions made in a normal linear regression model are:

  1. the design matrix $X$ has full rank (as a consequence, $X^{\top}X$ is invertible and the OLS estimator is $\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y$);

  2. conditional on $X$, the vector of errors $\varepsilon$ has a multivariate normal distribution with mean equal to $0$ and covariance matrix equal to $\operatorname{Var}[\varepsilon \mid X] = \sigma^{2}I$, where $\sigma^{2}$ is a positive constant and $I$ is the $N \times N$ identity matrix.

Note that the assumption that the covariance matrix of $\varepsilon$ is diagonal implies that the entries of $\varepsilon$ are mutually independent, that is, $\varepsilon_{i}$ is independent of $\varepsilon_{j}$ for $i \neq j$ (for jointly normal random variables, zero covariance implies independence). Moreover, the assumption that all the diagonal entries of the covariance matrix are equal implies that all the entries of $\varepsilon$ have the same variance, that is, $\operatorname{Var}[\varepsilon_{i} \mid X] = \sigma^{2}$ for any $i$. The latter assumption is often referred to as the "homoscedasticity assumption", and if it is satisfied, we say that the errors are homoscedastic. On the contrary, if it does not hold, we say that the errors are heteroscedastic.
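As a concrete illustration of assumption 2, the sketch below (again Python with NumPy; all numbers are arbitrary) draws a homoscedastic error vector with covariance matrix $\sigma^{2}I$ and, for contrast, a heteroscedastic one whose diagonal covariance matrix has unequal entries:

    import numpy as np

    rng = np.random.default_rng(1)
    N, sigma2 = 5, 4.0

    # Homoscedastic errors: covariance sigma^2 * I (independent entries, equal variances)
    eps_homo = rng.multivariate_normal(np.zeros(N), sigma2 * np.eye(N))

    # Heteroscedastic errors (assumption violated): unequal variances on the diagonal
    eps_hetero = rng.multivariate_normal(np.zeros(N), np.diag([1.0, 2.0, 3.0, 4.0, 5.0]))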

Distribution of the OLS estimator

Under the assumptions made in the previous section, the OLS estimator has a multivariate normal distribution, conditional on the design matrix.

Proposition In a Normal Linear Regression Model, the OLS estimator $\widehat{\beta}$ has a multivariate normal distribution, conditional on $X$, with mean $\operatorname{E}[\widehat{\beta} \mid X] = \beta$ and covariance matrix $\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^{2}(X^{\top}X)^{-1}$.

Proof

First of all, note that $\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y = (X^{\top}X)^{-1}X^{\top}(X\beta + \varepsilon) = \beta + (X^{\top}X)^{-1}X^{\top}\varepsilon$. The fact that we are conditioning on $X$ means that we can treat $X$ as a constant matrix. Therefore, conditional on $X$, the OLS estimator $\widehat{\beta}$ is a linear transformation of a multivariate normal random vector (the vector $\varepsilon$). This implies that $\widehat{\beta}$ is also multivariate normal, with mean $\operatorname{E}[\widehat{\beta} \mid X] = \beta + (X^{\top}X)^{-1}X^{\top}\operatorname{E}[\varepsilon \mid X] = \beta$ and variance $\operatorname{Var}[\widehat{\beta} \mid X] = (X^{\top}X)^{-1}X^{\top}\operatorname{Var}[\varepsilon \mid X]\,X(X^{\top}X)^{-1} = \sigma^{2}(X^{\top}X)^{-1}X^{\top}X(X^{\top}X)^{-1} = \sigma^{2}(X^{\top}X)^{-1}$.

Note that $\operatorname{E}[\widehat{\beta} \mid X] = \beta$ means that the OLS estimator is unbiased, not only conditionally, but also unconditionally, because by the Law of Iterated Expectations we have that $\operatorname{E}[\widehat{\beta}] = \operatorname{E}[\operatorname{E}[\widehat{\beta} \mid X]] = \operatorname{E}[\beta] = \beta$.
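A Monte Carlo sketch of the proposition: holding the design matrix fixed and redrawing the errors many times, the sample mean and covariance of the simulated OLS estimates should approach $\beta$ and $\sigma^{2}(X^{\top}X)^{-1}$ (all numerical values below are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    N, sigma = 200, 1.5
    X = np.column_stack([np.ones(N), rng.normal(size=N)])  # fixed design matrix
    beta = np.array([0.5, -1.0])
    XtX_inv = np.linalg.inv(X.T @ X)

    # Redraw the error vector 10,000 times, keeping X fixed (conditioning on X)
    draws = np.empty((10_000, 2))
    for r in range(draws.shape[0]):
        y = X @ beta + rng.normal(scale=sigma, size=N)
        draws[r] = XtX_inv @ X.T @ y

    print(draws.mean(axis=0))            # approximately beta (conditional unbiasedness)
    print(np.cov(draws, rowvar=False))   # approximately sigma^2 * (X'X)^(-1)
    print(sigma**2 * XtX_inv)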

Estimation of the variance of the error terms

The variance of the error terms $\sigma^{2}$ is usually not known. A commonly used estimator of $\sigma^{2}$ is the adjusted sample variance of the residuals: $\widehat{\sigma}^{2} = \frac{1}{N-K}\sum_{i=1}^{N}e_{i}^{2} = \frac{e^{\top}e}{N-K}$, where the vector of regression residuals is $e = y - X\widehat{\beta}$.
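In code, the estimator is one line once the residual vector is available. A minimal sketch, reusing the simulated data pattern from the first snippet (the numbers are again illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    N, K = 100, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat              # vector of regression residuals
    sigma2_hat = (e @ e) / (N - K)    # adjusted sample variance of the residuals
    print(sigma2_hat)                 # close to the true error variance (here 1.0)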

The proposed estimator is a conditionally unbiased estimator of $\sigma^{2}$: $\operatorname{E}[\widehat{\sigma}^{2} \mid X] = \sigma^{2}$.

Proof

Denote by $e$ the $N \times 1$ vector of residuals. Remember from the previous proof that the OLS estimator can be written as $\widehat{\beta} = \beta + (X^{\top}X)^{-1}X^{\top}\varepsilon$. As a consequence, we have $e = y - X\widehat{\beta} = X\beta + \varepsilon - X\beta - X(X^{\top}X)^{-1}X^{\top}\varepsilon = [I - X(X^{\top}X)^{-1}X^{\top}]\varepsilon = M\varepsilon$. The matrix $M = I - X(X^{\top}X)^{-1}X^{\top}$ is clearly symmetric (verify it by taking its transpose). It is also idempotent because $MM = I - 2X(X^{\top}X)^{-1}X^{\top} + X(X^{\top}X)^{-1}X^{\top}X(X^{\top}X)^{-1}X^{\top} = I - X(X^{\top}X)^{-1}X^{\top} = M$. Therefore, $\widehat{\sigma}^{2} = \frac{e^{\top}e}{N-K} = \frac{\varepsilon^{\top}M^{\top}M\varepsilon}{N-K} = \frac{\varepsilon^{\top}M\varepsilon}{N-K} = \frac{\sigma^{2}}{N-K}z^{\top}Mz$, where $z = \varepsilon/\sigma$ has a standard multivariate normal distribution, that is, a multivariate normal distribution with zero mean and unit covariance matrix. Since the matrix $M$ is symmetric and idempotent, the quadratic form $z^{\top}Mz$ has a Chi-square distribution with a number of degrees of freedom equal to the trace of the matrix $M$ (see the lecture Normal distribution - Quadratic forms). But the trace of $M$ is $\operatorname{tr}(M) = \operatorname{tr}(I_{N}) - \operatorname{tr}(X(X^{\top}X)^{-1}X^{\top}) = N - \operatorname{tr}((X^{\top}X)^{-1}X^{\top}X) = N - \operatorname{tr}(I_{K}) = N - K$, where the second equality uses the cyclic property of the trace. By using the fact that the expected value of a Chi-square random variable is equal to its number of degrees of freedom, we have $\operatorname{E}[\widehat{\sigma}^{2} \mid X] = \frac{\sigma^{2}}{N-K}\operatorname{E}[z^{\top}Mz \mid X] = \frac{\sigma^{2}}{N-K}(N-K) = \sigma^{2}$.
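The two properties of $M$ used in the proof (symmetry together with idempotency, and trace equal to $N-K$) are easy to check numerically; in the sketch below the dimensions are arbitrary:

    import numpy as np

    rng = np.random.default_rng(3)
    N, K = 50, 4
    X = rng.normal(size=(N, K))                       # a full-rank design matrix

    M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T
    print(np.allclose(M, M.T))                        # True: symmetric
    print(np.allclose(M @ M, M))                      # True: idempotent
    print(np.isclose(np.trace(M), N - K))             # True: trace equals N - K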

Note that, also in this case, the proposed estimator is unbiased not only conditionally, but also unconditionally, because, by the Law of Iterated Expectations, we have that $\operatorname{E}[\widehat{\sigma}^{2}] = \operatorname{E}[\operatorname{E}[\widehat{\sigma}^{2} \mid X]] = \sigma^{2}$.

Estimation of the covariance matrix of the OLS estimator

We have already proved that in the Normal Linear Regression Model the covariance matrix of the OLS estimator, conditional on $X$, is $\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^{2}(X^{\top}X)^{-1}$.

In practice, this quantity is not known exactly because the variance of the error terms $\sigma^{2}$ is unknown. However, we can replace its unknown value with the estimator proposed above (the adjusted sample variance of the residuals), so as to obtain an estimator of the covariance matrix of $\widehat{\beta}$: $\widehat{\operatorname{Var}}[\widehat{\beta} \mid X] = \widehat{\sigma}^{2}(X^{\top}X)^{-1}$.
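Continuing the earlier sketches, the estimated covariance matrix and the implied standard errors of the coefficient estimates can be computed as follows (all numerical inputs are illustrative):

    import numpy as np

    rng = np.random.default_rng(5)
    N, K = 100, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (N - K)

    V_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated covariance matrix of beta_hat
    std_err = np.sqrt(np.diag(V_hat))            # standard errors of the coefficients
    print(std_err)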

This estimator is often employed to construct test statistics that allow us to conduct tests of hypotheses about the regression coefficients.
