This lecture discusses the main properties of the Normal Linear Regression Model (NLRM), a linear regression model in which the vector of errors of the regression is assumed to have a multivariate normal distribution conditional on the matrix of regressors. The assumption of multivariate normality, together with other assumptions (mainly concerning the covariance matrix of the errors), makes it possible to derive analytically the distributions of the Ordinary Least Squares (OLS) estimators of the regression coefficients and of several other statistics.
We use the same notation used in the lecture entitled Properties of the OLS estimator (to which you can refer for more details): the $N \times 1$ vector of observations of the dependent variable is denoted by $y$, the $N \times K$ matrix of regressors (called design matrix) is denoted by $X$, the $N \times 1$ vector of errors is denoted by $\varepsilon$ and the $K \times 1$ vector of regression coefficients is denoted by $\beta$, so that the regression equations can be written in matrix form as
$$y = X\beta + \varepsilon$$
The OLS estimator $\widehat{\beta}$ is the vector which minimizes the sum of squared residuals
$$\widehat{\beta} = \arg\min_{b} \, (y - Xb)^{\top}(y - Xb)$$
and, if the design matrix has full rank, it can be computed as
$$\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y$$
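The closed-form expression above can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original lecture: the sample size, the coefficient values and the seed are arbitrary assumptions chosen for illustration, and `ssr` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (arbitrary choice)

N, K = 50, 3                     # hypothetical sample size and number of regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # design matrix with an intercept column
beta = np.array([1.0, 2.0, -0.5])                               # hypothetical true coefficients
y = X @ beta + rng.normal(size=N)                               # y = X beta + eps

# Closed form: solve the normal equations (X'X) b = X'y instead of inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The same estimator obtained from a least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

def ssr(b):
    """Sum of squared residuals (y - Xb)'(y - Xb)."""
    r = y - X @ b
    return r @ r

print(beta_hat)
print(ssr(beta_hat), ssr(beta))  # the first value cannot exceed the second
```

Solving the normal equations with `np.linalg.solve` is used here only because it mirrors the formula; for ill-conditioned design matrices a routine such as `np.linalg.lstsq` is usually the safer choice.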
The assumptions made in a normal linear regression model are:
the design matrix $X$ has full rank (as a consequence, $X^{\top}X$ is invertible and the OLS estimator is $\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y$);
conditional on $X$, the vector of errors $\varepsilon$ has a multivariate normal distribution with mean equal to $0$ and covariance matrix equal to $\operatorname{Var}[\varepsilon \mid X] = \sigma^{2} I$, where $\sigma^{2}$ is a positive constant and $I$ is the $N \times N$ identity matrix.
Note that, because $\varepsilon$ is jointly normal, the assumption that the covariance matrix of $\varepsilon$ is diagonal implies that the entries of $\varepsilon$ are mutually independent, that is, $\varepsilon_{i}$ is independent of $\varepsilon_{j}$ for $i \neq j$. Moreover, the assumption that all diagonal entries of the covariance matrix are equal implies that all the entries of $\varepsilon$ have the same variance, that is, $\operatorname{Var}[\varepsilon_{i} \mid X] = \sigma^{2}$ for any $i$. The latter assumption is often referred to as the "homoscedasticity assumption", and if it is satisfied, we say that the errors are homoscedastic. Conversely, if homoscedasticity does not hold, we say that the errors are heteroscedastic.
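To make the assumption on the errors concrete, the sketch below draws many error vectors with covariance matrix $\sigma^{2} I$ and compares their sample covariance matrix with the assumed one. The dimension, the value of $\sigma$, the number of draws and the seed are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed (arbitrary choice)

N = 4                            # hypothetical number of observations
sigma = 1.5                      # hypothetical error standard deviation
R = 200_000                      # number of simulated error vectors

cov = sigma**2 * np.eye(N)       # homoscedastic, uncorrelated errors: sigma^2 * I

# Each row of eps is one draw of the N x 1 error vector
eps = rng.multivariate_normal(mean=np.zeros(N), cov=cov, size=R)

# Sample covariance across draws (columns are the N entries of the error vector)
sample_cov = np.cov(eps, rowvar=False)

print(np.round(sample_cov, 2))   # diagonal near sigma^2, off-diagonal near 0
```

The diagonal entries of `sample_cov` estimate the common variance, while the off-diagonal entries estimate the covariances between different errors, which the model assumes to be zero.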
Under the assumptions made in the previous section, the OLS estimator has a multivariate normal distribution, conditional on the design matrix.
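Proposition In a Normal Linear Regression Model, the OLS estimator $\widehat{\beta}$ has a multivariate normal distribution, conditional on $X$, with mean
$$\operatorname{E}[\widehat{\beta} \mid X] = \beta$$
and covariance matrix
$$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^{2}(X^{\top}X)^{-1}$$
Proof First of all, note that
$$\widehat{\beta} = (X^{\top}X)^{-1}X^{\top}y = (X^{\top}X)^{-1}X^{\top}(X\beta + \varepsilon) = \beta + (X^{\top}X)^{-1}X^{\top}\varepsilon$$
The fact that we are conditioning on $X$ means that we can treat $X$ as a constant matrix. Therefore, conditional on $X$, the OLS estimator is a linear transformation of a multivariate normal random vector (the vector $\varepsilon$). This implies that $\widehat{\beta}$ is also multivariate normal, with mean
$$\operatorname{E}[\widehat{\beta} \mid X] = \beta + (X^{\top}X)^{-1}X^{\top}\operatorname{E}[\varepsilon \mid X] = \beta$$
and variance
$$\operatorname{Var}[\widehat{\beta} \mid X] = (X^{\top}X)^{-1}X^{\top}\operatorname{Var}[\varepsilon \mid X]\,X(X^{\top}X)^{-1} = \sigma^{2}(X^{\top}X)^{-1}X^{\top}X(X^{\top}X)^{-1} = \sigma^{2}(X^{\top}X)^{-1}$$
Note that $\operatorname{E}[\widehat{\beta} \mid X] = \beta$ means that the OLS estimator is unbiased, not only conditionally, but also unconditionally, because by the Law of Iterated Expectations we have that
$$\operatorname{E}[\widehat{\beta}] = \operatorname{E}\big[\operatorname{E}[\widehat{\beta} \mid X]\big] = \operatorname{E}[\beta] = \beta$$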
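The conditional mean and covariance matrix of the OLS estimator can be illustrated by a small Monte Carlo experiment in which the design matrix is held fixed and only the errors are redrawn. This is a sketch under assumed values (sample size, coefficients, $\sigma$, number of replications and seed are all arbitrary), not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(2)   # fixed seed (arbitrary choice)

N, K = 30, 2                     # hypothetical dimensions
sigma = 0.8                      # hypothetical error standard deviation
beta = np.array([1.0, -2.0])     # hypothetical true coefficients

# The design matrix is drawn once and then held fixed (we condition on X)
X = np.column_stack([np.ones(N), rng.normal(size=N)])

R = 50_000                                   # number of Monte Carlo replications
eps = sigma * rng.standard_normal((R, N))    # each row: one draw of the error vector
Y = X @ beta + eps                           # each row: one simulated sample of y

# OLS estimate for every replication: solve (X'X) B = X'Y' column by column
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T    # shape (R, K)

mc_mean = B.mean(axis=0)                     # should be close to beta
mc_cov = np.cov(B, rowvar=False)             # should be close to sigma^2 (X'X)^{-1}
theory_cov = sigma**2 * np.linalg.inv(X.T @ X)

print(mc_mean)
print(mc_cov)
print(theory_cov)
```

Because $X$ is the same in every replication, the empirical moments of the rows of `B` approximate the conditional moments stated in the proposition.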
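The variance of the error terms $\sigma^{2}$ is usually not known. A commonly used estimator of $\sigma^{2}$ is the adjusted sample variance of the residuals:
$$\widehat{\sigma}^{2} = \frac{1}{N-K}\sum_{i=1}^{N} e_{i}^{2}$$
where $N$ is the number of observations, $K$ is the number of regressors, and the regression residuals are
$$e_{i} = y_{i} - x_{i}\widehat{\beta}$$
with $x_{i}$ denoting the $i$-th row of $X$.
The proposed estimator is a conditionally unbiased estimator of $\sigma^{2}$:
$$\operatorname{E}[\widehat{\sigma}^{2} \mid X] = \sigma^{2}$$
Proof Denote by $e$ the $N \times 1$ vector of residuals. Remember from the previous proof that the OLS estimator can be written as
$$\widehat{\beta} = \beta + (X^{\top}X)^{-1}X^{\top}\varepsilon$$
As a consequence, we have
$$e = y - X\widehat{\beta} = X\beta + \varepsilon - X\beta - X(X^{\top}X)^{-1}X^{\top}\varepsilon = \big(I - X(X^{\top}X)^{-1}X^{\top}\big)\varepsilon$$
The matrix
$$M = I - X(X^{\top}X)^{-1}X^{\top}$$
is clearly symmetric (verify it by taking its transpose). It is also idempotent because
$$MM = I - 2X(X^{\top}X)^{-1}X^{\top} + X(X^{\top}X)^{-1}X^{\top}X(X^{\top}X)^{-1}X^{\top} = I - X(X^{\top}X)^{-1}X^{\top} = M$$
Therefore,
$$\widehat{\sigma}^{2} = \frac{1}{N-K}e^{\top}e = \frac{1}{N-K}\varepsilon^{\top}M^{\top}M\varepsilon = \frac{1}{N-K}\varepsilon^{\top}M\varepsilon = \frac{\sigma^{2}}{N-K}z^{\top}Mz$$
where $z = \varepsilon/\sigma$ has a standard multivariate normal distribution, that is, a multivariate normal distribution with zero mean and identity covariance matrix. Since the matrix $M$ is symmetric and idempotent, the quadratic form $z^{\top}Mz$ has a Chi-square distribution with a number of degrees of freedom equal to the trace of the matrix $M$ (see the lecture Normal distribution - Quadratic forms). But the trace of $M$ is
$$\operatorname{tr}(M) = \operatorname{tr}(I) - \operatorname{tr}\big(X(X^{\top}X)^{-1}X^{\top}\big) = N - \operatorname{tr}\big((X^{\top}X)^{-1}X^{\top}X\big) = N - K$$
By using the fact that the expected value of a Chi-square random variable is equal to its number of degrees of freedom, we have
$$\operatorname{E}[\widehat{\sigma}^{2} \mid X] = \frac{\sigma^{2}}{N-K}\operatorname{E}[z^{\top}Mz \mid X] = \frac{\sigma^{2}}{N-K}(N-K) = \sigma^{2}$$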
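The algebraic properties of the residual-maker matrix used in the proof (symmetry, idempotency, trace equal to the number of observations minus the number of regressors) can be verified numerically. The sketch below uses an arbitrary simulated design matrix; the dimensions, coefficients and seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)   # fixed seed (arbitrary choice)

N, K = 20, 3                     # hypothetical dimensions
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

# M = I - X (X'X)^{-1} X'
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

is_symmetric = np.allclose(M, M.T)       # M' = M
is_idempotent = np.allclose(M @ M, M)    # MM = M
trace_M = np.trace(M)                    # equals N - K up to rounding error

# The residuals y - X beta_hat coincide with M applied to the errors
beta = np.array([0.5, 1.0, -1.0])        # hypothetical true coefficients
eps = rng.normal(size=N)
y = X @ beta + eps
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

print(is_symmetric, is_idempotent, trace_M)
print(np.allclose(e, M @ eps))
```

In an applied setting the errors are not observed, so the identity between the residuals and `M @ eps` can only be checked in a simulation like this one, where the errors are generated explicitly.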
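Note that also in this case, the proposed estimator is unbiased not only conditionally, but also unconditionally because, by the Law of Iterated Expectations, we have that
$$\operatorname{E}[\widehat{\sigma}^{2}] = \operatorname{E}\big[\operatorname{E}[\widehat{\sigma}^{2} \mid X]\big] = \operatorname{E}[\sigma^{2}] = \sigma^{2}$$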
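Unbiasedness of the adjusted sample variance of the residuals can also be illustrated by simulation. The sketch below holds a design matrix fixed, redraws the errors many times, and compares the average of the adjusted estimator with the average of the unadjusted one (which divides by the number of observations and is included only as a contrast, not taken from the lecture). All numerical values and the seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)   # fixed seed (arbitrary choice)

N, K = 15, 3                     # hypothetical dimensions
sigma2 = 4.0                     # hypothetical error variance
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # held fixed

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

R = 100_000                                          # Monte Carlo replications
eps = np.sqrt(sigma2) * rng.standard_normal((R, N))  # each row: one error vector

# Residuals for each replication: e = M eps (M is symmetric, so rows are eps @ M)
E = eps @ M
ssr = (E**2).sum(axis=1)          # sum of squared residuals per replication

adjusted = ssr / (N - K)          # estimator discussed in the text
unadjusted = ssr / N              # contrast: divides by N instead of N - K

print(adjusted.mean())            # close to sigma2
print(unadjusted.mean())          # close to sigma2 * (N - K) / N, i.e. biased downward
```

The residuals do not depend on the true coefficients, which is why the simulation can work with the errors alone.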
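We have already proved that in the Normal Linear Regression Model the conditional covariance matrix of the OLS estimator (conditional on $X$) is
$$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^{2}(X^{\top}X)^{-1}$$
In practice, however, this quantity is not known exactly because the variance of the error terms, that is $\sigma^{2}$, is unknown. We can nonetheless replace its unknown value with the estimator $\widehat{\sigma}^{2}$ proposed above (the adjusted sample variance of the residuals), so as to obtain an estimator of the covariance matrix of $\widehat{\beta}$:
$$\widehat{\operatorname{Var}}[\widehat{\beta} \mid X] = \widehat{\sigma}^{2}(X^{\top}X)^{-1}$$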
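The plug-in estimator of the covariance matrix can be computed in a few lines. The following sketch defines a hypothetical helper, `ols_with_covariance`, which is not part of any library; the simulated data, dimensions and seed are assumptions chosen only to make the example runnable.

```python
import numpy as np

def ols_with_covariance(X, y):
    """Return (beta_hat, sigma2_hat, cov_hat) for a full-rank design matrix X.

    Hypothetical helper written for this illustration.
    """
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)      # (X'X)^{-1}, needed explicitly for the covariance
    beta_hat = XtX_inv @ X.T @ y          # OLS estimator
    e = y - X @ beta_hat                  # residuals
    sigma2_hat = (e @ e) / (N - K)        # adjusted sample variance of the residuals
    cov_hat = sigma2_hat * XtX_inv        # estimated covariance matrix of beta_hat
    return beta_hat, sigma2_hat, cov_hat

rng = np.random.default_rng(5)            # fixed seed (arbitrary choice)
N, K = 40, 3                              # hypothetical dimensions
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -1.5]) + rng.normal(scale=2.0, size=N)

beta_hat, sigma2_hat, cov_hat = ols_with_covariance(X, y)
std_err = np.sqrt(np.diag(cov_hat))       # standard errors of the coefficient estimates

print(beta_hat)
print(std_err)
```

The square roots of the diagonal entries of `cov_hat` are the standard errors that typically enter test statistics for individual coefficients.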
This estimator is often employed to construct test statistics that allow us to conduct tests of hypotheses about the regression coefficients.