This lecture discusses the main properties of the Normal Linear Regression Model (NLRM), a linear regression model in which the vector of errors of the regression is assumed to have a multivariate normal distribution conditional on the matrix of regressors. The assumption of multivariate normality, together with other assumptions (mainly concerning the covariance matrix of the errors), allows us to derive analytically the distributions of the Ordinary Least Squares (OLS) estimators of the regression coefficients and of several other statistics.
We use the same notation used in the lecture entitled Properties of the OLS estimator (to which you can refer for more details): the $N\times 1$ vector of observations of the dependent variable is denoted by $y$, the $N\times K$ matrix of regressors (called design matrix) is denoted by $X$, the $N\times 1$ vector of errors is denoted by $\varepsilon$ and the $K\times 1$ vector of regression coefficients is denoted by $\beta$, so that the regression equations can be written in matrix form as
$$y = X\beta + \varepsilon$$
The OLS estimator $\widehat{\beta}$ is the vector which minimizes the sum of squared residuals
$$\widehat{\beta} = \arg\min_{b}\,(y - Xb)^\top (y - Xb)$$
and, if the design matrix $X$ has full rank, it can be computed as
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y$$
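As a rough illustration of the closed-form expression $\widehat{\beta} = (X^\top X)^{-1}X^\top y$, the following NumPy sketch computes the OLS estimator on simulated data. The function name `ols_estimator`, the data-generating values and the random seed are assumptions made only for this example; they do not come from the lecture.

```python
import numpy as np

def ols_estimator(X, y):
    """Return (X'X)^{-1} X'y, assuming X has full column rank."""
    # Solve the normal equations X'X b = X'y instead of forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: N observations, K regressors (first column is a constant).
rng = np.random.default_rng(0)
N, K = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.3, size=N)

beta_hat = ols_estimator(X, y)

# Cross-check against NumPy's generic least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer, the residuals are orthogonal to the columns of X.
normal_eq_residual = X.T @ (y - X @ beta_hat)
```

Solving the normal equations is used here because it mirrors the formula in the text; for badly conditioned design matrices a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice.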
The assumptions made in a normal linear regression model are:
the design matrix $X$ has full rank (as a consequence, $X^\top X$ is invertible and the OLS estimator is $\widehat{\beta} = (X^\top X)^{-1}X^\top y$);
conditional on $X$, the vector of errors $\varepsilon$ has a multivariate normal distribution with mean equal to $0$ and covariance matrix equal to $\sigma^2 I$, where $\sigma^2$ is a positive constant and $I$ is the $N\times N$ identity matrix.
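Note that the assumption that the covariance matrix of $\varepsilon$ is diagonal implies that the entries of $\varepsilon$ are mutually independent, that is, $\varepsilon_i$ is independent of $\varepsilon_j$ for $i \neq j$ (for jointly normal random variables, zero covariance implies independence).
Moreover, the assumption that all diagonal entries of the covariance matrix are equal implies that all the entries of $\varepsilon$ have the same variance, that is, $\operatorname{Var}[\varepsilon_i \mid X] = \sigma^2$ for any $i$.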
The latter assumption is often referred to as "homoscedasticity assumption",
and if the assumption is satisfied, we say that the errors are homoscedastic.
On the contrary, if homoscedasticity does not hold, we say that the errors are
heteroscedastic.
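To make the error assumption concrete, here is a small sketch that draws many error vectors from a multivariate normal distribution with mean $0$ and covariance $\sigma^2 I$ and compares their sample covariance matrix with $\sigma^2 I$. The dimension, the value of $\sigma^2$, the number of draws and the seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 4, 2.0            # hypothetical sample size and error variance
cov = sigma2 * np.eye(N)      # homoscedastic, uncorrelated errors

# Each row is one draw of the N x 1 error vector.
draws = rng.multivariate_normal(mean=np.zeros(N), cov=cov, size=200_000)

# Sample covariance across draws: diagonal entries should be close to sigma2,
# off-diagonal entries close to zero.
sample_cov = np.cov(draws, rowvar=False)
max_abs_dev = np.abs(sample_cov - cov).max()
```

A heteroscedastic alternative could be simulated by replacing `cov` with a diagonal matrix whose diagonal entries differ.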
Under the assumptions made in the previous section, the OLS estimator has a multivariate normal distribution, conditional on the design matrix.
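Proposition
In a Normal Linear Regression Model, the OLS estimator $\widehat{\beta}$ has a multivariate normal distribution, conditional on $X$, with mean
$$E[\widehat{\beta} \mid X] = \beta$$
and covariance matrix
$$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 (X^\top X)^{-1}$$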
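First of all, note that
$$\widehat{\beta} = (X^\top X)^{-1}X^\top y = (X^\top X)^{-1}X^\top (X\beta + \varepsilon) = \beta + (X^\top X)^{-1}X^\top \varepsilon$$
The fact that we are conditioning on $X$ means that we can treat $X$ as a constant matrix. Therefore, conditional on $X$, the OLS estimator $\widehat{\beta}$ is a linear transformation of a multivariate normal random vector (the vector $\varepsilon$). This implies that $\widehat{\beta}$ is also multivariate normal, with mean
$$E[\widehat{\beta} \mid X] = \beta + (X^\top X)^{-1}X^\top E[\varepsilon \mid X] = \beta$$
and variance
$$\operatorname{Var}[\widehat{\beta} \mid X] = (X^\top X)^{-1}X^\top \operatorname{Var}[\varepsilon \mid X]\, X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}X^\top X (X^\top X)^{-1} = \sigma^2 (X^\top X)^{-1}$$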
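Note that $E[\widehat{\beta} \mid X] = \beta$ means that the OLS estimator is unbiased, not only conditionally, but also unconditionally, because by the Law of Iterated Expectations we have that
$$E[\widehat{\beta}] = E\big[E[\widehat{\beta} \mid X]\big] = E[\beta] = \beta$$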
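The conditional distribution of the OLS estimator can be checked informally by simulation: hold a design matrix fixed, redraw normal errors many times, and compare the Monte Carlo mean and covariance of the resulting estimates with $\beta$ and $\sigma^2 (X^\top X)^{-1}$. The sketch below does this with NumPy; all numerical values (sample size, coefficients, error standard deviation, number of replications, seed) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N, sigma = 30, 1.5                       # hypothetical sample size and error s.d.
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # design matrix, held fixed
beta = np.array([0.5, -1.0])             # hypothetical true coefficients
XtX_inv = np.linalg.inv(X.T @ X)

R = 20_000                               # number of simulated samples
eps = rng.normal(scale=sigma, size=(R, N))   # each row: one error vector
Y = X @ beta + eps                           # each row: one simulated y

# Row r of B is the OLS estimate for sample r:
# ((X'X)^{-1} X' y_r)' = y_r' X (X'X)^{-1}, since (X'X)^{-1} is symmetric.
B = Y @ X @ XtX_inv

mc_mean = B.mean(axis=0)                 # should be close to beta
mc_cov = np.cov(B, rowvar=False)         # should be close to sigma^2 (X'X)^{-1}
theory_cov = sigma**2 * XtX_inv
```

Agreement between `mc_cov` and `theory_cov` is only approximate and improves as the number of replications `R` grows.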
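The variance of the error terms $\sigma^2$ is usually not known. A commonly used estimator of $\sigma^2$ is the adjusted sample variance of the residuals:
$$\widehat{\sigma}^2 = \frac{1}{N-K}\sum_{i=1}^{N} e_i^2$$
where the regression residuals are
$$e_i = y_i - x_i\widehat{\beta}$$
and $x_i$ denotes the $i$-th row of $X$.
The properties enjoyed by $\widehat{\sigma}^2$ are summarized by the following proposition.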
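Proposition
In a Normal Linear Regression Model, the adjusted sample variance of the residuals $\widehat{\sigma}^2$ is a conditionally unbiased estimator of $\sigma^2$:
$$E[\widehat{\sigma}^2 \mid X] = \sigma^2$$
Furthermore, conditional on $X$, $\widehat{\sigma}^2$ has a Gamma distribution with parameters $N-K$ and $\sigma^2$ and it is independent of $\widehat{\beta}$.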
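Denote by $e$ the $N\times 1$ vector of residuals. Remember from the previous proof that the OLS estimator can be written as
$$\widehat{\beta} = \beta + (X^\top X)^{-1}X^\top \varepsilon$$
As a consequence, we have
$$e = y - X\widehat{\beta} = X\beta + \varepsilon - X\beta - X(X^\top X)^{-1}X^\top \varepsilon = \left(I - X(X^\top X)^{-1}X^\top\right)\varepsilon$$
The matrix
$$M = I - X(X^\top X)^{-1}X^\top$$
is clearly symmetric (verify it by taking its transpose). It is also idempotent because
$$MM = I - 2X(X^\top X)^{-1}X^\top + X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = I - X(X^\top X)^{-1}X^\top = M$$
Therefore,
$$\frac{(N-K)\,\widehat{\sigma}^2}{\sigma^2} = \frac{e^\top e}{\sigma^2} = \frac{\varepsilon^\top M^\top M \varepsilon}{\sigma^2} = \frac{\varepsilon^\top M \varepsilon}{\sigma^2} = z^\top M z$$
where $z = \varepsilon/\sigma$ has a standard multivariate normal distribution, that is, a multivariate normal distribution with zero mean and identity covariance matrix. Since the matrix $M$ is symmetric and idempotent, the quadratic form $z^\top M z$ has a Chi-square distribution with a number of degrees of freedom equal to the trace of the matrix $M$ (see the lecture Normal distribution - Quadratic forms). But the trace of $M$ is
$$\operatorname{tr}(M) = \operatorname{tr}(I) - \operatorname{tr}\!\left(X(X^\top X)^{-1}X^\top\right) = N - \operatorname{tr}\!\left((X^\top X)^{-1}X^\top X\right) = N - K$$
Since the expected value of a Chi-square random variable is equal to its number of degrees of freedom, we have
$$E[\widehat{\sigma}^2 \mid X] = \frac{\sigma^2}{N-K}\,E[z^\top M z \mid X] = \frac{\sigma^2}{N-K}\,(N-K) = \sigma^2$$
Moreover, the fact that the quadratic form $z^\top M z$ has a Chi-square distribution with $N-K$ degrees of freedom implies that the sample variance
$$\widehat{\sigma}^2 = \frac{\sigma^2}{N-K}\, z^\top M z$$
has a Gamma distribution with parameters $N-K$ and $\sigma^2$ (see the lecture on the Gamma distribution for a proof of this fact). To conclude, we need to prove that $\widehat{\sigma}^2$ is independent of $\widehat{\beta}$. Since
$$\widehat{\sigma}^2 = \frac{\sigma^2}{N-K}\, z^\top M z$$
and
$$\widehat{\beta} = \beta + \sigma (X^\top X)^{-1}X^\top z$$
we have that $\widehat{\sigma}^2$ and $\widehat{\beta}$ are functions of the same multivariate normal random vector $z$. Therefore, by standard results on the independence of quadratic forms involving normal vectors, $\widehat{\sigma}^2$ and $\widehat{\beta}$ are independent if $(X^\top X)^{-1}X^\top$ and $M$ are orthogonal. In order to check their orthogonality, we only need to verify that the product between $(X^\top X)^{-1}X^\top$ and $M$ is zero:
$$(X^\top X)^{-1}X^\top M = (X^\top X)^{-1}X^\top - (X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = 0$$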
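The algebraic facts used in this proof can be verified numerically for any particular full-rank design matrix. The sketch below builds the matrix $I - X(X^\top X)^{-1}X^\top$ (called `M` in the code) for a randomly generated $X$ and checks that it is symmetric and idempotent, that its trace equals $N-K$, and that its product with $(X^\top X)^{-1}X^\top$ is zero. The dimensions and the seed are arbitrary assumptions for the example, and a numerical check on one matrix is not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 12, 3                              # hypothetical dimensions
X = rng.normal(size=(N, K))               # a generic full-rank design matrix
XtX_inv = np.linalg.inv(X.T @ X)

# Matrix that maps the error vector into the residual vector.
M = np.eye(N) - X @ XtX_inv @ X.T

symmetric = np.allclose(M, M.T)           # M equals its transpose
idempotent = np.allclose(M @ M, M)        # M M = M
trace_M = np.trace(M)                     # should equal N - K
orthogonal = np.allclose(XtX_inv @ X.T @ M, 0.0)  # (X'X)^{-1} X' M = 0
```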
Note that also in this case, the proposed estimator is unbiased not only conditionally, but also unconditionally because, by the Law of Iterated Expectations, we have that
$$E[\widehat{\sigma}^2] = E\big[E[\widehat{\sigma}^2 \mid X]\big] = E[\sigma^2] = \sigma^2$$
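A simulation can give a rough check of the properties of the adjusted sample variance of the residuals. The sketch below holds $X$ fixed, redraws normal errors, and compares the Monte Carlo mean and variance of the estimator with $\sigma^2$ and $2\sigma^4/(N-K)$. The variance formula follows from the fact that the estimator equals $\sigma^2/(N-K)$ times a Chi-square variable with $N-K$ degrees of freedom, whose variance is $2(N-K)$. Reading the Gamma parameters as "degrees of freedom $N-K$ and mean $\sigma^2$" is an assumption about the parametrization; in the shape-scale convention the same distribution has shape $(N-K)/2$ and scale $2\sigma^2/(N-K)$. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, sigma2 = 20, 3, 4.0                 # hypothetical sizes and error variance
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # held fixed
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T

R = 50_000                                # number of simulated samples
eps = rng.normal(scale=np.sqrt(sigma2), size=(R, N))

# Residuals are e = M eps; with M symmetric, row r of E is (M eps_r)'.
E = eps @ M
s2 = (E**2).sum(axis=1) / (N - K)         # adjusted sample variance per sample

mc_mean = s2.mean()                       # should be close to sigma2
mc_var = s2.var(ddof=1)                   # should be close to 2 sigma2^2 / (N - K)
theory_var = 2 * sigma2**2 / (N - K)
```

The residuals depend only on the errors and on $X$, which is why no coefficient vector appears in the simulation.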
We have already proved that in the Normal Linear Regression Model the conditional covariance matrix of the OLS estimator (conditional on $X$) is
$$\operatorname{Var}[\widehat{\beta} \mid X] = \sigma^2 (X^\top X)^{-1}$$
In practice, however, this quantity is not known exactly because the variance of the error terms, that is $\sigma^2$, is unknown. However, we can replace its unknown value with the estimator proposed above (the adjusted sample variance of the residuals), so as to obtain an estimator of the covariance matrix of $\widehat{\beta}$:
$$\widehat{\operatorname{Var}}[\widehat{\beta} \mid X] = \widehat{\sigma}^2 (X^\top X)^{-1}$$
This estimator is often employed to construct test statistics that allow us to conduct tests of hypotheses about the regression coefficients.
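A minimal sketch of how the estimated covariance matrix $\widehat{\sigma}^2 (X^\top X)^{-1}$ and the resulting standard errors might be computed with NumPy is given below. The function name `ols_with_covariance` and the simulated data are assumptions introduced for this example only.

```python
import numpy as np

def ols_with_covariance(X, y):
    """Return the OLS estimate, the adjusted residual variance and
    the estimated covariance matrix sigma2_hat * (X'X)^{-1}."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (N - K)   # divides by N - K, not by N
    cov_hat = sigma2_hat * XtX_inv
    return beta_hat, sigma2_hat, cov_hat

# Hypothetical data
rng = np.random.default_rng(3)
N, K = 40, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.8, size=N)

beta_hat, sigma2_hat, cov_hat = ols_with_covariance(X, y)

# Standard errors are the square roots of the diagonal entries.
std_errors = np.sqrt(np.diag(cov_hat))
```

The standard errors in `std_errors` are the quantities that typically enter test statistics on individual coefficients.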
It can be proved that the OLS estimators of the coefficients of a Normal Linear Regression Model are equal to the maximum likelihood estimators. On the contrary, the maximum likelihood estimator of the variance of the error terms is different from the estimator derived above. For proofs of these two facts, see the lecture entitled Linear Regression - Maximum likelihood estimation.
In the lecture on Linear regressions and hypothesis testing we explain how to perform hypothesis tests on the coefficients of a normal linear regression model.
Please cite as:
Taboga, Marco (2021). "The normal linear regression model", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/normal-linear-regression-model.