
Linear regression models


Linear regression models belong to the class of conditional models. In a linear regression model, the output variable (also called dependent variable, or regressand) is assumed to be a linear function of the input variables (also called independent variables, or regressors) and of an unobservable error term that adds noise to the linear relationship between inputs and outputs.


Main assumptions and notation

This section introduces the main assumptions and the notation and terminology used in dealing with linear regression models.

We assume that the statistician observes a sample of realizations $(y_{i},x_{i})$ for $i=1,\ldots ,N$ (i.e., the sample size is equal to $N$). The output variables, which are scalars, are denoted by $y_{i}$, and the associated inputs, which are $1\times K$ vectors, are denoted by $x_{i}$.

It is postulated that there is a linear relationship between inputs and outputs:$$y_{i}=x_{i}\beta +\varepsilon _{i}$$where $\beta $ is a $K\times 1$ vector of constants, called regression coefficients, and $\varepsilon _{i}$ is an unobservable error term which encompasses the sources of variability in $y_{i}$ that are not included in the vector of inputs $x_{i}$ (for example, measurement errors or input variables that are not observed by the statistician). Note that the relationship is assumed to hold for each $i=1,\ldots ,N$, with the same $\beta $.

Example Suppose we have a sample of individuals for which weight, height and age are observed, and we want to set up a linear regression model to predict weight based on height and age. Then, we could postulate that$$w_{i}=\beta _{1}+\beta _{2}h_{i}+\beta _{3}a_{i}+\varepsilon _{i}$$where $w_{i}$, $h_{i}$ and $a_{i}$ denote the weight, height and age of the i-th individual in the sample, respectively, $\beta _{1}$, $\beta _{2}$ and $\beta _{3}$ are regression coefficients, and $\varepsilon _{i}$ is an error term. This regression equation can be written as$$y_{i}=x_{i}\beta +\varepsilon _{i}$$by defining $y_{i}=w_{i}$, the $1\times 3$ vector $x_{i}$ as$$x_{i}=\begin{bmatrix}1 & h_{i} & a_{i}\end{bmatrix}$$and the $3\times 1$ vector $\beta $ as$$\beta =\begin{bmatrix}\beta _{1}\\ \beta _{2}\\ \beta _{3}\end{bmatrix}$$

Matrix notation

Denote by $y$ the $N\times 1$ vector of outputs$$y=\begin{bmatrix}y_{1}\\ \vdots \\ y_{N}\end{bmatrix}$$by $X$ the $N\times K$ matrix of inputs$$X=\begin{bmatrix}x_{1}\\ \vdots \\ x_{N}\end{bmatrix}$$and by $\varepsilon $ the $N\times 1$ vector of error terms$$\varepsilon =\begin{bmatrix}\varepsilon _{1}\\ \vdots \\ \varepsilon _{N}\end{bmatrix}$$Then, the linear relationship can be expressed as$$y=X\beta +\varepsilon $$The matrix $X$ is often called the design matrix.
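The matrix notation above can be made concrete with a small numerical sketch. The data below are entirely hypothetical: $N=5$ observations and $K=3$ regressors (a constant plus the height and age variables of the earlier example), with assumed coefficient values.

```python
import numpy as np

# Hypothetical data: N = 5 observations, K = 3 regressors.
rng = np.random.default_rng(0)
N = 5

heights = rng.normal(175.0, 10.0, size=N)   # in centimeters
ages = rng.normal(40.0, 12.0, size=N)       # in years

# Design matrix X: row i is the 1 x K input vector x_i, with a first
# column of 1s corresponding to the intercept.
X = np.column_stack([np.ones(N), heights, ages])

beta = np.array([-50.0, 0.6, 0.2])          # assumed regression coefficients
eps = rng.normal(0.0, 5.0, size=N)          # error terms

# The linear relationship in matrix form: y = X beta + eps.
y = X @ beta + eps

print(X.shape, y.shape)  # (5, 3) (5,)
```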


The vector of regressors $x_{i}$ is usually assumed to contain a constant variable equal to 1. Without loss of generality, it can be assumed that it is the first entry of $x_{i}$, so that the first column of the design matrix X is a column of 1s.

The regression coefficient corresponding to the constant variable is called intercept.

Example Suppose the number of regressors is $K=2$ and the regression includes a constant equal to 1. Then, we have that$$y_{i}=\beta _{1}+\beta _{2}x_{i,2}+\varepsilon _{i}$$The coefficient $\beta _{1}$ is the intercept of the regression.

Note that when an intercept is included in the regression, then it can be assumed without loss of generality that the expected value of the error term is equal to 0. For instance, in the previous example, if $E[\varepsilon _{i}]=\mu \neq 0$, then we can write$$y_{i}=\left( \beta _{1}+\mu \right) +\beta _{2}x_{i,2}+\left( \varepsilon _{i}-\mu \right)$$and define a new regression equation$$y_{i}=\gamma _{1}+\beta _{2}x_{i,2}+u_{i}$$where $\gamma _{1}=\beta _{1}+\mu $ and $u_{i}=\varepsilon _{i}-\mu $. Of course, $E[u_{i}]=0$ because$$E[u_{i}]=E[\varepsilon _{i}-\mu ]=E[\varepsilon _{i}]-\mu =\mu -\mu =0$$
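This reparametrization can be verified numerically. The sketch below uses illustrative numbers: errors constructed to have mean exactly $\mu$, so that absorbing $\mu$ into the intercept leaves the outputs unchanged and produces zero-mean errors.

```python
import numpy as np

# Illustrative check: errors with nonzero mean mu can be absorbed
# into the intercept. All numbers are hypothetical.
mu = 2.5
beta1, beta2 = 1.0, 0.7

x2 = np.array([0.5, 1.0, 1.5, 2.0])
eps = mu + np.array([-1.0, 1.0, -0.5, 0.5])  # errors with mean exactly mu

y = beta1 + beta2 * x2 + eps                 # original equation

gamma1 = beta1 + mu                          # new intercept gamma_1 = beta_1 + mu
u = eps - mu                                 # new error terms u_i = eps_i - mu

# The two parametrizations produce identical outputs, and the new
# error terms have mean zero.
print(np.allclose(y, gamma1 + beta2 * x2 + u), u.mean())  # True 0.0
```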


Statistical inference about a regression model is usually carried out in the form of point estimation, set estimation and hypothesis testing about the vector of regression coefficients $\beta $ and the characteristics of the distribution of the error terms $\varepsilon $ (for example, their variance).

Furthermore, the estimates of $\beta $ and of the distribution of $\varepsilon $ are usually employed to make predictions about observations that do not belong to the sample. For example, the inputs $x_{N+1}$ of an out-of-sample observation can be used to compute the expected value of its corresponding output $y_{N+1}$.


In order to derive estimators of the vector of regression coefficients $\beta $ and of the covariance matrix of the errors $\varepsilon $ (as well as to establish properties of the estimators, such as unbiasedness, consistency and asymptotic variance), it is necessary to make some assumptions about the joint distribution of the matrix of regressors $X$ and the vector of error terms $\varepsilon $. We will discuss such assumptions in the following sections. However, we would like to anticipate the fact that the most commonly used estimator of $\beta $ is the Ordinary Least Squares (OLS) estimator. As we will explain, the OLS estimator is not only computationally convenient, but it also enjoys good statistical properties under several different sets of assumptions on the joint distribution of $X$ and $\varepsilon $.

OLS estimation

The following is a formal definition of the OLS estimator.

Definition An estimator $\widehat{\beta }$ is an OLS estimator of $\beta $ if and only if it satisfies$$\widehat{\beta }=\arg\min_{b}\sum_{i=1}^{N}\left( y_{i}-x_{i}b\right) ^{2}$$

In other words, the OLS estimator is obtained by finding a vector of estimated regression coefficients that minimizes the sum, over all observations, of the squared residuals, where a residual$$e_{i}=y_{i}-x_{i}b$$is the difference between the observed output $y_{i}$ and its predicted value $x_{i}b$ (predicted under the hypothesis that $b$ is the vector of regression coefficients).

Note that the closer the predicted values are to the actual output values, the smaller the sum of squared residuals is. Thus, the OLS estimator is the vector of regression coefficients that makes the predicted values as close as possible to the actual output values (the catch here is that the distances between predicted and observed values are measured by their squared differences).
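The minimization can be illustrated with a small sketch, using hypothetical data. The function below computes the sum of squared residuals for any candidate vector $b$; by definition, the OLS solution attains a value at least as small as any other candidate.

```python
import numpy as np

# Hypothetical data: 3 observations, a constant and one regressor.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])
y = np.array([4.0, 5.0, 9.0])

def ssr(b):
    residuals = y - X @ b          # e_i = y_i - x_i b
    return residuals @ residuals   # sum of squared residuals

# The OLS estimate minimizes the SSR, so any other candidate vector b
# yields a sum of squared residuals at least as large.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(ssr(b_ols) <= ssr(np.array([0.0, 2.0])))  # True
```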

Under the assumption that the design matrix has full rank, the minimization problem above has a solution that is both unique and explicit.

Proposition If the design matrix $X$ has full rank, then the OLS estimator is$$\widehat{\beta }=\left( X^{\top }X\right) ^{-1}X^{\top }y$$


First of all, observe that the sum of squared residuals, henceforth indicated by $SSR$, can be written in matrix form as follows:$$SSR=\left( y-Xb\right) ^{\top }\left( y-Xb\right)$$The first order condition for a minimum is that the gradient of $SSR$ with respect to $b$ should be equal to zero:$$\nabla _{b}SSR=-2X^{\top }\left( y-Xb\right) =0$$that is,$$X^{\top }Xb-X^{\top }y=0$$or$$X^{\top }Xb=X^{\top }y$$Now, if $X$ has full rank (i.e., rank equal to $K$), then the matrix $X^{\top }X$ is invertible. As a consequence, the first order condition is satisfied by$$b=\left( X^{\top }X\right) ^{-1}X^{\top }y$$We now need to check that this is indeed a global minimum. Note that the Hessian matrix, that is, the matrix of second derivatives of $SSR$, is$$\nabla _{b}^{2}SSR=2X^{\top }X$$But $X^{\top }X$ is a positive definite matrix because, for any $a\neq 0$, we have$$a^{\top }X^{\top }Xa=\left( Xa\right) ^{\top }\left( Xa\right) =\sum_{i=1}^{N}\left( x_{i}a\right) ^{2}>0$$where the last inequality follows from the fact that $X$ has full rank (and, as a consequence, $a\neq 0$ implies that $x_{i}a$ cannot be equal to 0 for every $i$). Thus, $SSR$ is strictly convex in $b$, which implies that $b$ is indeed a global minimum.


As already anticipated, the linearity assumption$$y_{i}=x_{i}\beta +\varepsilon _{i}$$is not per se sufficient to determine the properties of the OLS estimator of $\beta $, or of any other estimator of $\beta $ and of the characteristics of the distribution of $\varepsilon _{i}$. In order to be able to derive any meaningful property, we need to make further assumptions about the joint distribution of the regressors $X$ and the error terms $\varepsilon $. These further assumptions, together with the linearity assumption, form a linear regression model.

A popular linear regression model is the so-called Normal Linear Regression Model (NLRM), in which it is assumed that the vector of errors $\varepsilon $ has a multivariate normal distribution conditional on the design matrix $X$, and that the covariance matrix of $\varepsilon $ is diagonal with all diagonal entries equal (in other words, the entries of $\varepsilon $ are mutually independent and have constant variance). Under these hypotheses, the vector of OLS estimators of the regression coefficients has a multivariate normal distribution, and the distributions of several test statistics can be derived analytically. More details about the NLRM can be found in the lecture entitled Normal Linear Regression Model.

While the NLRM has several appealing properties, its assumptions are unrealistic in many practical cases of interest. For this reason, it is often deemed preferable to make weaker assumptions, under which it is possible to prove that the OLS estimators are consistent and asymptotically normal. These assumptions are discussed in the lecture entitled Properties of the OLS estimator.
