
Linear regression - Maximum Likelihood Estimation

by Marco Taboga, PhD

This lecture shows how to perform maximum likelihood estimation of the parameters of a linear regression model whose error terms are normally distributed conditional on the regressors.

In order to fully understand the material presented here, it might be useful to revise the introductions to maximum likelihood estimation (MLE) and to the Normal Linear Regression Model.


The regression model

The objective is to estimate the parameters of the linear regression model

$$y_{i} = x_{i}\beta_{0} + \varepsilon_{i}$$

where $y_{i}$ is the dependent variable, $x_{i}$ is a $1\times K$ vector of regressors, $\beta_{0}$ is the $K\times 1$ vector of regression coefficients to be estimated and $\varepsilon_{i}$ is an unobservable error term.

The sample is made up of $N$ IID observations $(y_{i}, x_{i})$, for $i = 1, \ldots, N$.

The regression equations can be written in matrix form as

$$y = X\beta_{0} + \varepsilon$$

where the $N\times 1$ vector of observations of the dependent variable is denoted by $y$, the $N\times K$ matrix of regressors is denoted by $X$, and the $N\times 1$ vector of error terms is denoted by $\varepsilon$.


We assume that the vector of errors $\varepsilon$ has a multivariate normal distribution conditional on $X$, with mean equal to $0$ and covariance matrix equal to

$$\sigma_{0}^{2} I$$

where $I$ is the $N\times N$ identity matrix and $\sigma_{0}^{2}$ is the second parameter to be estimated.

Furthermore, it is assumed that the matrix of regressors $X$ has full rank.
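To make the assumptions concrete, here is a minimal sketch that simulates a sample from the model with NumPy. The function name `simulate_regression`, the choice of regressors (a constant plus standard normal columns) and the parameter values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def simulate_regression(n, beta0, sigma2_0, seed=0):
    """Draw (y, X) from y = X @ beta0 + eps with eps | X ~ N(0, sigma2_0 * I)."""
    rng = np.random.default_rng(seed)
    beta0 = np.asarray(beta0, dtype=float)
    k = beta0.size
    # Hypothetical regressor design: a constant plus k - 1 standard normal columns.
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    # IID normal errors with mean 0 and variance sigma2_0, drawn independently of X.
    eps = rng.normal(loc=0.0, scale=np.sqrt(sigma2_0), size=n)
    y = X @ beta0 + eps
    return y, X

y, X = simulate_regression(n=500, beta0=[1.0, 2.0, -0.5], sigma2_0=4.0)

# The full-rank assumption requires rank(X) == K, so that X'X is invertible.
print(X.shape, np.linalg.matrix_rank(X))
```

Checking the rank of the simulated design matrix is a quick way to confirm that the full-rank assumption holds for a given sample.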

Implications of the assumptions

Because $\varepsilon$ is jointly normal, the assumption that its covariance matrix is diagonal implies that the entries of $\varepsilon$ are mutually independent (i.e., $\varepsilon_{i}$ is independent of $\varepsilon_{j}$ for $i\neq j$). Moreover, they all have a normal distribution with mean $0$ and variance $\sigma_{0}^{2}$.

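By the properties of linear transformations of normal random variables, the dependent variable $y_{i}$ is conditionally normal, with mean $x_{i}\beta_{0}$ and variance $\sigma_{0}^{2}$. Therefore, its conditional probability density function is

$$f_{Y}(y_{i}\mid x_{i}) = \left(2\pi\sigma_{0}^{2}\right)^{-1/2}\exp\left(-\frac{1}{2\sigma_{0}^{2}}\left(y_{i}-x_{i}\beta_{0}\right)^{2}\right)$$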

The likelihood function

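The likelihood function is

$$L(\beta,\sigma^{2};y,X) = \left(2\pi\sigma^{2}\right)^{-N/2}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}\right)$$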


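Since the observations from the sample are independent, the likelihood of the sample is equal to the product of the likelihoods of the single observations:

$$\begin{aligned}
L(\beta,\sigma^{2};y,X) &= \prod_{i=1}^{N} f_{Y}(y_{i}\mid x_{i};\beta,\sigma^{2}) \\
&= \prod_{i=1}^{N}\left(2\pi\sigma^{2}\right)^{-1/2}\exp\left(-\frac{1}{2\sigma^{2}}\left(y_{i}-x_{i}\beta\right)^{2}\right) \\
&= \left(2\pi\sigma^{2}\right)^{-N/2}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}\right)
\end{aligned}$$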
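The factorization can be checked numerically. The sketch below multiplies the $N$ single-observation normal densities and compares the result with the compact expression $(2\pi\sigma^{2})^{-N/2}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{i}(y_{i}-x_{i}\beta)^{2}\right)$. The function names and the tiny data set are made up for illustration.

```python
import numpy as np

def density(y_i, x_i, beta, sigma2):
    """Conditional normal density of y_i given x_i, with mean x_i @ beta."""
    resid = y_i - x_i @ beta
    return np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def likelihood_product(beta, sigma2, y, X):
    """Likelihood as the product of the N single-observation densities."""
    return np.prod([density(y_i, x_i, beta, sigma2) for y_i, x_i in zip(y, X)])

def likelihood_closed_form(beta, sigma2, y, X):
    """Likelihood written with a single exponential of the sum of squares."""
    n = y.shape[0]
    rss = np.sum((y - X @ beta) ** 2)
    return (2.0 * np.pi * sigma2) ** (-n / 2.0) * np.exp(-0.5 * rss / sigma2)

# Tiny made-up sample with N = 4 observations and K = 2 regressors.
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, 0.0]])
y = np.array([1.2, -0.7, 3.1, 0.4])
beta = np.array([0.5, 1.0])
sigma2 = 1.5

lik_product = likelihood_product(beta, sigma2, y, X)
lik_closed = likelihood_closed_form(beta, sigma2, y, X)
print(np.isclose(lik_product, lik_closed))
```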

The log-likelihood function

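The log-likelihood function is

$$\ln L(\beta,\sigma^{2};y,X) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}$$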


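It is obtained by taking the natural logarithm of the likelihood function:

$$\begin{aligned}
\ln L(\beta,\sigma^{2};y,X) &= \ln\left(\left(2\pi\sigma^{2}\right)^{-N/2}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}\right)\right) \\
&= -\frac{N}{2}\ln(2\pi\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2} \\
&= -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^{2}) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}
\end{aligned}$$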
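Working with the logarithm is also convenient numerically. The sketch below evaluates the normal linear regression log-likelihood, $-\frac{N}{2}\ln(2\pi)-\frac{N}{2}\ln(\sigma^{2})-\frac{1}{2\sigma^{2}}\sum_{i}(y_{i}-x_{i}\beta)^{2}$, on simulated data and shows that the raw product of densities underflows to zero in double precision when $N$ is large, while the log-likelihood stays finite. The data-generating choices and names are illustrative assumptions.

```python
import numpy as np

def log_likelihood(beta, sigma2, y, X):
    """Log-likelihood of the normal linear regression model."""
    n = y.shape[0]
    rss = np.sum((y - X @ beta) ** 2)
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - 0.5 * rss / sigma2)

# Simulated sample (assumed design: a constant and one standard normal regressor).
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
sigma2 = 1.0
y = X @ beta + rng.standard_normal(n)

# Single-observation densities; each is below 1/sqrt(2*pi) < 0.4 here.
resid = y - X @ beta
densities = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

raw_likelihood = np.prod(densities)          # product of 2000 small numbers
loglik = log_likelihood(beta, sigma2, y, X)  # same information on the log scale

print(raw_likelihood, loglik)
```

Because every density is below 0.4, the product of 2000 of them is far smaller than the smallest positive double, which is why the raw likelihood is reported as zero.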

The maximum likelihood estimators

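The maximum likelihood estimators of the regression coefficients and of the variance of the error terms are

$$\widehat{\beta}_{N} = \left(X^{\top}X\right)^{-1}X^{\top}y, \qquad \widehat{\sigma}_{N}^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}\widehat{\beta}_{N}\right)^{2}$$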


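The estimators solve the following maximization problem

$$\max_{\beta,\sigma^{2}}\ \ln L(\beta,\sigma^{2};y,X)$$

The first-order conditions for a maximum are

$$\nabla_{\beta}\ln L(\beta,\sigma^{2};y,X) = 0, \qquad \frac{\partial}{\partial\sigma^{2}}\ln L(\beta,\sigma^{2};y,X) = 0$$

where $\nabla_{\beta}$ indicates the gradient calculated with respect to $\beta$, that is, the vector of the partial derivatives of the log-likelihood with respect to the entries of $\beta$. The gradient is

$$\nabla_{\beta}\ln L(\beta,\sigma^{2};y,X) = \frac{1}{\sigma^{2}}\sum_{i=1}^{N}x_{i}^{\top}\left(y_{i}-x_{i}\beta\right) = \frac{1}{\sigma^{2}}\left(X^{\top}y - X^{\top}X\beta\right)$$

which is equal to zero only if

$$X^{\top}y - X^{\top}X\beta = 0$$

Therefore, the first of the two equations is satisfied if

$$\beta = \left(X^{\top}X\right)^{-1}X^{\top}y$$

where we have used the assumption that $X$ has full rank and, as a consequence, $X^{\top}X$ is invertible.

The partial derivative of the log-likelihood with respect to the variance is

$$\frac{\partial}{\partial\sigma^{2}}\ln L(\beta,\sigma^{2};y,X) = -\frac{N}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}$$

which, if we assume $\sigma^{2}\neq 0$, is equal to zero only if

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}\beta\right)^{2}$$

Thus, the system of first-order conditions is solved by

$$\widehat{\beta}_{N} = \left(X^{\top}X\right)^{-1}X^{\top}y, \qquad \widehat{\sigma}_{N}^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(y_{i}-x_{i}\widehat{\beta}_{N}\right)^{2}$$

Note that $\widehat{\beta}_{N}$ does not depend on $\widehat{\sigma}_{N}^{2}$, so that this is an explicit solution.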

Thus, the maximum likelihood estimators are:

  1. for the regression coefficients, the usual OLS estimator;

  2. for the variance of the error terms, the unadjusted sample variance of the residuals $y_{i}-x_{i}\widehat{\beta}_{N}$.
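The two estimators can be computed in a few lines. The following sketch, on simulated data, solves the normal equations for the coefficients, divides the residual sum of squares by $N$ for the variance, and then checks that the gradient of the log-likelihood (the derivative with respect to $\beta$ is $X^{\top}(y-X\beta)/\sigma^{2}$, the derivative with respect to $\sigma^{2}$ is $-N/(2\sigma^{2})+\sum_{i}(y_{i}-x_{i}\beta)^{2}/(2\sigma^{4})$) vanishes at the solution. Function names and simulation settings are illustrative assumptions.

```python
import numpy as np

def mle_linear_regression(y, X):
    """Closed-form MLE: OLS coefficients and unadjusted residual variance."""
    n = y.shape[0]
    # Solve the normal equations X'X b = X'y (X'X is invertible under full rank).
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / n  # divides by N, not by N - K
    return beta_hat, sigma2_hat

def score(beta, sigma2, y, X):
    """Gradient of the log-likelihood with respect to (beta, sigma2)."""
    n = y.shape[0]
    resid = y - X @ beta
    grad_beta = X.T @ resid / sigma2
    d_sigma2 = -0.5 * n / sigma2 + 0.5 * (resid @ resid) / sigma2**2
    return np.append(grad_beta, d_sigma2)

# Simulated sample (assumed design and parameter values).
rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + 2.0 * rng.standard_normal(n)

beta_hat, sigma2_hat = mle_linear_regression(y, X)

# Cross-check against NumPy's least-squares solver.
ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The adjusted (degrees-of-freedom corrected) variance estimator, for comparison.
adjusted_sigma2 = sigma2_hat * n / (n - k)

print(beta_hat, sigma2_hat, adjusted_sigma2)
print(score(beta_hat, sigma2_hat, y, X))  # numerically zero
```

The adjusted estimator divides by $N-K$ rather than $N$, so it is always slightly larger than the maximum likelihood estimate; the two coincide asymptotically.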

Asymptotic variance

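The vector of parameters

$$\widehat{\theta}_{N} = \begin{bmatrix}\widehat{\beta}_{N} \\ \widehat{\sigma}_{N}^{2}\end{bmatrix}$$

is asymptotically normal with asymptotic mean equal to

$$\theta_{0} = \begin{bmatrix}\beta_{0} \\ \sigma_{0}^{2}\end{bmatrix}$$

and asymptotic covariance matrix equal to

$$V = \begin{bmatrix}\sigma_{0}^{2}\left(\operatorname{E}\left[x_{i}^{\top}x_{i}\right]\right)^{-1} & 0 \\ 0 & 2\sigma_{0}^{4}\end{bmatrix}$$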


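The first $K$ entries of the score vector $\nabla_{\theta}\ln f_{Y}(y_{i}\mid x_{i};\theta)$, where $\theta = \begin{bmatrix}\beta^{\top} & \sigma^{2}\end{bmatrix}^{\top}$, are

$$\nabla_{\beta}\ln f_{Y}(y_{i}\mid x_{i};\theta) = \frac{1}{\sigma^{2}}x_{i}^{\top}\left(y_{i}-x_{i}\beta\right)$$

The $\left(K+1\right)$-th entry of the score vector is

$$\frac{\partial}{\partial\sigma^{2}}\ln f_{Y}(y_{i}\mid x_{i};\theta) = -\frac{1}{2\sigma^{2}} + \frac{1}{2\sigma^{4}}\left(y_{i}-x_{i}\beta\right)^{2}$$

The Hessian, that is, the matrix of second derivatives, can be written as a block matrix

$$\nabla_{\theta\theta}\ln f_{Y}(y_{i}\mid x_{i};\theta) = \begin{bmatrix}\nabla_{\beta\beta}\ln f_{Y} & \frac{\partial}{\partial\sigma^{2}}\nabla_{\beta}\ln f_{Y} \\ \left(\frac{\partial}{\partial\sigma^{2}}\nabla_{\beta}\ln f_{Y}\right)^{\top} & \frac{\partial^{2}}{\partial\left(\sigma^{2}\right)^{2}}\ln f_{Y}\end{bmatrix}$$

Let us compute the blocks:

$$\nabla_{\beta\beta}\ln f_{Y}(y_{i}\mid x_{i};\theta) = -\frac{1}{\sigma^{2}}x_{i}^{\top}x_{i}$$

and

$$\frac{\partial}{\partial\sigma^{2}}\nabla_{\beta}\ln f_{Y}(y_{i}\mid x_{i};\theta) = -\frac{1}{\sigma^{4}}x_{i}^{\top}\left(y_{i}-x_{i}\beta\right)$$

Finally,

$$\frac{\partial^{2}}{\partial\left(\sigma^{2}\right)^{2}}\ln f_{Y}(y_{i}\mid x_{i};\theta) = \frac{1}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\left(y_{i}-x_{i}\beta\right)^{2}$$

Therefore, the Hessian is

$$\nabla_{\theta\theta}\ln f_{Y}(y_{i}\mid x_{i};\theta) = \begin{bmatrix}-\frac{1}{\sigma^{2}}x_{i}^{\top}x_{i} & -\frac{1}{\sigma^{4}}x_{i}^{\top}\left(y_{i}-x_{i}\beta\right) \\ -\frac{1}{\sigma^{4}}\left(y_{i}-x_{i}\beta\right)x_{i} & \frac{1}{2\sigma^{4}} - \frac{1}{\sigma^{6}}\left(y_{i}-x_{i}\beta\right)^{2}\end{bmatrix}$$

By the information equality, the asymptotic covariance matrix $V$ satisfies

$$V = \left(-\operatorname{E}\left[\nabla_{\theta\theta}\ln f_{Y}(y_{i}\mid x_{i};\theta_{0})\right]\right)^{-1}$$

where $\theta_{0} = \begin{bmatrix}\beta_{0}^{\top} & \sigma_{0}^{2}\end{bmatrix}^{\top}$ is the true parameter, so that $y_{i}-x_{i}\beta_{0} = \varepsilon_{i}$. But

$$\operatorname{E}\left[\frac{1}{2\sigma_{0}^{4}} - \frac{1}{\sigma_{0}^{6}}\varepsilon_{i}^{2}\right] = \frac{1}{2\sigma_{0}^{4}} - \frac{1}{\sigma_{0}^{6}}\sigma_{0}^{2} = -\frac{1}{2\sigma_{0}^{4}}$$

and, by the Law of Iterated Expectations,

$$\operatorname{E}\left[x_{i}^{\top}\varepsilon_{i}\right] = \operatorname{E}\left[\operatorname{E}\left[x_{i}^{\top}\varepsilon_{i}\mid x_{i}\right]\right] = \operatorname{E}\left[x_{i}^{\top}\operatorname{E}\left[\varepsilon_{i}\mid x_{i}\right]\right] = 0$$

Thus,

$$\operatorname{E}\left[\nabla_{\theta\theta}\ln f_{Y}(y_{i}\mid x_{i};\theta_{0})\right] = \begin{bmatrix}-\frac{1}{\sigma_{0}^{2}}\operatorname{E}\left[x_{i}^{\top}x_{i}\right] & 0 \\ 0 & -\frac{1}{2\sigma_{0}^{4}}\end{bmatrix}$$

As a consequence, the asymptotic covariance matrix is

$$V = \begin{bmatrix}\sigma_{0}^{2}\left(\operatorname{E}\left[x_{i}^{\top}x_{i}\right]\right)^{-1} & 0 \\ 0 & 2\sigma_{0}^{4}\end{bmatrix}$$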
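The second derivatives used in the proof can be sanity-checked numerically. The sketch below codes the Hessian of the single-observation normal log-density, with blocks $-x_{i}^{\top}x_{i}/\sigma^{2}$, $-x_{i}^{\top}(y_{i}-x_{i}\beta)/\sigma^{4}$ and $1/(2\sigma^{4})-(y_{i}-x_{i}\beta)^{2}/\sigma^{6}$, and compares it with a central finite-difference approximation. The function names, step size and test point are illustrative assumptions.

```python
import numpy as np

def log_density(theta, y_i, x_i):
    """Log of the normal density of y_i given x_i; theta = (beta, sigma2)."""
    beta, sigma2 = theta[:-1], theta[-1]
    resid = y_i - x_i @ beta
    return -0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(sigma2) - 0.5 * resid**2 / sigma2

def analytic_hessian(theta, y_i, x_i):
    """Block Hessian of the single-observation log-density."""
    beta, sigma2 = theta[:-1], theta[-1]
    resid = y_i - x_i @ beta
    k = beta.size
    H = np.empty((k + 1, k + 1))
    H[:k, :k] = -np.outer(x_i, x_i) / sigma2          # beta-beta block
    H[:k, k] = -x_i * resid / sigma2**2               # cross block
    H[k, :k] = H[:k, k]                               # symmetric counterpart
    H[k, k] = 0.5 / sigma2**2 - resid**2 / sigma2**3  # sigma2-sigma2 block
    return H

def numeric_hessian(f, theta, h=1e-3):
    """Central finite-difference approximation of the Hessian of f at theta."""
    p = theta.size
    H = np.empty((p, p))
    for a in range(p):
        for b in range(p):
            ea = np.zeros(p); ea[a] = h
            eb = np.zeros(p); eb[b] = h
            H[a, b] = (f(theta + ea + eb) - f(theta + ea - eb)
                       - f(theta - ea + eb) + f(theta - ea - eb)) / (4.0 * h * h)
    return H

# Arbitrary test point: K = 3 coefficients followed by the variance.
x_i = np.array([1.0, 0.5, -2.0])
y_i = 0.7
theta = np.array([0.3, -1.0, 0.5, 2.0])

H_analytic = analytic_hessian(theta, y_i, x_i)
H_numeric = numeric_hessian(lambda t: log_density(t, y_i, x_i), theta)
print(np.max(np.abs(H_analytic - H_numeric)))
```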

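This means that the probability distribution of the vector of parameter estimates

$$\widehat{\theta}_{N} = \begin{bmatrix}\widehat{\beta}_{N} \\ \widehat{\sigma}_{N}^{2}\end{bmatrix}$$

can be approximated by a multivariate normal distribution with mean

$$\theta_{0} = \begin{bmatrix}\beta_{0} \\ \sigma_{0}^{2}\end{bmatrix}$$

and covariance matrix

$$\frac{1}{N}V = \begin{bmatrix}\frac{\sigma_{0}^{2}}{N}\left(\operatorname{E}\left[x_{i}^{\top}x_{i}\right]\right)^{-1} & 0 \\ 0 & \frac{2\sigma_{0}^{4}}{N}\end{bmatrix}$$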
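In practice the unknown quantities in the approximate covariance matrix are replaced by estimates. The sketch below is one common plug-in approach, stated here as an assumption rather than as the lecture's own procedure: $\sigma_{0}^{2}$ is replaced by $\widehat{\sigma}_{N}^{2}$ and $\operatorname{E}[x_{i}^{\top}x_{i}]$ by the sample average $X^{\top}X/N$, which gives $\widehat{\sigma}_{N}^{2}(X^{\top}X)^{-1}$ for the coefficients and $2\widehat{\sigma}_{N}^{4}/N$ for the variance. A small Monte Carlo experiment with made-up settings then compares the empirical variances of the estimates with the asymptotic values.

```python
import numpy as np

def approx_covariance(y, X):
    """MLE plus a plug-in estimate of the approximate covariance matrix (1/N) V."""
    n, k = X.shape
    xtx = X.T @ X
    beta_hat = np.linalg.solve(xtx, X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / n
    cov = np.zeros((k + 1, k + 1))
    # (sigma2 / N) * (E[x'x])^{-1}, with E[x'x] replaced by X'X / N.
    cov[:k, :k] = sigma2_hat * np.linalg.inv(xtx)
    # 2 * sigma^4 / N for the variance estimator; off-diagonal blocks stay zero.
    cov[k, k] = 2.0 * sigma2_hat**2 / n
    return beta_hat, sigma2_hat, cov

# Monte Carlo settings (assumed): a constant and one standard normal regressor,
# so that E[x'x] is the 2 x 2 identity matrix.
rng = np.random.default_rng(2)
n, reps = 200, 2000
beta0 = np.array([1.0, 2.0])
sigma2_0 = 1.0

estimates = np.empty((reps, 3))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ beta0 + np.sqrt(sigma2_0) * rng.standard_normal(n)
    beta_hat, sigma2_hat, cov = approx_covariance(y, X)
    estimates[r] = np.append(beta_hat, sigma2_hat)

empirical_var = estimates.var(axis=0)
asymptotic_var = np.array([sigma2_0 / n, sigma2_0 / n, 2.0 * sigma2_0**2 / n])
print(empirical_var)
print(asymptotic_var)
```

The square roots of the diagonal entries of `cov` can be read as approximate standard errors of the individual estimates.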

Other examples

StatLect has several pages on maximum likelihood estimation. Learn how to derive the estimators of the parameters of the following distributions and models.

| Model | Type | Solution |
|---|---|---|
| Exponential distribution | Univariate distribution | Analytical |
| Normal distribution | Univariate distribution | Analytical |
| Poisson distribution | Univariate distribution | Analytical |
| T distribution | Univariate distribution | Numerical |
| Multivariate normal distribution | Multivariate distribution | Analytical |
| Logistic classification model | Classification model | Numerical |
| Probit classification model | Classification model | Numerical |

How to cite

Please cite as:

Taboga, Marco (2021). "Linear regression - Maximum Likelihood Estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
