Index > Fundamentals of statistics

Generalized least squares

The generalized least squares (GLS) estimator of the coefficients of a linear regression is a generalization of the ordinary least squares (OLS) estimator. It is used to deal with situations in which the OLS estimator is not BLUE (best linear unbiased estimator) because one of the main assumptions of the Gauss-Markov theorem, namely that of homoskedasticity and absence of serial correlation, is violated. In such situations, provided that the other assumptions of the Gauss-Markov theorem are satisfied, the GLS estimator is BLUE.

Table of Contents


The linear regression is[eq1]where:

We assume that:

  1. X has full rank;

  2. [eq2];

  3. [eq3], where V is a $N	imes N$ symmetric positive definite matrix.

These assumptions are the same made in the Gauss-Markov theorem in order to prove that OLS is BLUE, except for assumption 3.

In the Gauss-Markov theorem, we make the more restrictive assumption that [eq4] where I is the $N	imes N$ identity matrix. The latter assumption means that the errors of the regression are homoskedastic (they all have the same variance) and uncorrelated (their covariances are all equal to zero).

Instead, we now allow for heteroskedasticity (the errors can have different variances) and correlation (the covariances between errors can be different from zero).

The GLS estimator

Since V is symmetric and positive definite, there is an invertible matrix Sigma such that[eq5]

If we pre-multiply the regression equation by $Sigma ^{-1}$, we obtain[eq6]

Define[eq7]so that the transformed regression equation can be written as[eq8]

The following proposition holds.

Proposition The OLS estimator of the coefficients of the transformed regression equation, called generalized least squares estimator, is[eq9]Furthermore, [eq10] is BLUE (best linear unbiased).


The estimator is derived from the formula of the OLS estimator of the coefficients of the transformed regression equation: [eq11]

Furthermore, we have that $widetilde{X}$ is full-rank (because X and Sigma are). Moreover,[eq12]and [eq13]

Therefore, the transformed regression satisfies all of the conditions of Gauss-Markov theorem, and the OLS estimator of $eta $ obtained from (1) is BLUE.

The generalized least squares problem

Remember that the OLS estimator [eq14] of a linear regression solves the problem[eq15]that is, it minimizes the sum of squared residuals.

The GLS estimator can be shown to solve the problem[eq16]which is called generalized least squares problem.


The first order condition for a maximum is[eq17]whose solution is[eq18]or[eq19]The second order derivative is[eq20]which is positive definite (because X is full-rank and V is positive definite). Therefore, the function to be minimized is globally convex and the solution of the first order condition is a global minimum.

The function to be minimized can be written as[eq21]

It is also a sum of squared residuals, but the original residuals $y-Xb$ are rescaled by $Sigma ^{-1}$ before being squared and summed.

Weighted least squares

When the covariance matrix V is diagonal (i.e., the error terms are uncorrelated), the GLS estimator is called weighted least squares estimator (WLS). In this case the function to be minimized becomes[eq22]where $y_{i}$ is the i-th entry of $y$, X_i is the i-th row of X, and $V_{ii}$ is the i-th diagonal element of V. Thus, we are minimizing a weighted sum of the squared residuals, in which each squared residual is weighted by the reciprocal of its variance. In other words, while estimating $eta $, we are giving less weight to the observations for which the linear relationship to be estimated is more noisy, and more weight to those for which it is less noisy.

Feasible generalized least squares

Note that we need to know the covariance matrix V in order to actually compute [eq10]. In practice, we seldom know V and we replace it with an estimate $widehat{V}$. The estimator thus obtained, that is,[eq24]is called feasible generalized least squares estimator.

There is no general method for estimating V, although the residuals of a fist-step OLS regression are typically used to compute $widehat{V}$. How the problem is approached depends on the specific application and on additional assumptions that may be made about the process generating the errors of the regression.

Example A typical situation in which V is estimated by running a first-step OLS regression is when the observations are indexed by time. For example, we could assume that V is diagonal and estimate its diagonal elements with an exponential moving average[eq25]where[eq26]

The book

Most of the learning materials found on this website are now available in a traditional textbook format.