 StatLect

Maximum likelihood - Covariance matrix estimation

In the lecture entitled Maximum likelihood we have demonstrated that, under certain assumptions, the distribution of the maximum likelihood estimator $\widehat{\theta}_n$ of a vector of parameters $\theta_0$ can be approximated by a multivariate normal distribution with mean $\theta_0$ and covariance matrix $$\frac{1}{n} V, \qquad V = \left\{ \mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right] \right\}^{-1}$$ where $\ell(\theta; X_i)$ is the log-likelihood of one observation $X_i$ from the sample, evaluated at the true parameter $\theta_0$, and the gradient $\nabla_\theta \ell(\theta_0; X_i)$ is the vector of first derivatives of the log-likelihood.

Because $\theta_0$ is unknown, this covariance matrix is also unknown. Here we discuss methods to consistently estimate $V$.

We make the same assumptions made in the aforementioned lecture. So, for example, the sample $\xi_n = (x_1, \ldots, x_n)$ is made up of the realizations of the first $n$ terms of an IID sequence $\{X_n\}$. Under these assumptions we also have that the information equality holds, so that $$\mathrm{E}\left[ \nabla_{\theta\theta} \ell(\theta_0; X_i) \right] = -\mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right]$$ where the Hessian matrix $\nabla_{\theta\theta} \ell$ is the matrix of second-order partial derivatives of the log-likelihood function.

Outer product of gradients (OPG) estimate

The first estimate of the asymptotic covariance matrix $V$ is called the outer product of gradients (OPG) estimate and it is computed as $$\widehat{V}_{OPG} = \left[ \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(\widehat{\theta}_n; x_i)\, \nabla_\theta \ell(\widehat{\theta}_n; x_i)^{\top} \right]^{-1}$$ It takes its name from the fact that the gradient $\nabla_\theta \ell$ is a column vector, its transpose is a row vector, and the product between a column and a row is called an outer product.
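As an illustration, the OPG estimate can be computed numerically. The following sketch (not part of the original lecture) assumes a hypothetical exponential model with rate $\lambda_0 = 2$, whose log-likelihood of one observation is $\ell(\lambda; x) = \ln\lambda - \lambda x$ and whose true asymptotic covariance is $V = \lambda_0^2$; all variable names are illustrative.

```python
import numpy as np

# Assumed example: IID sample from an exponential distribution with rate lam0.
# Log-likelihood of one observation: l(lam; x) = ln(lam) - lam * x
# Gradient (score): dl/dlam = 1/lam - x
rng = np.random.default_rng(0)
lam0 = 2.0
x = rng.exponential(scale=1 / lam0, size=5_000)

lam_hat = 1 / x.mean()            # maximum likelihood estimate of lam0

# Stack the per-observation gradients, evaluated at the MLE, as an n x 1 matrix.
G = (1 / lam_hat - x)[:, None]

# OPG estimate: inverse of the average outer product of the gradients.
V_opg = np.linalg.inv(G.T @ G / len(x))

print(V_opg)                      # should be close to lam0**2 = 4 for large n
```

For a vector parameter the same code applies with `G` of shape `(n, k)`, one row per observation.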

Provided some regularity conditions are satisfied, the OPG estimator $\widehat{V}_{OPG}$ is a consistent estimator of $V$, that is, it converges in probability to $V$.

Proof

We provide only a sketch of the proof and we refer the reader to Newey and McFadden (1994) for a more rigorous exposition. Provided some regularity conditions are satisfied (see the source just cited), we have the following equality between probability limits: $$\operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(\widehat{\theta}_n; x_i)\, \nabla_\theta \ell(\widehat{\theta}_n; x_i)^{\top} = \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(\theta_0; x_i)\, \nabla_\theta \ell(\theta_0; x_i)^{\top}$$ where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$. Because the sample is IID, by the Law of Large Numbers we have that $$\operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(\theta_0; x_i)\, \nabla_\theta \ell(\theta_0; x_i)^{\top} = \mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right]$$ Now, the formula for the covariance matrix (see the lecture entitled Covariance matrix) yields $$\mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ell(\theta_0; X_i) \right] + \mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i) \right] \mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i) \right]^{\top}$$ But the expected value of the gradient evaluated at $\theta_0$ is $0$, so that $$\mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right] = \operatorname{Var}\left[ \nabla_\theta \ell(\theta_0; X_i) \right] = V^{-1}$$ Thus, $$\operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(\widehat{\theta}_n; x_i)\, \nabla_\theta \ell(\widehat{\theta}_n; x_i)^{\top} = V^{-1}$$ Because matrix inversion is continuous, by the Continuous Mapping theorem we have $$\operatorname{plim} \widehat{V}_{OPG} = \left( V^{-1} \right)^{-1} = V$$ which is exactly the result we needed to prove.

Hessian estimate

The second estimate of the asymptotic covariance matrix $V$ is called the Hessian estimate and it is computed as $$\widehat{V}_{H} = \left[ -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ell(\widehat{\theta}_n; x_i) \right]^{-1}$$ Under some regularity conditions, the Hessian estimator $\widehat{V}_{H}$ is also a consistent estimator of $V$.
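Continuing the same hypothetical exponential example used above (a sketch under assumed names, not the lecture's own code), the Hessian estimate replaces the outer products of gradients with second derivatives:

```python
import numpy as np

# Assumed example: exponential model, l(lam; x) = ln(lam) - lam * x.
# Second derivative (Hessian, here 1x1): d2l/dlam2 = -1/lam**2
rng = np.random.default_rng(0)
lam0 = 2.0
x = rng.exponential(scale=1 / lam0, size=5_000)

lam_hat = 1 / x.mean()            # maximum likelihood estimate of lam0

# Per-observation Hessians evaluated at the MLE (constant in x for this model).
H = np.full((len(x), 1, 1), -1 / lam_hat**2)

# Hessian estimate: inverse of minus the average Hessian.
V_hess = np.linalg.inv(-H.mean(axis=0))

print(V_hess)                     # should be close to lam0**2 = 4 for large n
```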

Proof

Again, we do not provide an entirely rigorous proof (for which you can see Newey and McFadden 1994) and we only sketch the main steps. First of all, under some regularity conditions, we have that $$\operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ell(\widehat{\theta}_n; x_i) = \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ell(\theta_0; x_i)$$ where $\widehat{\theta}_n$ has been replaced by $\theta_0$ because, being a consistent estimator, it converges in probability to $\theta_0$. Now, since the sample is IID, by the Law of Large Numbers we have that $$\operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ell(\theta_0; x_i) = \mathrm{E}\left[ \nabla_{\theta\theta} \ell(\theta_0; X_i) \right]$$ By the information equality, we have $$\mathrm{E}\left[ \nabla_{\theta\theta} \ell(\theta_0; X_i) \right] = -\mathrm{E}\left[ \nabla_\theta \ell(\theta_0; X_i)\, \nabla_\theta \ell(\theta_0; X_i)^{\top} \right] = -V^{-1}$$ Therefore, $$\operatorname{plim} \left[ -\frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta\theta} \ell(\widehat{\theta}_n; x_i) \right] = V^{-1}$$ Because matrix inversion is continuous, by the Continuous Mapping theorem we have $$\operatorname{plim} \widehat{V}_{H} = \left( V^{-1} \right)^{-1} = V$$ which is what we needed to prove.
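The information equality invoked in this proof can also be checked by simulation. A minimal sketch, again assuming a hypothetical exponential model evaluated at the true parameter:

```python
import numpy as np

# Assumed example: exponential model with true rate lam0.
# Score at theta_0:   g(x) = 1/lam0 - x     (has mean zero)
# Hessian at theta_0: h(x) = -1/lam0**2     (constant in x)
rng = np.random.default_rng(0)
lam0 = 2.0
x = rng.exponential(scale=1 / lam0, size=200_000)

g = 1 / lam0 - x
h = np.full_like(x, -1 / lam0**2)

lhs = h.mean()          # sample analogue of E[Hessian]
rhs = -(g**2).mean()    # sample analogue of -E[gradient * gradient]

# The information equality states that the two population quantities coincide,
# so for large n the two sample averages should be close.
print(lhs, rhs)
```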

Sandwich estimate

The third estimate of the asymptotic covariance matrix $V$ is called the Sandwich estimate and it is computed as $$\widehat{V}_{S} = \widehat{V}_{H}\, \widehat{V}_{OPG}^{-1}\, \widehat{V}_{H}$$ where $\widehat{V}_{OPG}$ is the OPG estimate and $\widehat{V}_{H}$ is the Hessian estimate.
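Putting the two previous estimates together for the same hypothetical exponential example gives the Sandwich estimate; for a correctly specified model all three estimates should be close to one another for large samples:

```python
import numpy as np

# Assumed example: exponential model, l(lam; x) = ln(lam) - lam * x.
rng = np.random.default_rng(0)
lam0 = 2.0
x = rng.exponential(scale=1 / lam0, size=5_000)

lam_hat = 1 / x.mean()                          # maximum likelihood estimate

G = (1 / lam_hat - x)[:, None]                  # per-observation gradients (n x 1)
V_opg = np.linalg.inv(G.T @ G / len(x))         # OPG estimate
H_bar = np.array([[-1 / lam_hat**2]])           # average Hessian (constant here)
V_hess = np.linalg.inv(-H_bar)                  # Hessian estimate

# Sandwich estimate: Hessian estimate on the outside, inverse OPG in the middle.
V_sand = V_hess @ np.linalg.inv(V_opg) @ V_hess

print(V_sand)                                   # should be close to lam0**2 = 4
```

The Sandwich form is popular because it remains a sensible variance estimate under some forms of model misspecification, where the OPG and Hessian estimates alone would disagree.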

The Sandwich estimator $\widehat{V}_{S}$ is also a consistent estimator of $V$.

Proof

This is again a consequence of the Continuous Mapping theorem: $$\operatorname{plim} \widehat{V}_{S} = \left( \operatorname{plim} \widehat{V}_{H} \right) \left( \operatorname{plim} \widehat{V}_{OPG} \right)^{-1} \left( \operatorname{plim} \widehat{V}_{H} \right) = V V^{-1} V = V$$ where the last equality follows from the consistency of the OPG and Hessian estimators.

References

Newey, W. K. and D. McFadden (1994) "Large sample estimation and hypothesis testing", Chapter 36 in Handbook of Econometrics, Volume 4, Elsevier.
