
Maximum likelihood - Covariance matrix estimation

In the lecture entitled Maximum likelihood we have demonstrated that, under certain assumptions, the distribution of the maximum likelihood estimator $\widehat{\theta}_{n}$ of a vector of parameters $\theta_{0}$ can be approximated by a multivariate normal distribution with mean $\theta_{0}$ and covariance matrix
$$\frac{1}{n}V=\frac{1}{n}\left[\operatorname{Var}\left(\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right)\right]^{-1}$$
where $\ln f_{X}(X;\theta_{0})$ is the log-likelihood of one observation from the sample, evaluated at the true parameter $\theta_{0}$, and the gradient $\nabla_{\theta}\ln f_{X}(X;\theta_{0})$ is the vector of first derivatives of the log-likelihood.
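As a simple illustration (not drawn from the lecture), suppose the observations are draws from an exponential distribution with rate $\lambda_{0}$, so that $\ln f_{X}(x;\lambda)=\ln\lambda-\lambda x$ and $\nabla_{\lambda}\ln f_{X}(x;\lambda)=1/\lambda-x$. Then
$$V=\left[\operatorname{Var}\left(\frac{1}{\lambda_{0}}-X\right)\right]^{-1}=\left[\operatorname{Var}\left(X\right)\right]^{-1}=\lambda_{0}^{2}$$
so the maximum likelihood estimator $\widehat{\lambda}_{n}$ is approximately normal with mean $\lambda_{0}$ and variance $\lambda_{0}^{2}/n$.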

Because $\theta_{0}$ is unknown, this covariance matrix is also unknown. Here we discuss methods for estimating it consistently.

We make the same assumptions as in the aforementioned lecture. So, for example, the log-likelihood of the sample is
$$\ln L(\theta;x_{1},\ldots,x_{n})=\sum_{j=1}^{n}\ln f_{X}(x_{j};\theta)$$
where $x_{1},\ldots,x_{n}$ are the realizations of the first $n$ terms of an IID sequence $\{X_{j}\}$. Under these assumptions the information equality also holds, so that
$$\operatorname{E}\left[\nabla_{\theta\theta}\ln f_{X}(X;\theta_{0})\right]=-\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\,\nabla_{\theta}\ln f_{X}(X;\theta_{0})^{\top}\right]$$
where the Hessian matrix $\nabla_{\theta\theta}\ln f_{X}(X;\theta_{0})$ is the matrix of second-order partial derivatives of the log-likelihood function.
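In the exponential illustration above, the information equality is easy to verify directly: the Hessian of the log-likelihood of one observation is
$$\nabla_{\lambda\lambda}\ln f_{X}(x;\lambda)=-\frac{1}{\lambda^{2}}$$
so that
$$\operatorname{E}\left[\nabla_{\lambda\lambda}\ln f_{X}(X;\lambda_{0})\right]=-\frac{1}{\lambda_{0}^{2}}=-\operatorname{E}\left[\left(\frac{1}{\lambda_{0}}-X\right)^{2}\right]$$
because an exponential random variable with rate $\lambda_{0}$ has mean $1/\lambda_{0}$ and variance $1/\lambda_{0}^{2}$.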

Outer product of gradients (OPG) estimate

The first estimate of the asymptotic covariance matrix
$$V=\left[\operatorname{Var}\left(\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right)\right]^{-1}$$
is called the outer product of gradients (OPG) estimate and it is computed as
$$\widehat{V}_{n}=\left[\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})^{\top}\right]^{-1}$$
It takes its name from the fact that the gradient $\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})$ is a column vector, its transpose is a row vector, and the product of a column vector and a row vector is called an outer product.
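To make the formula operational, here is a minimal numerical sketch in Python (ours, not part of the lecture; the function name opg_estimate and the use of NumPy are illustrative choices) that computes the OPG estimate for the exponential example:

```python
import numpy as np

def opg_estimate(scores):
    """OPG estimate of V: invert the average of the outer products of the
    per-observation gradients, stored as the rows of an (n, k) array."""
    n = scores.shape[0]
    return np.linalg.inv(scores.T @ scores / n)

# Exponential example: the MLE of the rate is 1 / sample mean, and the
# gradient of the log-likelihood of observation j is 1/lam_hat - x_j.
rng = np.random.default_rng(0)
lam0 = 2.0
x = rng.exponential(scale=1 / lam0, size=10_000)
lam_hat = 1 / x.mean()
scores = (1 / lam_hat - x).reshape(-1, 1)  # (n, 1) column of gradients

V_hat = opg_estimate(scores)
print(V_hat)  # should be close to lam0**2 = 4 for a large sample
```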

Provided some regularity conditions are satisfied, the OPG estimator $\widehat{V}_{n}$ is a consistent estimator of $V$, that is, it converges in probability to $V$.

Proof

We provide only a sketch of the proof and we refer the reader to Newey and McFadden (1994) for a more rigorous exposition. Provided some regularity conditions are satisfied (see the source just cited), we have the following equality between probability limits:
$$\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})^{\top}=\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\theta_{0})\,\nabla_{\theta}\ln f_{X}(x_{j};\theta_{0})^{\top}$$
where $\widehat{\theta}_{n}$ has been replaced by $\theta_{0}$ because, being a consistent estimator, it converges in probability to $\theta_{0}$. Because the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\theta_{0})\,\nabla_{\theta}\ln f_{X}(x_{j};\theta_{0})^{\top}=\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\,\nabla_{\theta}\ln f_{X}(X;\theta_{0})^{\top}\right]$$
Now, the formula for the covariance matrix (see the lecture entitled Covariance matrix) yields
$$\operatorname{Var}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right]=\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\,\nabla_{\theta}\ln f_{X}(X;\theta_{0})^{\top}\right]-\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right]\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right]^{\top}$$
But the expected value of the gradient evaluated at $\theta_{0}$ is $0$, so that
$$\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\,\nabla_{\theta}\ln f_{X}(X;\theta_{0})^{\top}\right]=\operatorname{Var}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right]=V^{-1}$$
Thus,
$$\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})^{\top}=V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\widehat{V}_{n}=\left[\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\,\nabla_{\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})^{\top}\right]^{-1}=V$$
which is exactly the result we needed to prove.

Hessian estimate

The second estimate of the asymptotic covariance matrix
$$V=\left[\operatorname{Var}\left(\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right)\right]^{-1}$$
is called the Hessian estimate and it is computed as
$$\widetilde{V}_{n}=\left[-\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\right]^{-1}$$
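A companion sketch for the Hessian estimate, under the same illustrative setup (again our own code, with hypothetical names):

```python
import numpy as np

def hessian_estimate(hessians):
    """Hessian estimate of V: invert minus the average of the
    per-observation Hessians, stored as an (n, k, k) array."""
    return np.linalg.inv(-hessians.mean(axis=0))

# Exponential example: the second derivative of ln(lam) - lam * x with
# respect to lam is -1/lam**2, which is the same for every observation.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=10_000)
lam_hat = 1 / x.mean()
hessians = np.full((x.size, 1, 1), -1 / lam_hat**2)

V_tilde = hessian_estimate(hessians)
print(V_tilde)  # equals lam_hat**2, close to lam0**2 = 4 for a large sample
```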

Under some regularity conditions, the Hessian estimator $\widetilde{V}_{n}$ is also a consistent estimator of $V$.

Proof

Again, we do not provide an entirely rigorous proof (for which you can see Newey and McFadden, 1994) and we only sketch the main steps. First of all, under some regularity conditions, we have that
$$\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})=\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\theta_{0})$$
where $\widehat{\theta}_{n}$ has been replaced by $\theta_{0}$ because, being a consistent estimator, it converges in probability to $\theta_{0}$. Now, since the sample is IID, by the Law of Large Numbers we have that
$$\operatorname*{plim}_{n\rightarrow\infty}\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\theta_{0})=\operatorname{E}\left[\nabla_{\theta\theta}\ln f_{X}(X;\theta_{0})\right]$$
By the information equality and the zero-mean property of the gradient used in the previous proof, we have
$$\operatorname{E}\left[\nabla_{\theta\theta}\ln f_{X}(X;\theta_{0})\right]=-\operatorname{E}\left[\nabla_{\theta}\ln f_{X}(X;\theta_{0})\,\nabla_{\theta}\ln f_{X}(X;\theta_{0})^{\top}\right]=-V^{-1}$$
Therefore,
$$\operatorname*{plim}_{n\rightarrow\infty}\left(-\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\right)=V^{-1}$$
Because matrix inversion is continuous, by the Continuous Mapping theorem we have
$$\operatorname*{plim}_{n\rightarrow\infty}\widetilde{V}_{n}=\left[\operatorname*{plim}_{n\rightarrow\infty}\left(-\frac{1}{n}\sum_{j=1}^{n}\nabla_{\theta\theta}\ln f_{X}(x_{j};\widehat{\theta}_{n})\right)\right]^{-1}=V$$
which is what we needed to prove.

Sandwich estimate

The third estimate of the asymptotic covariance matrix
$$V=\left[\operatorname{Var}\left(\nabla_{\theta}\ln f_{X}(X;\theta_{0})\right)\right]^{-1}$$
is called the Sandwich estimate and it is computed as
$$\overline{V}_{n}=\widetilde{V}_{n}\widehat{V}_{n}^{-1}\widetilde{V}_{n}$$
where $\widehat{V}_{n}$ is the OPG estimate and $\widetilde{V}_{n}$ is the Hessian estimate. It takes its name from the fact that the inverted OPG term is sandwiched between two copies of the Hessian estimate.
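Continuing the sketch, the sandwich estimate simply combines the two quantities computed above (the helper name sandwich_estimate is again ours):

```python
import numpy as np

def sandwich_estimate(V_hat, V_tilde):
    """Sandwich estimate of V: the inverted OPG estimate sandwiched
    between two copies of the Hessian estimate."""
    return V_tilde @ np.linalg.inv(V_hat) @ V_tilde

# With V_hat and V_tilde from the two sketches above:
# V_bar = sandwich_estimate(V_hat, V_tilde)
# Approximate standard errors of the MLE: np.sqrt(np.diag(V_bar) / n)
```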

The Sandwich estimator $\overline{V}_{n}$ is also a consistent estimator of $V$.

Proof

This is again a consequence of the Continuous Mapping theorem:
$$\operatorname*{plim}_{n\rightarrow\infty}\overline{V}_{n}=\operatorname*{plim}_{n\rightarrow\infty}\left(\widetilde{V}_{n}\widehat{V}_{n}^{-1}\widetilde{V}_{n}\right)=VV^{-1}V=V$$
where the second equality follows from the consistency of the OPG and Hessian estimators.

References

Newey, W. K. and D. McFadden (1994) "Chapter 36: Large sample estimation and hypothesis testing", in Handbook of Econometrics, Volume 4, Elsevier.
