
# Information matrix

The information matrix (also called Fisher information matrix) is the matrix of second cross-moments of the score vector. The latter is the vector of first partial derivatives of the log-likelihood function with respect to its parameters.

## Definition

To define the information matrix, we need the following objects:

• a sample $\xi$;

• a parameter vector $\theta$ that characterizes the distribution of $\xi$;

• the likelihood function $L(\theta;\xi)$;

• the log-likelihood function $l(\theta;\xi)=\ln L(\theta;\xi)$;

• the score vector $s(\theta;\xi)=\nabla_{\theta}\,l(\theta;\xi)$, that is, the vector of first partial derivatives of $l(\theta;\xi)$ with respect to the entries of $\theta$.

The information matrix is the matrix of second cross-moments of the score:
$$I(\theta)=\mathrm{E}_{\theta}\left[s(\theta;\xi)\,s(\theta;\xi)^{\top}\right]$$
The notation $\mathrm{E}_{\theta}$ indicates that the expected value is taken with respect to the probability distribution of $\xi$ associated to the parameter $\theta$.
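The definition can be checked numerically. The following Python sketch (using NumPy; the variable names and parameter values are ours, not from the original article) estimates the matrix of second cross-moments of the score of a single normal observation by Monte Carlo and compares it with the known closed form $\mathrm{diag}(1/\sigma^2,\,1/(2\sigma^4))$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 2.0
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)

# Score of a single N(mu, sigma2) observation: partial derivatives
# of the log-density with respect to mu and sigma2, one row per draw.
s = np.column_stack([
    (x - mu) / sigma2,
    -1 / (2 * sigma2) + (x - mu) ** 2 / (2 * sigma2 ** 2),
])

# Information matrix = matrix of second cross-moments of the score.
I_hat = s.T @ s / len(x)
I_exact = np.array([[1 / sigma2, 0], [0, 1 / (2 * sigma2 ** 2)]])
print(I_hat.round(3))  # close to I_exact
```

With a million draws, the Monte Carlo estimate matches the closed form to about two decimal places.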

## The expected value

We take an expected value because the sample is random.

For example, if the sample $\xi$ has a continuous distribution, then the likelihood function is
$$L(\theta;\xi)=f(\xi;\theta)$$
where $f(x;\theta)$ is the probability density function of $\xi$, parametrized by $\theta$.

Then, the information matrix is
$$I(\theta)=\int s(\theta;x)\,s(\theta;x)^{\top}f(x;\theta)\,dx$$

## The information matrix is the covariance matrix of the score

Under mild regularity conditions, the expected value of the score is equal to zero:
$$\mathrm{E}_{\theta}\left[s(\theta;\xi)\right]=0$$
As a consequence,
$$I(\theta)=\mathrm{E}_{\theta}\left[s(\theta;\xi)\,s(\theta;\xi)^{\top}\right]=\mathrm{Var}_{\theta}\left[s(\theta;\xi)\right]$$
that is, the information matrix is the covariance matrix of the score.

## Information equality

Under mild regularity conditions, it can be proved that
$$I(\theta)=-\mathrm{E}_{\theta}\left[H(\theta;\xi)\right]$$
where $H(\theta;\xi)$ is the matrix of second-order cross-partial derivatives (the so-called Hessian matrix) of the log-likelihood.

This equality is called information equality.
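The information equality can also be verified numerically. In the sketch below (our own illustration, with NumPy), we average the analytic Hessian of the log-density of a single normal observation over a large sample; its negative should match the information matrix $\mathrm{diag}(1/\sigma^2,\,1/(2\sigma^4))$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2 = 0.5, 1.5
x = rng.normal(mu, np.sqrt(sigma2), size=1_000_000)

# Entries of the Hessian of the log-density of one N(mu, sigma2)
# observation, averaged over the draws (Monte Carlo estimate of E[H]).
h11 = np.full_like(x, -1 / sigma2)
h12 = -(x - mu) / sigma2 ** 2
h22 = 1 / (2 * sigma2 ** 2) - (x - mu) ** 2 / sigma2 ** 3
E_H = np.array([[h11.mean(), h12.mean()], [h12.mean(), h22.mean()]])

# Information equality: I(theta) = -E[H]
I_exact = np.array([[1 / sigma2, 0], [0, 1 / (2 * sigma2 ** 2)]])
print((-E_H).round(3))  # close to I_exact
```

The off-diagonal entries average to approximately zero, consistent with the diagonal information matrix of the normal model.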

## Example: information matrix of the normal distribution

As an example, consider a sample $\xi=(x_1,\dots,x_n)$ made up of the realizations of $n$ IID normal random variables with parameters $\mu$ and $\sigma^2$ (mean and variance).

In this case, the information matrix is
$$I(\mu,\sigma^2)=\begin{bmatrix}\dfrac{n}{\sigma^2} & 0\\[1ex] 0 & \dfrac{n}{2\sigma^4}\end{bmatrix}$$

Proof

The log-likelihood function is
$$l(\mu,\sigma^2;\xi)=-\frac{n}{2}\ln(2\pi)-\frac{n}{2}\ln(\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$$
as proved in the lecture on maximum likelihood estimation of the parameters of the normal distribution. The score is a vector whose entries are the partial derivatives of the log-likelihood with respect to $\mu$ and $\sigma^2$:
$$\frac{\partial l}{\partial\mu}=\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)\qquad\frac{\partial l}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2$$
The information matrix is
$$I(\mu,\sigma^2)=\begin{bmatrix}\mathrm{E}\left[\left(\frac{\partial l}{\partial\mu}\right)^2\right] & \mathrm{E}\left[\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^2}\right]\\[1ex] \mathrm{E}\left[\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^2}\right] & \mathrm{E}\left[\left(\frac{\partial l}{\partial\sigma^2}\right)^2\right]\end{bmatrix}$$
We have
$$\mathrm{E}\left[\left(\frac{\partial l}{\partial\mu}\right)^2\right]=\frac{1}{\sigma^4}\,\mathrm{E}\left[\left(\sum_{i=1}^{n}(x_i-\mu)\right)^2\right]\overset{(a)}{=}\frac{1}{\sigma^4}\sum_{i=1}^{n}\mathrm{E}\left[(x_i-\mu)^2\right]\overset{(b)}{=}\frac{n}{\sigma^2}$$
where: in step (a) we have used the fact that $\mathrm{E}\left[(x_i-\mu)(x_j-\mu)\right]=0$ for $i\neq j$, because the variables in the sample are independent and have mean equal to $\mu$; in step (b) we have used the fact that $\mathrm{E}\left[(x_i-\mu)^2\right]=\sigma^2$.

Moreover,
$$\mathrm{E}\left[\left(\frac{\partial l}{\partial\sigma^2}\right)^2\right]=\frac{1}{4\sigma^8}\,\mathrm{E}\left[\left(\sum_{i=1}^{n}\left((x_i-\mu)^2-\sigma^2\right)\right)^2\right]\overset{(c)}{=}\frac{1}{4\sigma^8}\sum_{i=1}^{n}\mathrm{E}\left[\left((x_i-\mu)^2-\sigma^2\right)^2\right]\overset{(d)}{=}\frac{1}{4\sigma^8}\sum_{i=1}^{n}\left(\mathrm{E}\left[(x_i-\mu)^4\right]-\sigma^4\right)\overset{(e)}{=}\frac{n}{4\sigma^8}\left(3\sigma^4-\sigma^4\right)=\frac{n}{2\sigma^4}$$
where: in steps (c) and (d) we have used the independence of the observations in the sample and the fact that $\mathrm{E}\left[(x_i-\mu)^2\right]=\sigma^2$, and in step (e) we have used the fact that the fourth central moment of the normal distribution is equal to $3\sigma^4$.

Finally,
$$\mathrm{E}\left[\frac{\partial l}{\partial\mu}\frac{\partial l}{\partial\sigma^2}\right]=\frac{1}{2\sigma^6}\,\mathrm{E}\left[\left(\sum_{i=1}^{n}(x_i-\mu)\right)\left(\sum_{j=1}^{n}\left((x_j-\mu)^2-\sigma^2\right)\right)\right]\overset{(f)}{=}\frac{1}{2\sigma^6}\sum_{i=1}^{n}\mathrm{E}\left[(x_i-\mu)^3-\sigma^2(x_i-\mu)\right]\overset{(g)}{=}0$$
where: in step (f) we have used the facts that $\mathrm{E}\left[x_i-\mu\right]=0$ and that $\mathrm{E}\left[(x_i-\mu)\left((x_j-\mu)^2-\sigma^2\right)\right]=0$ for $i\neq j$, because the variables in the sample are independent; in step (g) we have used the fact that the third central moment of the normal distribution is equal to zero.
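The closed form can be checked by simulation. This sketch (ours, with NumPy and arbitrary parameter values) draws many replications of a sample of $n$ IID normal observations, evaluates the score of the full sample for each replication, and estimates its matrix of second cross-moments:

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma2 = 20, 0.0, 1.0
reps = 200_000
x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

# Score of the full sample: partial derivatives of the log-likelihood
# with respect to mu and sigma2, evaluated at the true parameters.
s_mu = (x - mu).sum(axis=1) / sigma2
s_s2 = -n / (2 * sigma2) + ((x - mu) ** 2).sum(axis=1) / (2 * sigma2 ** 2)
s = np.column_stack([s_mu, s_s2])

# Monte Carlo estimate of E[s s^T] versus the closed-form result.
I_hat = s.T @ s / reps
I_exact = np.array([[n / sigma2, 0], [0, n / (2 * sigma2 ** 2)]])
print(I_hat.round(2))  # close to I_exact
```

With $n=20$ and $\sigma^2=1$ the exact matrix is $\mathrm{diag}(20, 10)$, and the simulation reproduces it up to Monte Carlo noise.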

## Covariance matrix of the maximum likelihood estimator

When the sample is made up of IID observations, as in the previous example, the covariance matrix of the maximum likelihood estimator of $\theta$ is approximately equal to the inverse of the information matrix.

Denote the maximum likelihood estimator of $\theta$ by $\widehat{\theta}$. Then,
$$\mathrm{Var}\left[\widehat{\theta}\,\right]\approx\left[I(\theta)\right]^{-1}$$

Proof

Denote by $x_1,\dots,x_n$ the IID observations. The log-likelihood of the sample is
$$l(\theta;\xi)=\sum_{i=1}^{n}l(\theta;x_i)$$
where $l(\theta;x_i)$ is the log-likelihood of the $i$-th observation. Under some technical conditions, we have proved that $\sqrt{n}\left(\widehat{\theta}-\theta\right)$ converges in distribution to a normal distribution with zero mean and covariance matrix equal to
$$V=\left[\mathrm{E}\left(s(\theta;x_i)\,s(\theta;x_i)^{\top}\right)\right]^{-1}$$
This implies that
$$\mathrm{Var}\left[\widehat{\theta}\,\right]\approx\frac{1}{n}V=\left[n\,\mathrm{E}\left(s(\theta;x_i)\,s(\theta;x_i)^{\top}\right)\right]^{-1}\overset{(a)}{=}\left[\sum_{i=1}^{n}\mathrm{E}\left(s(\theta;x_i)\,s(\theta;x_i)^{\top}\right)\right]^{-1}=\left[\sum_{i=1}^{n}\mathrm{Var}\left[s(\theta;x_i)\right]\right]^{-1}\overset{(b)}{=}\left[\mathrm{Var}\left[\sum_{i=1}^{n}s(\theta;x_i)\right]\right]^{-1}\overset{(c)}{=}\left[\mathrm{Var}\left[s(\theta;\xi)\right]\right]^{-1}\overset{(d)}{=}\left[I(\theta)\right]^{-1}$$
where: in step (a) we use the fact that the observations are identically distributed; in step (b) we can bring the summation inside the variance operator because the observations are independent; in step (c) we exploit the linearity of the gradient; in step (d) we use the fact that the information matrix is equal to the covariance matrix of the score.

Note that in general, this is true only if the observations in the sample are independently and identically distributed.

## More details

More details about the Fisher information matrix, including proofs of the information equality and of the fact that the expected value of the score is equal to zero, can be found in the lecture on Maximum likelihood.
