
Multivariate normal distribution - Maximum Likelihood Estimation

by Marco Taboga, PhD

In this lecture we show how to derive the maximum likelihood estimators of the two parameters of a multivariate normal distribution: the mean vector and the covariance matrix.

In order to understand the derivation, you need to be familiar with the concept of trace of a matrix.

Suppose we observe the first $n$ terms of an IID sequence $\{X_{j}\}$ of $K$-dimensional multivariate normal random vectors.

The joint probability density function of the $j$-th term of the sequence is $$f_{X}(x_{j}) = (2\pi)^{-K/2} \det(V_{0})^{-1/2} \exp\left( -\frac{1}{2} (x_{j}-\mu_{0})^{\top} V_{0}^{-1} (x_{j}-\mu_{0}) \right)$$ where:

- $\mu_{0}$ is the $K \times 1$ mean vector;

- $V_{0}$ is the $K \times K$ covariance matrix.

The covariance matrix $V_{0}$ is assumed to be positive definite, so that its determinant $\det(V_{0})$ is strictly positive.

We use $x_{1}, \ldots, x_{n}$, that is, the realizations of the first $n$ random vectors in the sequence, to estimate the two unknown parameters $\mu_{0}$ and $V_{0}$.

The likelihood function

The likelihood function is $$L(\mu, V; x_{1}, \ldots, x_{n}) = (2\pi)^{-nK/2} \det(V)^{-n/2} \exp\left( -\frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu) \right)$$


Since the terms in the sequence are independent, their joint density is equal to the product of their marginal densities. As a consequence, the likelihood function can be written as $$L(\mu, V; x_{1}, \ldots, x_{n}) = \prod_{j=1}^{n} (2\pi)^{-K/2} \det(V)^{-1/2} \exp\left( -\frac{1}{2} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu) \right) = (2\pi)^{-nK/2} \det(V)^{-n/2} \exp\left( -\frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu) \right)$$

The log-likelihood function

The log-likelihood function is $$l(\mu, V; x_{1}, \ldots, x_{n}) = -\frac{nK}{2} \ln(2\pi) - \frac{n}{2} \ln\det(V) - \frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu)$$


The log-likelihood is obtained by taking the natural logarithm of the likelihood function: $$l(\mu, V; x_{1}, \ldots, x_{n}) = \ln L(\mu, V; x_{1}, \ldots, x_{n}) = -\frac{nK}{2} \ln(2\pi) - \frac{n}{2} \ln\det(V) - \frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu)$$

Note that the likelihood function is well-defined only if $\det(V)$ is strictly positive. This reflects the assumption made above that the true parameter $V_{0}$ is positive definite, which implies that the search for a maximum likelihood estimator of $V_{0}$ is restricted to the space of positive definite matrices.

For convenience, we can also define the log-likelihood in terms of the precision matrix $V^{-1}$: $$l(\mu, V^{-1}; x_{1}, \ldots, x_{n}) = -\frac{nK}{2} \ln(2\pi) + \frac{n}{2} \ln\det(V^{-1}) - \frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu)$$ where we have used the property of the determinant $\det(V^{-1}) = \det(V)^{-1}$, so that $\ln\det(V^{-1}) = -\ln\det(V)$.
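The log-likelihood above is straightforward to evaluate numerically. Below is a minimal NumPy sketch (not part of the lecture; the function name `mvn_log_likelihood` is ours) that implements the formula term by term:

```python
import numpy as np

def mvn_log_likelihood(X, mu, V):
    """Log-likelihood of an (n, K) sample X under N(mu, V),
    computed directly from the multivariate normal formula."""
    n, K = X.shape
    diff = X - mu                                  # (n, K) deviations x_j - mu
    V_inv = np.linalg.inv(V)
    # sum over j of (x_j - mu)' V^{-1} (x_j - mu)
    quad = np.einsum('ij,jk,ik->', diff, V_inv, diff)
    # slogdet is numerically safer than log(det(V))
    sign, logdet = np.linalg.slogdet(V)
    return -0.5 * n * K * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad
```

The result should agree with summing `scipy.stats.multivariate_normal.logpdf` over the $n$ observations, which makes for an easy sanity check.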


Before deriving the maximum likelihood estimators, we need to state some facts about matrices, their trace and their derivatives:

- since the quadratic form $(x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu)$ is a scalar, it equals its own trace, and by the cyclic property of the trace $(x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu) = \operatorname{tr}\left[ V^{-1} (x_{j}-\mu)(x_{j}-\mu)^{\top} \right]$;

- the gradient of the log-determinant is $\dfrac{\partial \ln\det(W)}{\partial W} = (W^{\top})^{-1}$;

- the gradient of the trace of a product is $\dfrac{\partial \operatorname{tr}(AW)}{\partial W} = A^{\top}$.

The maximum likelihood estimators

The maximum likelihood estimators of the mean and the covariance matrix are $$\widehat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_{j}, \qquad \widehat{V} = \frac{1}{n} \sum_{j=1}^{n} (x_{j}-\widehat{\mu})(x_{j}-\widehat{\mu})^{\top}$$


We need to solve the following maximization problem $$\max_{\mu, V} \; l(\mu, V; x_{1}, \ldots, x_{n})$$ The first-order conditions for a maximum are $$\nabla_{\mu}\, l = 0, \qquad \nabla_{V^{-1}}\, l = 0$$ The gradient of the log-likelihood with respect to the mean vector is $$\nabla_{\mu}\, l = V^{-1} \sum_{j=1}^{n} (x_{j}-\mu)$$ which is equal to zero only if $$\sum_{j=1}^{n} (x_{j}-\mu) = 0$$ Therefore, the first of the two first-order conditions implies $$\widehat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_{j}$$ The gradient of the log-likelihood with respect to the precision matrix is $$\nabla_{V^{-1}}\, l = \frac{n}{2} V^{\top} - \frac{1}{2} \sum_{j=1}^{n} (x_{j}-\mu)(x_{j}-\mu)^{\top}$$ By transposing the whole expression and setting it equal to zero, we get $$\frac{n}{2} V - \frac{1}{2} \sum_{j=1}^{n} (x_{j}-\widehat{\mu})(x_{j}-\widehat{\mu})^{\top} = 0$$ Thus, the system of first-order conditions is solved by $$\widehat{\mu} = \frac{1}{n} \sum_{j=1}^{n} x_{j}, \qquad \widehat{V} = \frac{1}{n} \sum_{j=1}^{n} (x_{j}-\widehat{\mu})(x_{j}-\widehat{\mu})^{\top}$$
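The closed-form solutions make the estimators trivial to compute. A minimal NumPy sketch (our illustration, not from the lecture) highlights the $1/n$ normalization of $\widehat{V}$, which differs from the unbiased $1/(n-1)$ sample covariance:

```python
import numpy as np

def mvn_mle(X):
    """Maximum likelihood estimates for a multivariate normal sample.
    X is an (n, K) array; returns (mu_hat, V_hat), where V_hat uses
    the 1/n normalization derived above, not the unbiased 1/(n-1)."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)            # sample mean vector
    diff = X - mu_hat
    V_hat = diff.T @ diff / n          # (1/n) * sum of outer products
    return mu_hat, V_hat
```

Equivalently, `V_hat` matches `np.cov(X, rowvar=False, bias=True)`, since `bias=True` selects the $1/n$ convention.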

Information matrix

We are now going to give a formula for the information matrix of the multivariate normal distribution, which will be used to derive the asymptotic covariance matrix of the maximum likelihood estimators.

Denote by $\theta$ the $(K + K^{2}) \times 1$ column vector of all parameters: $$\theta = \begin{bmatrix} \mu \\ \operatorname{vec}(V) \end{bmatrix}$$ where $\operatorname{vec}(V)$ converts the matrix $V$ into a $K^{2} \times 1$ column vector whose entries are taken from the first column of $V$, then from the second, and so on.
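The column-by-column stacking performed by $\operatorname{vec}(\cdot)$ corresponds to column-major (Fortran-order) flattening in NumPy, as this small illustration (ours, not from the lecture) shows:

```python
import numpy as np

# vec() stacks the columns of a matrix into one long column vector.
# In NumPy this is column-major ('F', Fortran-order) flattening.
V = np.array([[1, 2],
              [3, 4]])
vec_V = V.flatten(order='F')   # take column 1, then column 2
print(vec_V)                   # [1 3 2 4]
```

Note that the default `V.flatten()` is row-major and would give `[1 2 3 4]` instead.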

The log-likelihood of one observation from the sample can be written as $$l(\theta; x_{j}) = -\frac{K}{2} \ln(2\pi) - \frac{1}{2} \ln\det(V) - \frac{1}{2} (x_{j}-\mu)^{\top} V^{-1} (x_{j}-\mu)$$

The information matrix is $$I(\theta) = \mathrm{E}\left[ \nabla_{\theta}\, l(\theta; X_{j}) \; \nabla_{\theta}\, l(\theta; X_{j})^{\top} \right]$$

Define the $K \times 1$ vector [eq32]


Define the $K \times K$ matrix [eq35]

Note that:

It can be proved (see, e.g., Pistone and Malagò 2015) that the $(m,n)$-th element of the information matrix is [eq38]

Asymptotic variance

The vector of estimators $$\widehat{\theta}_{n} = \begin{bmatrix} \widehat{\mu} \\ \operatorname{vec}(\widehat{V}) \end{bmatrix}$$ is asymptotically normal with asymptotic mean equal to $\theta_{0}$ and asymptotic covariance matrix equal to $I(\theta_{0})^{-1}$.

In more formal terms, $\sqrt{n}\left( \widehat{\theta}_{n} - \theta_{0} \right)$ converges in distribution to a multivariate normal distribution with zero mean and covariance matrix $I(\theta_{0})^{-1}$.

In other words, the distribution of the vector $\widehat{\theta}_{n}$ can be approximated by a multivariate normal distribution with mean $\theta_{0}$ and covariance matrix $\frac{1}{n} I(\theta_{0})^{-1}$.
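This approximation can be checked by simulation for the mean components of $\theta$. Below is a Monte Carlo sketch (ours, not from the lecture) that relies on the standard fact, assumed here, that the mean block of $\frac{1}{n} I(\theta_{0})^{-1}$ equals $V_{0}/n$, so the sampling covariance of $\widehat{\mu}$ scaled by $n$ should be close to $V_{0}$:

```python
import numpy as np

# Monte Carlo check of the asymptotic approximation for mu_hat:
# across many replications, n * Cov(mu_hat) should approach V0.
rng = np.random.default_rng(42)
mu0 = np.array([0.0, 1.0])
V0 = np.array([[1.0, 0.3],
               [0.3, 0.5]])
n, reps = 200, 5000

# One (n, 2) sample per replication; mu_hat is the per-sample mean.
samples = rng.multivariate_normal(mu0, V0, size=(reps, n))
mu_hats = samples.mean(axis=1)                 # shape (reps, 2)

emp_cov = np.cov(mu_hats, rowvar=False)        # empirical Cov(mu_hat)
print(np.round(emp_cov * n, 2))                # should be close to V0
```

With 5000 replications the scaled empirical covariance typically matches $V_{0}$ to within a few hundredths.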


Pistone, G. and Malagò, L. (2015) "Information Geometry of the Gaussian Distribution in View of Stochastic Optimization", Proceedings of the 2015 ACM Conference on Foundations of Genetic Algorithms XIII, 150-162.

How to cite

Please cite as:

Taboga, Marco (2021). "Multivariate normal distribution - Maximum Likelihood Estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
