EM algorithm

The Expectation-Maximization (EM) algorithm is a recursive algorithm that can be used to search for the maximum likelihood estimators of model parameters when the model includes some unobservable variables (also called latent variables).

Table of contents

Latent-variable model
Model specification
Derived quantities
Maximum likelihood problem
The algorithm
Theoretical guarantees
Advantages of the EM algorithm
Caveats
Example: estimation of Gaussian mixtures

Latent-variable model

In a latent-variable model, there are two vectors:

the observed data ;
the vector of unobserved variables .

We denote their joint probability bywhere is a vector of parameters to be estimated.

Model specification

The joint distribution of and is usually obtained by separately specifying the conditional distribution of the observed variablesand the marginal distribution of the latent variables

As a consequence, the joint distribution is

In the context of latent-variable models, the joint distribution is often called complete likelihood.

Derived quantities

Given the model specification above, if is a discrete random vector, we can derive the marginal distribution of as [eq5] where the sum is over the set $R_{z}$ of all the values that can possibly take (its support).

We can also derive the conditional distribution of as [eq6]

If is continuous the sums above are replaced by integrals.

Maximum likelihood problem

The maximum likelihood estimator (MLE) of the parameter is the vector that solves the maximization problem

This problem does not usually have an analytical solution. As a consequence, we need to solve it with an iterative procedure that starts from an initial guess $heta _{0}$ of the solution and produces a sequencethat hopefully converges to the solution.

We say hopefully because we are often unable to derive theoretical guarantees about the numerical convergence of likelihood maximization algorithms, especially for latent-variable models.

The EM algorithm is one of the iterative procedures that can be used to search for a solution when we are dealing with a latent-variable model specified as above.

The algorithm

Starting from an initial guess $heta _{0}$ , the -th iteration of the EM algorithm consists of the following steps:

use the parameter value $heta _{j-1}$ found in the previous iteration to compute the conditional probabilitiesfor each $zin R_{Z}$ ;
use the conditional probabilities derived in step 1 to compute the expected value of the complete log-likelihood:
solve the maximization problem
if the parameter update is smaller than a pre-specified threshold , that is, if stop the algorithm, else return to step 1.

Steps 1 and 2 are collectively called the Expectation step, while step 3 is called the Maximization step. Hence the name of the algorithm (Expectation-Maximization).

Step 4 is a stopping criterion: we stop the algorithm when there are no significant changes in the parameter.

Theoretical guarantees

In general, the algorithm is not guaranteed to converge to a global maximum of the likelihood.

However, it has the important property that the likelihood is guaranteed to increase at each iteration:for every .

Proof

The conditional probability formula [eq14] implies thatIf we multiply both sides by , we obtainThen, we can sum over the support of : [eq18] Since the log-likelihood does not depend on and [eq20] we have [eq21] Moreover, [eq22] Thus, [eq23] Setting , we obtain [eq25] Subtracting (2) from (1), we get [eq26] By Jensen's inequality, we have [eq27] Therefore,which, for , becomes But becauseThereforeor

Advantages of the EM algorithm

The EM algorithm is particularly advantageous when the maximization problem in the Maximization step has a closed-form solution.

This happens, for example, when the latent-variable model is a mixture of multivariate normal distributions.

Caveats

As we said, there is no guarantee that the EM algorithm converges to a global maximum of the likelihood.

If we suspect that the likelihood may have multiple local minima, we should use the multiple starts approach. In other words, we should run the EM algorithm several times with different starting values $heta _{0}$ .

Example: estimation of Gaussian mixtures

The EM algorithm is often used to estimate the parameters of a mixture of normal distributions (also called Gaussian mixture). See this lecture for details.

How to cite

Please cite as:

Taboga, Marco (2021). "EM algorithm", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/EM-algorithm.