# Conjugate prior

In Bayesian inference, the prior distribution of a parameter and the likelihood of the observed data are combined to obtain the posterior distribution of the parameter.

If the prior and the posterior belong to the same parametric family, then the prior is said to be conjugate for the likelihood.

## Review of parametric families

First of all, let us review the concept of a parametric family.

Let a set of probability distributions $\Phi$ be put in correspondence with a parameter space $\Theta$.

If the correspondence is a function (i.e., it associates one and only one distribution in $\Phi$ to each parameter $\theta \in \Theta$), then $\Phi$ is called a parametric family.

## Examples of parametric families

Examples of parametric families are:

• the set of all normal distributions, parametrized by the mean $\mu$ and the variance $\sigma^2$;

• the set of all Poisson distributions, parametrized by the rate $\lambda$;

• the set of all Beta distributions, parametrized by the two shape parameters $\alpha$ and $\beta$.

## Prior, likelihood and posterior

In a Bayesian inference problem, we specify two distributions:

• the likelihood, that is, the distribution of the observed data $x$ conditional on the parameter $\theta$:
$$p(x \mid \theta);$$

• the prior distribution of the parameter:
$$p(\theta).$$

After observing the data, we use Bayes' rule to compute the posterior distribution
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}.$$

## Definition of a conjugate prior

We can now define the concept of a conjugate prior.

Definition Let $\Phi$ be a parametric family. A prior $p(\theta) \in \Phi$ is said to be conjugate for the likelihood $p(x \mid \theta)$ if and only if the posterior $p(\theta \mid x)$ belongs to $\Phi$.

In other words, when we use a conjugate prior, the posterior resulting from the Bayesian updating process is in the same parametric family as the prior.

## Example

In the lecture on Bayesian inference about the mean of a normal distribution, we have already encountered a conjugate prior.

In that lecture, $x = (x_1, \ldots, x_n)$ is a vector of IID draws from a normal distribution having unknown mean $\mu$ and known variance $\sigma^2$. Moreover, both the prior and the posterior distribution of the parameter $\mu$ are normal. Hence, the prior and the posterior belong to the same parametric family of normal distributions, and the prior is conjugate with the likelihood.
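As a minimal numerical sketch of this normal-normal conjugacy (not taken from the lecture; the function name and data are illustrative), the standard known-variance update combines prior and sample information by adding precisions:

```python
def normal_posterior(mu0, tau0_sq, sigma_sq, data):
    """Conjugate update for the mean of a normal likelihood with known
    variance sigma_sq, under a N(mu0, tau0_sq) prior on the mean.
    The posterior is again normal; returns its mean and variance."""
    n = len(data)
    xbar = sum(data) / n
    # Precisions (inverse variances) add up in the conjugate update.
    post_precision = 1.0 / tau0_sq + n / sigma_sq
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted average of mu0 and xbar.
    post_mean = post_var * (mu0 / tau0_sq + n * xbar / sigma_sq)
    return post_mean, post_var

# Example: vague prior N(0, 100), known variance 4, five observations.
mean, var = normal_posterior(0.0, 100.0, 4.0, [4.8, 5.1, 5.3, 4.9, 5.2])
print(mean, var)  # posterior concentrates near the sample mean 5.06
```

With a diffuse prior, the posterior mean sits very close to the sample mean, and the posterior variance is roughly $\sigma^2 / n$, as expected.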

## Usefulness of conjugate priors

There are two main reasons why models with conjugate priors are popular (e.g., Robert 2007, Bernardo and Smith 2009):

1. they usually allow us to derive a closed-form expression for the posterior distribution;

2. they are easy to interpret, as we can easily see how the parameters of the prior change after the Bayesian update.

## Exponential families

Conjugate priors are easily characterized when the distribution of $x$ belongs to an exponential family, in which case the likelihood takes the form
$$p(x \mid \theta) = h(x) \exp\left( \theta \cdot T(x) - A(\theta) \right),$$
where:

• the base measure $h(x)$ is a function that depends only on the vector of data $x$;

• the parameter $\theta$ is a vector;

• the sufficient statistic $T(x)$ is a vector-valued function of $x$;

• the log-partition function $A(\theta)$ is a function of $\theta$;

• $\theta \cdot T(x)$ is the dot product between $\theta$ and $T(x)$.

A parametric family of conjugate priors for the above likelihood is formed by all the distributions such that
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \cdot \chi - \nu A(\theta) \right),$$
where:

• $\chi$ is a vector of parameters;

• $\nu$ is a scalar parameter;

• $f(\chi, \nu)$ is a function that returns the normalization constant needed to make $p(\theta \mid \chi, \nu)$ a proper probability density (or mass) function.

The parameters $\chi$ and $\nu$ are called hyperparameters.

Note that
$$\int p(\theta \mid \chi, \nu) \, d\theta = 1$$
implies that
$$f(\chi, \nu) = \frac{1}{\displaystyle \int \exp\left( \theta \cdot \chi - \nu A(\theta) \right) d\theta}.$$

As a consequence, the above parametric family of conjugate priors, called a natural family, contains all the distributions associated to couples of hyperparameters $(\chi, \nu)$ such that the integral in the denominator is well-defined and finite.

Given the likelihood and the prior, the posterior is
$$p(\theta \mid x) = p(\theta \mid \chi + T(x), \nu + 1),$$
provided $f(\chi + T(x), \nu + 1)$ is well-defined.

Proof

The posterior is proportional to the prior times the likelihood:
$$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta \mid \chi, \nu) = h(x) f(\chi, \nu) \exp\left( \theta \cdot T(x) - A(\theta) \right) \exp\left( \theta \cdot \chi - \nu A(\theta) \right).$$
Therefore,
$$p(\theta \mid x) \propto \exp\left( \theta \cdot \left( \chi + T(x) \right) - (\nu + 1) A(\theta) \right),$$
where we know that the constant of proportionality is $f(\chi + T(x), \nu + 1)$.

## Examples of natural families

Let us now see how powerful the machinery of natural families is, by deriving the conjugate priors of some common distributions.

### Bernoulli likelihood and beta priors

Remember that a Bernoulli random variable $x$ is equal to $1$ with probability $p$ and to $0$ with probability $1 - p$.

Suppose that we observe a realization $x$ of the Bernoulli variable and we want to carry out some Bayesian inference on the unknown parameter $p$.

The likelihood has exponential form:
$$p(x \mid \theta) = 1_{\{0,1\}}(x) \exp\left( \theta x - A(\theta) \right),$$
where $1_{\{0,1\}}(x)$ is an indicator function equal to $1$ if $x \in \{0,1\}$ and to $0$ otherwise, and
$$\theta = \ln\frac{p}{1-p}, \qquad T(x) = x, \qquad A(\theta) = \ln\left( 1 + e^{\theta} \right) = -\ln(1-p).$$

The natural family of conjugate priors contains priors of the form
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \chi - \nu \ln\left( 1 + e^{\theta} \right) \right) = f(\chi, \nu) \frac{e^{\theta \chi}}{\left( 1 + e^{\theta} \right)^{\nu}}.$$

Since $\theta = \ln\frac{p}{1-p}$ is an increasing function of $p$ and
$$\frac{d\theta}{dp} = \frac{1}{p(1-p)},$$
we can apply the formula for the density of an increasing function:
$$p(p \mid \chi, \nu) = p(\theta(p) \mid \chi, \nu)\, \frac{d\theta}{dp} = f(\chi, \nu) \left( \frac{p}{1-p} \right)^{\chi} (1-p)^{\nu} \frac{1}{p(1-p)} = f(\chi, \nu)\, p^{\chi - 1} (1-p)^{\nu - \chi - 1}.$$

Thus, the natural family of conjugate priors contains priors that assign to $p$ a Beta distribution with parameters $\chi$ and $\nu - \chi$.

According to the general formula derived above for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = p(\theta \mid \chi + x, \nu + 1),$$
which implies (by the same argument just used for the prior) that the posterior distribution of $p$ is
$$p(p \mid x) \propto p^{\chi + x - 1} (1-p)^{\nu - \chi - x},$$
that is, a Beta distribution with parameters $\chi + x$ and $\nu - \chi + 1 - x$.
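In the usual Beta parametrization $\alpha = \chi$, $\beta = \nu - \chi$, the update reads $(\alpha, \beta) \mapsto (\alpha + x, \beta + 1 - x)$. A minimal numerical check (not from the lecture; names and values are illustrative) compares the closed-form posterior mean against a brute-force grid computation:

```python
def beta_bernoulli_update(alpha, beta, x):
    """Beta(alpha, beta) prior on p; after observing a single Bernoulli
    draw x in {0, 1}, the posterior is Beta(alpha + x, beta + 1 - x)."""
    return alpha + x, beta + (1 - x)

# Beta(2, 3) prior, observation x = 1.
alpha, beta, x = 2.0, 3.0, 1
a_post, b_post = beta_bernoulli_update(alpha, beta, x)

# Closed-form posterior mean: a Beta(a, b) has mean a / (a + b).
closed_form = a_post / (a_post + b_post)

# Grid approximation of E[p | x], with posterior density proportional
# to p^x (1-p)^(1-x) times the Beta(alpha, beta) prior density.
grid = [(i + 0.5) / 10000 for i in range(10000)]
w = [p**x * (1 - p)**(1 - x) * p**(alpha - 1) * (1 - p)**(beta - 1)
     for p in grid]
approx = sum(p * wi for p, wi in zip(grid, w)) / sum(w)

print(closed_form, approx)  # the two values agree closely
```

The grid estimate matches the conjugate-update answer, confirming that no numerical integration was needed.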

### Poisson likelihood and Gamma prior

If $x$ has a Poisson distribution, its likelihood is
$$p(x \mid \lambda) = \frac{1}{x!} \lambda^{x} e^{-\lambda}\, 1_{\mathbb{Z}_{+}}(x),$$
where $\lambda > 0$ is a parameter and $\mathbb{Z}_{+}$ is the set of non-negative integer numbers.

We can write the likelihood in exponential form:
$$p(x \mid \theta) = h(x) \exp\left( \theta x - A(\theta) \right),$$
where
$$h(x) = \frac{1_{\mathbb{Z}_{+}}(x)}{x!}, \qquad \theta = \ln\lambda, \qquad T(x) = x, \qquad A(\theta) = e^{\theta} = \lambda.$$

The natural family of conjugate priors contains priors of the form
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \chi - \nu e^{\theta} \right).$$

Since $\theta = \ln\lambda$ is an increasing function of $\lambda$ and
$$\frac{d\theta}{d\lambda} = \frac{1}{\lambda},$$
we can apply the formula for the density of an increasing function:
$$p(\lambda \mid \chi, \nu) = p(\theta(\lambda) \mid \chi, \nu)\, \frac{d\theta}{d\lambda} = f(\chi, \nu)\, \lambda^{\chi} e^{-\nu \lambda} \frac{1}{\lambda} = f(\chi, \nu)\, \lambda^{\chi - 1} e^{-\nu \lambda}.$$

Thus, the natural family of conjugate priors contains priors that assign to $\lambda$ a Gamma distribution with parameters $\chi$ (shape) and $\nu$ (rate).

By the general formula for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = p(\theta \mid \chi + x, \nu + 1),$$
which implies (by the same argument just used for the prior) that the posterior distribution of $\lambda$ is
$$p(\lambda \mid x) \propto \lambda^{\chi + x - 1} e^{-(\nu + 1) \lambda},$$
that is, a Gamma distribution with parameters $\chi + x$ and $\nu + 1$.
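As with the Bernoulli case, the Gamma-Poisson update can be verified numerically. The sketch below (illustrative names and values, not from the lecture) checks the closed-form posterior mean of the Gamma (shape/rate) against a grid approximation:

```python
import math

def gamma_poisson_update(chi, nu, x):
    """Gamma(chi, nu) prior (shape chi, rate nu) on the Poisson rate;
    after observing a single count x, the posterior is Gamma(chi + x, nu + 1)."""
    return chi + x, nu + 1

# Gamma(3, 2) prior, observed count x = 4.
chi, nu, x = 3.0, 2.0, 4
chi_post, nu_post = gamma_poisson_update(chi, nu, x)

# Closed-form posterior mean: a Gamma(shape, rate) has mean shape / rate.
closed_form = chi_post / nu_post  # 7/3

# Grid approximation of E[lambda | x], with posterior density proportional
# to lambda^x e^(-lambda) times the Gamma(chi, nu) prior density.
grid = [(i + 0.5) / 1000 for i in range(30000)]  # lambda in (0, 30)
w = [lam**x * math.exp(-lam) * lam**(chi - 1) * math.exp(-nu * lam)
     for lam in grid]
approx = sum(lam * wi for lam, wi in zip(grid, w)) / sum(w)

print(closed_form, approx)  # the two values agree closely
```

Again the grid estimate matches the conjugate-update answer; with a conjugate prior, the posterior is available in closed form and no integration is required in practice.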

## References

Bernardo, J. M., and Smith, A. F. M. (2009) Bayesian Theory, Wiley.

Robert, C. P. (2007) The Bayesian Choice, Springer.