# Conjugate prior

In Bayesian inference, the prior distribution of a parameter and the likelihood of the observed data are combined to obtain the posterior distribution of the parameter.

If the prior and the posterior belong to the same parametric family, then the prior is said to be conjugate for the likelihood.

## Review of parametric families

First of all, let us review the concept of a parametric family.

Let a set of probability distributions $\Phi$ be put in correspondence with a parameter space $\Theta$.

If the correspondence is a function (i.e., it associates one and only one distribution in $\Phi$ to each parameter $\theta \in \Theta$), then $\Phi$ is called a parametric family.

## Examples of parametric families

Examples of parametric families are:

• the set of all normal distributions, parametrized by the mean $\mu$ and the variance $\sigma^2$;

• the set of all Poisson distributions, parametrized by the rate $\lambda$;

• the set of all Beta distributions, parametrized by the two shape parameters $\alpha$ and $\beta$.

## Prior, likelihood and posterior

In a Bayesian inference problem, we specify two distributions:

• the likelihood, that is, the distribution of the observed data $x$ conditional on the parameter $\theta$:
$$p(x \mid \theta);$$

• the prior distribution of the parameter:
$$p(\theta).$$

After observing the data, we use Bayes' rule to compute the posterior distribution
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}.$$

## Definition of a conjugate prior

We can now define the concept of a conjugate prior.

Definition Let $\Phi$ be a parametric family. A prior $p(\theta) \in \Phi$ is said to be conjugate for the likelihood $p(x \mid \theta)$ if and only if the posterior $p(\theta \mid x)$ belongs to $\Phi$.

In other words, when we use a conjugate prior, the posterior resulting from the Bayesian updating process is in the same parametric family as the prior.

## Example

In the lecture on Bayesian inference about the mean of a normal distribution, we have already encountered a conjugate prior.

In that lecture, $x = (x_1, \ldots, x_n)$ is a vector of IID draws from a normal distribution having unknown mean $\mu$ and known variance $\sigma^2$. Moreover, both the prior and the posterior distribution of the parameter $\mu$ are normal. Hence, the prior and the posterior belong to the same parametric family of normal distributions, and the prior is conjugate with the likelihood.
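As a minimal numerical sketch of this normal-normal conjugacy (not taken from the lecture; the function name and data are illustrative), the standard known-variance update combines prior and sample information by adding precisions:

```python
def normal_posterior(mu0, tau0_sq, sigma_sq, data):
    """Conjugate update for the mean of a normal likelihood with known
    variance sigma_sq, under a N(mu0, tau0_sq) prior on the mean.
    The posterior is again normal; returns its mean and variance."""
    n = len(data)
    xbar = sum(data) / n
    # Precisions (inverse variances) add up in the conjugate update.
    post_precision = 1.0 / tau0_sq + n / sigma_sq
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted average of mu0 and xbar.
    post_mean = post_var * (mu0 / tau0_sq + n * xbar / sigma_sq)
    return post_mean, post_var

# Example: vague prior N(0, 100), known variance 4, five observations.
mean, var = normal_posterior(0.0, 100.0, 4.0, [4.8, 5.1, 5.3, 4.9, 5.2])
print(mean, var)  # posterior concentrates near the sample mean 5.06
```

With a diffuse prior, the posterior mean sits very close to the sample mean, and the posterior variance is roughly $\sigma^2 / n$, as expected.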

## Usefulness of conjugate priors

There are two main reasons why models with conjugate priors are popular (e.g., Robert 2007, Bernardo and Smith 2009):

1. they usually allow us to derive a closed-form expression for the posterior distribution;

2. they are easy to interpret, as we can easily see how the parameters of the prior change after the Bayesian update.

## Exponential families

Conjugate priors are easily characterized when the distribution of $x$ belongs to an exponential family, in which case the likelihood takes the form
$$p(x \mid \theta) = h(x) \exp\left( \theta \cdot T(x) - A(\theta) \right),$$
where:

• the base measure $h(x)$ is a function that depends only on the vector of data $x$;

• the parameter $\theta$ is a vector;

• the sufficient statistic $T(x)$ is a vector-valued function of $x$;

• the log-partition function $A(\theta)$ is a function of $\theta$;

• $\theta \cdot T(x)$ is the dot product between $\theta$ and $T(x)$.

A parametric family of conjugate priors for the above likelihood is formed by all the distributions such that
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \cdot \chi - \nu A(\theta) \right),$$
where:

• $\chi$ is a vector of parameters;

• $\nu$ is a scalar parameter;

• $f(\chi, \nu)$ is a function that returns the normalization constant needed to make $p(\theta \mid \chi, \nu)$ a proper probability density (or mass) function.

The parameters $\chi$ and $\nu$ are called hyperparameters.

Note that
$$\int p(\theta \mid \chi, \nu) \, d\theta = 1$$
implies that
$$f(\chi, \nu) = \frac{1}{\displaystyle \int \exp\left( \theta \cdot \chi - \nu A(\theta) \right) d\theta}.$$

As a consequence, the above parametric family of conjugate priors, called a natural family, contains all the distributions associated to couples of hyperparameters $(\chi, \nu)$ such that the integral in the denominator is well-defined and finite.

Given the likelihood and the prior, the posterior is
$$p(\theta \mid x) = p(\theta \mid \chi + T(x), \nu + 1),$$
provided $f(\chi + T(x), \nu + 1)$ is well-defined.

Proof

The posterior is proportional to the prior times the likelihood:
$$p(\theta \mid x) \propto p(x \mid \theta)\, p(\theta \mid \chi, \nu) = h(x) f(\chi, \nu) \exp\left( \theta \cdot T(x) - A(\theta) \right) \exp\left( \theta \cdot \chi - \nu A(\theta) \right).$$
Therefore,
$$p(\theta \mid x) \propto \exp\left( \theta \cdot \left( \chi + T(x) \right) - (\nu + 1) A(\theta) \right),$$
where we know that the constant of proportionality is $f(\chi + T(x), \nu + 1)$.

## Examples of natural families

Let us now see how powerful the machinery of natural families is, by deriving the conjugate priors of some common distributions.

### Bernoulli likelihood and beta priors

Remember that a Bernoulli random variable $x$ is equal to $1$ with probability $p$ and to $0$ with probability $1 - p$.

Suppose that we observe a realization $x$ of the Bernoulli variable and we want to carry out some Bayesian inference on the unknown parameter $p$.

The likelihood has exponential form:
$$p(x \mid \theta) = 1_{\{0,1\}}(x) \exp\left( \theta x - A(\theta) \right),$$
where $1_{\{0,1\}}(x)$ is an indicator function equal to $1$ if $x \in \{0,1\}$ and to $0$ otherwise, and
$$\theta = \ln\frac{p}{1-p}, \qquad T(x) = x, \qquad A(\theta) = \ln\left( 1 + e^{\theta} \right) = -\ln(1-p).$$

The natural family of conjugate priors contains priors of the form
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \chi - \nu \ln\left( 1 + e^{\theta} \right) \right) = f(\chi, \nu) \frac{e^{\theta \chi}}{\left( 1 + e^{\theta} \right)^{\nu}}.$$

Since $\theta = \ln\frac{p}{1-p}$ is an increasing function of $p$ and
$$\frac{d\theta}{dp} = \frac{1}{p(1-p)},$$
we can apply the formula for the density of an increasing function:
$$p(p \mid \chi, \nu) = p(\theta(p) \mid \chi, \nu)\, \frac{d\theta}{dp} = f(\chi, \nu) \left( \frac{p}{1-p} \right)^{\chi} (1-p)^{\nu} \frac{1}{p(1-p)} = f(\chi, \nu)\, p^{\chi - 1} (1-p)^{\nu - \chi - 1}.$$

Thus, the natural family of conjugate priors contains priors that assign to $p$ a Beta distribution with parameters $\chi$ and $\nu - \chi$.

According to the general formula derived above for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = p(\theta \mid \chi + x, \nu + 1),$$
which implies (by the same argument just used for the prior) that the posterior distribution of $p$ is
$$p(p \mid x) \propto p^{\chi + x - 1} (1-p)^{\nu - \chi - x},$$
that is, a Beta distribution with parameters $\chi + x$ and $\nu - \chi + 1 - x$.
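In the usual Beta parametrization $\alpha = \chi$, $\beta = \nu - \chi$, the update reads $(\alpha, \beta) \mapsto (\alpha + x, \beta + 1 - x)$. A minimal numerical check (not from the lecture; names and values are illustrative) compares the closed-form posterior mean against a brute-force grid computation:

```python
def beta_bernoulli_update(alpha, beta, x):
    """Beta(alpha, beta) prior on p; after observing a single Bernoulli
    draw x in {0, 1}, the posterior is Beta(alpha + x, beta + 1 - x)."""
    return alpha + x, beta + (1 - x)

# Beta(2, 3) prior, observation x = 1.
alpha, beta, x = 2.0, 3.0, 1
a_post, b_post = beta_bernoulli_update(alpha, beta, x)

# Closed-form posterior mean: a Beta(a, b) has mean a / (a + b).
closed_form = a_post / (a_post + b_post)

# Grid approximation of E[p | x], with posterior density proportional
# to p^x (1-p)^(1-x) times the Beta(alpha, beta) prior density.
grid = [(i + 0.5) / 10000 for i in range(10000)]
w = [p**x * (1 - p)**(1 - x) * p**(alpha - 1) * (1 - p)**(beta - 1)
     for p in grid]
approx = sum(p * wi for p, wi in zip(grid, w)) / sum(w)

print(closed_form, approx)  # the two values agree closely
```

The grid estimate matches the conjugate-update answer, confirming that no numerical integration was needed.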

### Poisson likelihood and Gamma prior

If $x$ has a Poisson distribution, its likelihood is
$$p(x \mid \lambda) = \frac{1}{x!} \lambda^{x} e^{-\lambda}\, 1_{\mathbb{Z}_{+}}(x),$$
where $\lambda > 0$ is a parameter and $\mathbb{Z}_{+}$ is the set of non-negative integer numbers.

We can write the likelihood in exponential form:
$$p(x \mid \theta) = h(x) \exp\left( \theta x - A(\theta) \right),$$
where
$$h(x) = \frac{1_{\mathbb{Z}_{+}}(x)}{x!}, \qquad \theta = \ln\lambda, \qquad T(x) = x, \qquad A(\theta) = e^{\theta} = \lambda.$$

The natural family of conjugate priors contains priors of the form
$$p(\theta \mid \chi, \nu) = f(\chi, \nu) \exp\left( \theta \chi - \nu e^{\theta} \right).$$

Since $\theta = \ln\lambda$ is an increasing function of $\lambda$ and
$$\frac{d\theta}{d\lambda} = \frac{1}{\lambda},$$
we can apply the formula for the density of an increasing function:
$$p(\lambda \mid \chi, \nu) = p(\theta(\lambda) \mid \chi, \nu)\, \frac{d\theta}{d\lambda} = f(\chi, \nu)\, \lambda^{\chi} e^{-\nu \lambda} \frac{1}{\lambda} = f(\chi, \nu)\, \lambda^{\chi - 1} e^{-\nu \lambda}.$$

Thus, the natural family of conjugate priors contains priors that assign to $\lambda$ a Gamma distribution with parameters $\chi$ (shape) and $\nu$ (rate).

By the general formula for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = p(\theta \mid \chi + x, \nu + 1),$$
which implies (by the same argument just used for the prior) that the posterior distribution of $\lambda$ is
$$p(\lambda \mid x) \propto \lambda^{\chi + x - 1} e^{-(\nu + 1) \lambda},$$
that is, a Gamma distribution with parameters $\chi + x$ and $\nu + 1$.
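As with the Bernoulli case, the Gamma-Poisson update can be verified numerically. The sketch below (illustrative names and values, not from the lecture) checks the closed-form posterior mean of the Gamma (shape/rate) against a grid approximation:

```python
import math

def gamma_poisson_update(chi, nu, x):
    """Gamma(chi, nu) prior (shape chi, rate nu) on the Poisson rate;
    after observing a single count x, the posterior is Gamma(chi + x, nu + 1)."""
    return chi + x, nu + 1

# Gamma(3, 2) prior, observed count x = 4.
chi, nu, x = 3.0, 2.0, 4
chi_post, nu_post = gamma_poisson_update(chi, nu, x)

# Closed-form posterior mean: a Gamma(shape, rate) has mean shape / rate.
closed_form = chi_post / nu_post  # 7/3

# Grid approximation of E[lambda | x], with posterior density proportional
# to lambda^x e^(-lambda) times the Gamma(chi, nu) prior density.
grid = [(i + 0.5) / 1000 for i in range(30000)]  # lambda in (0, 30)
w = [lam**x * math.exp(-lam) * lam**(chi - 1) * math.exp(-nu * lam)
     for lam in grid]
approx = sum(lam * wi for lam, wi in zip(grid, w)) / sum(w)

print(closed_form, approx)  # the two values agree closely
```

Again the grid estimate matches the conjugate-update answer; with a conjugate prior, the posterior is available in closed form and no integration is required in practice.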

## References

Bernardo, J. M., and Smith, A. F. M. (2009) Bayesian Theory, Wiley.

Robert, C. P. (2007) The Bayesian Choice, Springer.