In Bayesian inference, the prior distribution of a parameter and the likelihood of the observed data are combined to obtain the posterior distribution of the parameter.
If the prior and the posterior belong to the same parametric family, then the prior is said to be conjugate for the likelihood.
First of all, let us review the concept of a parametric family.
Let a set of probability distributions $\Phi$ be put in correspondence with a parameter space $\Theta$. If the correspondence is a function (i.e., it associates one and only one distribution in $\Phi$ to each parameter $\theta \in \Theta$), then $\Phi$ is called a parametric family.
Examples of parametric families are:

- the set of all normal distributions, indexed by the parameter vector $(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the mean and the variance of the distribution;
- the set of all exponential distributions, indexed by the rate parameter $\lambda$;
- the set of all chi-square distributions, indexed by the degrees-of-freedom parameter $n$.
In a Bayesian inference problem, we specify two distributions:

- the likelihood, that is, the distribution $p(x \mid \theta)$ of the observed data $x$ conditional on the parameter $\theta$;
- the prior distribution $p(\theta)$ of the parameter.

After observing the data, we use Bayes' rule to compute the posterior distribution
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}.$$
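As a concrete illustration, the following sketch (in Python, with made-up Bernoulli data; none of these numbers come from the lecture) applies Bayes' rule on a discretized parameter grid: the posterior is the normalized product of prior and likelihood.

```python
import numpy as np

# A minimal sketch of Bayes' rule on a discretized parameter grid:
# the posterior is proportional to likelihood times prior.  The grid,
# the flat prior, and the data below are illustrative choices.
theta = np.linspace(0.01, 0.99, 99)        # candidate parameter values
prior = np.full(theta.shape, 1.0 / theta.size)
x = np.array([1, 0, 1, 1])                 # made-up Bernoulli observations

# likelihood of the whole sample at every grid point
likelihood = np.prod(theta[None, :] ** x[:, None]
                     * (1.0 - theta[None, :]) ** (1 - x[:, None]), axis=0)

posterior = likelihood * prior
posterior /= posterior.sum()               # normalization plays the role of p(x)
```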
We can now define the concept of a conjugate prior.
Definition
Let $\Phi$ be a parametric family. A prior $p(\theta)$ belonging to $\Phi$ is said to be conjugate for the likelihood $p(x \mid \theta)$ if and only if the posterior $p(\theta \mid x)$ belongs to $\Phi$.
In other words, when we use a conjugate prior, the posterior resulting from the Bayesian updating process is in the same parametric family as the prior.
In the lecture on Bayesian inference about the mean of a normal distribution, we have already encountered a conjugate prior.
In that lecture, $x$ is a vector of IID draws from a normal distribution having unknown mean $\theta$ and known variance $\sigma^2$. Moreover, both the prior and the posterior distribution of the parameter $\theta$ are normal. Hence, the prior and the posterior belong to the same parametric family of normal distributions, and the prior is conjugate with the likelihood.
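As a quick numerical illustration of that model, here is a minimal sketch using the standard closed-form normal-normal update; the prior hyperparameters and the data are illustrative choices.

```python
import numpy as np

# A minimal sketch of the normal-normal conjugate update: prior
# theta ~ N(mu0, tau0_sq), data x_i ~ N(theta, sigma_sq) IID with
# known variance.  All numbers below are illustrative.
mu0, tau0_sq = 0.0, 4.0          # prior mean and variance
sigma_sq = 1.0                   # known variance of the data
x = np.array([1.2, 0.8, 1.5])    # observed sample
n = x.size

tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)         # posterior variance
mu_n = tau_n_sq * (mu0 / tau0_sq + x.sum() / sigma_sq)  # posterior mean
# The posterior is again normal: N(mu_n, tau_n_sq).
```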
There are basically two reasons why models with conjugate priors are popular (e.g., Robert 2007; Bernardo and Smith 2009):

- they usually allow us to derive a closed-form expression for the posterior distribution;
- they are easy to interpret, as we can easily see how the parameters of the prior change after the Bayesian update.
Conjugate priors are easily characterized when the distribution of $x$ belongs to an exponential family, in which case the likelihood takes the form
$$p(x \mid \theta) = h(x) \exp\big( \theta \cdot T(x) - A(\theta) \big)$$
where:

- the base measure $h(x)$ is a function that depends only on the vector of data $x$;
- the parameter $\theta$ is a $K \times 1$ vector;
- the sufficient statistic $T(x)$ is a $K \times 1$ vector-valued function of $x$;
- the log-partition function $A(\theta)$ is a function of $\theta$;
- $\theta \cdot T(x)$ is the dot product between $\theta$ and $T(x)$.
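To make the template concrete, the following sketch (a numerical check in Python with scipy; the rate value is illustrative) verifies that the exponential distribution fits it, using one standard way of writing it in this form: natural parameter $\theta = -\lambda$, $T(x) = x$, $A(\theta) = -\ln(-\theta)$ and $h(x) = 1$.

```python
import numpy as np
from scipy.stats import expon

# A sketch checking that the exponential distribution with rate lam fits
# p(x | theta) = h(x) * exp(theta * T(x) - A(theta)) with theta = -lam,
# T(x) = x, A(theta) = -log(-theta), h(x) = 1.  lam is illustrative.
lam = 1.7
theta = -lam
x = np.linspace(0.0, 5.0, 50)
ef_form = np.exp(theta * x + np.log(-theta))   # h(x) = 1
assert np.allclose(ef_form, expon.pdf(x, scale=1.0 / lam))
```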
A parametric family of conjugate priors for the above likelihood is formed by all the distributions such that
$$p(\theta) = \frac{\exp\big( \theta \cdot \chi - \nu A(\theta) \big)}{\kappa(\chi, \nu)}$$
where:

- $\chi$ is a $K \times 1$ vector of parameters;
- $\nu$ is a scalar parameter;
- $\kappa(\chi, \nu)$ is a function that returns the normalization constant needed to make $p(\theta)$ a proper probability density (or mass) function.

The parameters $\chi$ and $\nu$ are called hyperparameters.
Note that
$$\int p(\theta)\, d\theta = 1$$
implies that
$$\kappa(\chi, \nu) = \int \exp\big( \theta \cdot \chi - \nu A(\theta) \big)\, d\theta.$$
As a consequence, the above parametric family of conjugate priors, called a natural family, contains all the distributions associated to couples of hyperparameters $(\chi, \nu)$ such that the integral in the denominator is well-defined and finite.
Given the likelihood and the prior, the posterior is
$$p(\theta \mid x) = \frac{\exp\big( \theta \cdot (\chi + T(x)) - (\nu + 1) A(\theta) \big)}{\kappa\big(\chi + T(x), \nu + 1\big)},$$
provided $\kappa(\chi + T(x), \nu + 1)$ is well-defined.
The posterior is proportional to the prior times the likelihood:
$$p(\theta \mid x) \propto \exp\big( \theta \cdot \chi - \nu A(\theta) \big) \exp\big( \theta \cdot T(x) - A(\theta) \big) = \exp\big( \theta \cdot (\chi + T(x)) - (\nu + 1) A(\theta) \big).$$
Therefore, the posterior belongs to the natural family, with hyperparameters $\chi + T(x)$ and $\nu + 1$; since the posterior must integrate to $1$, we know that the constant of proportionality is $1 / \kappa(\chi + T(x), \nu + 1)$.
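In code, this update is a one-liner; the following sketch (a hypothetical helper, not from the lecture) makes the hyperparameter arithmetic explicit.

```python
import numpy as np

def natural_family_update(chi, nu, t_x):
    """One observation's Bayesian update in the natural family:
    chi -> chi + T(x), nu -> nu + 1.  chi and t_x may be scalars
    or numpy arrays of matching shape (sketch, not from the lecture)."""
    return np.asarray(chi) + np.asarray(t_x), nu + 1

# Illustrative example: a prior with hyperparameters (chi=1.0, nu=2),
# after observing data with sufficient statistic T(x) = 3.0, becomes
# a posterior with hyperparameters (4.0, 3).
chi_post, nu_post = natural_family_update(1.0, 2, 3.0)
```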
Let us now see how powerful the machinery of natural families is by deriving the conjugate priors of some common distributions.
Remember that a Bernoulli random variable is equal to $1$ with probability $p$ and to $0$ with probability $1 - p$.
Suppose that we observe a realization $x$ of the Bernoulli variable and we want to carry out some Bayesian inference on the unknown parameter $p$.
The likelihood has exponential form:
$$p(x \mid \theta) = h(x) \exp\big( \theta x - A(\theta) \big)$$
where $h(x)$ is an indicator function equal to $1$ if $x \in \{0, 1\}$ and to $0$ otherwise, and
$$\theta = \ln\frac{p}{1-p}, \qquad T(x) = x, \qquad A(\theta) = \ln\big(1 + \exp(\theta)\big).$$
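The following sketch (Python; the success probability is an illustrative value) checks this representation numerically against the Bernoulli pmf $p^{x}(1-p)^{1-x}$.

```python
import numpy as np
from math import exp, log

# A sketch checking that the Bernoulli pmf fits the exponential-family
# template with theta = log(p/(1-p)), T(x) = x,
# A(theta) = log(1 + exp(theta)), h(x) = 1 for x in {0, 1}.
p = 0.3
theta = log(p / (1 - p))
for x in (0, 1):
    ef_form = exp(theta * x - log(1 + exp(theta)))
    assert np.isclose(ef_form, p ** x * (1 - p) ** (1 - x))
```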
The natural family of conjugate priors contains priors of the form
$$p(\theta) = \frac{\exp\big( \theta \chi - \nu A(\theta) \big)}{\kappa(\chi, \nu)} = \frac{\exp(\theta \chi)}{\big(1 + \exp(\theta)\big)^{\nu}\, \kappa(\chi, \nu)}.$$
Since $p = \exp(\theta) / \big(1 + \exp(\theta)\big)$ is an increasing function of $\theta$ and $\theta = \ln\frac{p}{1-p}$, we can apply the formula for the density of an increasing function:
$$f(p) = f_{\Theta}\!\left(\ln\frac{p}{1-p}\right) \frac{d\theta}{dp} = \frac{\left(\frac{p}{1-p}\right)^{\chi} (1-p)^{\nu}}{\kappa(\chi, \nu)} \cdot \frac{1}{p(1-p)} = \frac{p^{\chi - 1} (1-p)^{\nu - \chi - 1}}{\kappa(\chi, \nu)}.$$
Thus, the natural family of conjugate priors contains priors that assign to $p$ a Beta distribution with parameters $\chi$ and $\nu - \chi$.
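This correspondence can be checked numerically; the sketch below (Python with scipy; the hyperparameter values are illustrative) computes $\kappa(\chi, \nu)$ by quadrature, transforms the natural prior to the $p$ scale with the Jacobian above, and compares it with the Beta density.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

# A sketch verifying that the natural prior on theta, proportional to
# exp(theta*chi) / (1 + exp(theta))^nu, induces a Beta(chi, nu - chi)
# distribution on p = exp(theta) / (1 + exp(theta)).
chi, nu = 2.0, 5.0
unnorm = lambda t: np.exp(t * chi) / (1.0 + np.exp(t)) ** nu
kappa, _ = quad(unnorm, -30, 30)                # normalizing constant

p = np.linspace(0.05, 0.95, 19)
theta = np.log(p / (1 - p))                      # inverse transform
jacobian = 1.0 / (p * (1 - p))                   # d(theta)/d(p)
density_p = unnorm(theta) / kappa * jacobian
assert np.allclose(density_p, beta.pdf(p, chi, nu - chi))
```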
According to the general formula derived above for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = \frac{\exp\big( \theta (\chi + x) - (\nu + 1) A(\theta) \big)}{\kappa(\chi + x, \nu + 1)},$$
which implies (by the same argument just used for the prior) that the posterior distribution of $p$ is
$$f(p \mid x) = \frac{p^{\chi + x - 1} (1-p)^{\nu - \chi - x}}{\kappa(\chi + x, \nu + 1)},$$
that is, a Beta distribution with parameters $\chi + x$ and $\nu - \chi - x + 1$.
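In the usual Beta parameterization (with $a = \chi$ and $b = \nu - \chi$), the update amounts to adding $x$ to the first parameter and $1 - x$ to the second; the sketch below (with illustrative data) applies it to a short sequence of IID draws.

```python
# A sketch of the Beta-Bernoulli update in the usual (a, b)
# parameterization, corresponding to a = chi and b = nu - chi:
# observing x adds x to a and 1 - x to b.  Numbers are illustrative.
a, b = 2.0, 3.0                 # prior: Beta(2, 3)
for x in (1, 0, 1, 1):          # observed Bernoulli draws
    a, b = a + x, b + (1 - x)
# posterior: Beta(5, 4) after three successes and one failure
```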
If $x$ has a Poisson distribution, its likelihood is
$$p(x \mid \lambda) = \frac{\lambda^{x} \exp(-\lambda)}{x!}, \qquad x \in \mathbb{Z}_{+},$$
where $\lambda > 0$ is a parameter and $\mathbb{Z}_{+}$ is the set of non-negative integer numbers.
We can write the likelihood in exponential form:
$$p(x \mid \theta) = h(x) \exp\big( \theta x - A(\theta) \big)$$
where
$$\theta = \ln\lambda, \qquad T(x) = x, \qquad A(\theta) = \exp(\theta), \qquad h(x) = \frac{1}{x!}\, 1_{\mathbb{Z}_{+}}(x).$$
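As before, this representation can be checked numerically; the sketch below (Python with scipy; the rate value is illustrative) compares it with the Poisson pmf.

```python
import numpy as np
from math import factorial, exp, log
from scipy.stats import poisson

# A sketch checking that the Poisson pmf fits the exponential-family
# template with theta = log(lam), T(x) = x, A(theta) = exp(theta),
# h(x) = 1/x! on the non-negative integers.
lam = 2.5
theta = log(lam)
for x in range(10):
    ef_form = exp(theta * x - exp(theta)) / factorial(x)
    assert np.isclose(ef_form, poisson.pmf(x, lam))
```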
The natural family of conjugate priors contains priors of the form
$$p(\theta) = \frac{\exp\big( \theta \chi - \nu \exp(\theta) \big)}{\kappa(\chi, \nu)}.$$
Since $\lambda = \exp(\theta)$ is an increasing function of $\theta$ and $\theta = \ln\lambda$, we can apply the formula for the density of an increasing function:
$$f(\lambda) = f_{\Theta}(\ln\lambda) \frac{d\theta}{d\lambda} = \frac{\exp\big( \chi \ln\lambda - \nu\lambda \big)}{\kappa(\chi, \nu)} \cdot \frac{1}{\lambda} = \frac{\lambda^{\chi - 1} \exp(-\nu\lambda)}{\kappa(\chi, \nu)}.$$
Thus, the natural family of conjugate priors contains priors that assign to $\lambda$ a Gamma distribution with shape parameter $\chi$ and rate parameter $\nu$.
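As in the Bernoulli case, the correspondence can be checked numerically; the sketch below (Python with scipy; the hyperparameter values are illustrative) computes $\kappa(\chi, \nu)$ by quadrature and compares the transformed natural prior with the Gamma density.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma

# A sketch verifying that the natural prior on theta, proportional to
# exp(theta*chi - nu*exp(theta)), induces a Gamma(shape=chi, rate=nu)
# distribution on lam = exp(theta).
chi, nu = 3.0, 2.0
unnorm = lambda t: np.exp(t * chi - nu * np.exp(t))
kappa, _ = quad(unnorm, -30, 5)                  # normalizing constant

lam = np.linspace(0.1, 5.0, 50)
density_lam = unnorm(np.log(lam)) / kappa / lam  # Jacobian d(theta)/d(lam) = 1/lam
assert np.allclose(density_lam, gamma.pdf(lam, chi, scale=1.0 / nu))
```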
By the general formula for natural families, the posterior distribution of $\theta$ is
$$p(\theta \mid x) = \frac{\exp\big( \theta (\chi + x) - (\nu + 1) \exp(\theta) \big)}{\kappa(\chi + x, \nu + 1)},$$
which implies (by the same argument just used for the prior) that the posterior distribution of $\lambda$ is
$$f(\lambda \mid x) = \frac{\lambda^{\chi + x - 1} \exp\big( -(\nu + 1) \lambda \big)}{\kappa(\chi + x, \nu + 1)},$$
that is, a Gamma distribution with shape parameter $\chi + x$ and rate parameter $\nu + 1$.
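In the usual (shape, rate) parameterization, the update adds $x$ to the shape and $1$ to the rate; the sketch below (with illustrative counts) applies it sequentially to a few IID observations.

```python
# A sketch of the Gamma-Poisson update in the (shape, rate)
# parameterization, corresponding to shape = chi and rate = nu:
# observing x adds x to the shape and 1 to the rate.
shape, rate = 2.0, 1.0          # prior: Gamma(2, 1)
for x in (3, 1, 4):             # observed Poisson counts
    shape, rate = shape + x, rate + 1
# posterior: Gamma(10, 4) after three observations
```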
Bernardo, J. M., and Smith, A. F. M. (2009) Bayesian Theory, Wiley.
Robert, C. P. (2007) The Bayesian Choice, Springer.
Please cite as:
Taboga, Marco (2021). "Conjugate prior", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/conjugate-prior.