An exponential family is a parametric family of distributions whose probability density (or mass) functions satisfy certain properties that make them highly tractable from a mathematical viewpoint.
Table of contents
Let us start by briefly reviewing the definition of a parametric family.
Let
be a set of probability distributions.
Put
in correspondence with a parameter space
.
If the correspondence is a function that associates one and only one
distribution in
to each parameter
,
then
is called a parametric family.
Example
Let
be the set of all
normal
distributions. Each distribution is characterized by its mean
(a real number) and its
variance
(a positive real number). Thus, the set of distributions
is put into correspondence with the parameter space
.
A member of the parameter space is a parameter vector
.
Since to each parameter
corresponds one and only one normal distribution, the set
of all normal distributions is a parametric family.
In what follows, we are going to focus our attention on parametric families of continuous distributions.
However, everything we say applies with straightforward modifications also to families of discrete distributions.
We can now define exponential families.
Definition
A parametric family of univariate continuous distributions is said to be an
exponential family if and only if the
probability density
function of any member of the family can be written
aswhere:
is a function that depends only on
;
is a
vector of parameters;
is a vector-valued function of the vector of parameters
;
is a vector-valued function of
;
is the dot product
between
and
;
is a function of
.
The key property that characterizes an exponential family is the fact that
and
interact only via a dot product (after appropriate transformations
and
).
Since the integral of a probability density function must be equal to 1, we
have:
In other words, the function
is completely determined by the choice of
,
and
.
The function
is called log-partition function or log-normalizer.
Its exponential is a constant of proportionality, as we can
writewhere
is the proportionality symbol.
The vector
is called sufficient statistic because it satisfies a criterion for
sufficiency,
namely, the
density
is
a product of:
a factor that does not depend on the parameter;
a factor that depends only on the parameter and on the sufficient statistic.
The vector
is called natural parameter.
When
,
the pdf of
becomes
where
the log-partition function
satisfies
The function
is called base measure.
It is so-called
becausein
the base case in which
.
All the members of the family are perturbations of the base measure, obtained
by varying
.
The integral in equation (1) is not guaranteed to be finite.
As a consequence, an exponential family is well-defined only if
and
are chosen in such a way that the integral in equation (1) is finite for at
least some values of
.
Since
is strictly positive for finite
,
and
,
the density
is equal to zero only when
is.
Therefore, the base measure
determines the support
of
,
which does not depend on
.
To summarize what we have explained above, let us list the main steps needed to build an exponential family:
we choose a base measure
;
we choose a vector of sufficient statistics
of dimension
;
we write the
natural parameter as a function
of a
parameter
;
we try to find the log-partition function
by computing the
integral
if the log-partition function is finite for some values of
,
then we have built a family of distributions, called an exponential family,
whose densities are of the
form
This list of steps should clarify the fact that there are infinitely many exponential families: for each choice of the base measure and the vector of sufficient statistics, we obtain a different family.
The
joint
moment generating function of the sufficient statistic
is
This is proved as
followswhere
in step
we have used the fact that the integral is equal to
because it is the integral of a pdf (by the very definition of the
log-partition function
).
Denote the
-th
entry of the sufficient statistic by
.
Then, its expected
value
is
The
joint
cumulant generating function (cgf) of
is
By
the properties of the cgf, its first partial derivative with respect to
,
evaluated at
,
is equal to
.
Therefore,
The covariance between
the
-th
and
-th
entries of the vector of sufficient statistics
is
Again, the joint cumulant generating
function of the sufficient statistic
isBy
the properties of the cgf, its second cross-partial derivative with respect to
and
,
evaluated at
,
is equal to
.
Therefore,
Several commonly used families of distributions are exponential. Here are some examples.
The family of normal
distributions with
densityis
exponential:
We can write the density as
follows:
The family of
binomial
distributions with probability mass
functionis
exponential for fixed
:
We can write the probability mass function
as
follows:
We have already discussed the normal and binomial distributions.
Other important families of distributions previously discussed in these lectures are exponential (prove it as an exercise):
In the binomial example above we have learned an important fact: there are cases in which a family of distributions is not exponential, but we can derive an exponential family from it by keeping one of the parameters fixed.
In other words, even if a family is not exponential, one of its subsets may be.
There are infinitely many equivalent ways to represent the same exponential family.
For
example,is
the same
as
where
for
any constant
.
Let
be independently and identically
distributed draws from a member of an exponential family having
density
Then, the maximum
likelihood estimator of the natural parameter
is the value of
that solves the
equation
The likelihood of the sample
isThe
log-likelihood
is
The
gradient of the log-likelihood with respect to the natural parameter vector
is
Therefore,
the first order condition for a maximum
is
There are two interesting things to note in the formula for the maximum likelihood estimator (MLE) of the parameter of an exponential family.
First, the MLE depends only on the sample
average of the sufficient statistic, that is,
on
Regardless of the sample size
,
all the information about the parameter provided by the sample is summarized
by an
vector.
Second, since
,
the MLE
solves
where
the notation
highlights that the expected value is computed with respect to a probability
density that depends on
.
In other words, the MLE is obtained by matching the sample mean of the
sufficient statistic with its population mean
.
The definition of an exponential family of multivariate distributions is a straightforward generalization of the definition given above for univariate distributions.
Definition
A parametric family of
-dimensional
multivariate continuous distributions is said to be an exponential family if
and only if the joint
probability density function of any member of the family can be written
as
where:
is a function that depends only on
;
is a
vector of parameters;
is a vector-valued function of the vector of parameters
;
is a vector-valued function of
;
is the dot product between
and
;
is a function of
.
This definition is virtually identical to the previous one. The only
difference is that
is no longer a scalar, but it is now an
vector.
Also all the main results (about the moments and the mgf of the sufficient statistic, and about maximum likelihood estimation) remain unchanged.
As an exercise, you can check that in all the proofs above it does not matter
whether
is a scalar or a vector.
The only thing that changes is that we need to compute a multiple integral, instead of a simple integral, in order to work out the log-partition function.
Examples of multivariate exponential families are those of:
multinomial distributions (if the number-of-trials parameter is kept fixed).
Please cite as:
Taboga, Marco (2021). "Exponential family of distributions", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/exponential-family-of-distributions.
Most of the learning materials found on this website are now available in a traditional textbook format.