
Dirichlet distribution

by Marco Taboga, PhD

The Dirichlet distribution is a multivariate continuous probability distribution often used to model the uncertainty about a vector of unknown probabilities.


Generalizing the Beta distribution

The Dirichlet distribution is a multivariate generalization of the Beta distribution.

Denote by X the probability of an event. If X is unknown, we can treat it as a random variable, and assign a Beta distribution to X.

If X is a vector of unknown probabilities of mutually exclusive events, we can treat X as a random vector and assign a Dirichlet distribution to it.

Definition

The Dirichlet distribution is characterized as follows.

Definition Let X be a $K \times 1$ continuous random vector. Let its support be
$$R_X = \left\{ x \in \mathbb{R}^K : x_i \geq 0 \text{ for all } i \text{ and } \sum_{i=1}^{K} x_i \leq 1 \right\}$$
Let $\alpha_1, \ldots, \alpha_{K+1}$ be strictly positive real numbers. We say that X has a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{K+1}$ if and only if its joint probability density function is
$$f_X(x_1, \ldots, x_K) = \begin{cases} c \prod_{i=1}^{K} x_i^{\alpha_i - 1} \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} & \text{if } (x_1, \ldots, x_K) \in R_X \\ 0 & \text{otherwise} \end{cases}$$
where the normalizing constant $c$ is
$$c = \frac{\Gamma\left( \sum_{i=1}^{K+1} \alpha_i \right)}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)}$$
and $\Gamma(\cdot)$ is the Gamma function.
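To make the definition concrete, here is a minimal numerical sketch (ours, not part of the original lecture) that evaluates the density above in log space; the function name dirichlet_pdf and the use of scipy.special.gammaln for the normalizing constant are our own choices.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_pdf(x, alpha):
    """Dirichlet density of the definition above.

    x     -- sequence of length K (the first K probabilities)
    alpha -- sequence of length K + 1 (the parameters alpha_1, ..., alpha_{K+1})
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    residual = 1.0 - x.sum()                 # 1 - sum_i x_i
    if np.any(x <= 0.0) or residual <= 0.0:
        return 0.0                           # outside the support R_X
    # log c = log Gamma(sum alpha) - sum_i log Gamma(alpha_i)
    log_c = gammaln(alpha.sum()) - gammaln(alpha).sum()
    log_kernel = np.sum((alpha[:-1] - 1.0) * np.log(x)) \
                 + (alpha[-1] - 1.0) * np.log(residual)
    return float(np.exp(log_c + log_kernel))

# Example: K = 2, parameters (2, 3, 4)
print(dirichlet_pdf([0.2, 0.3], [2.0, 3.0, 4.0]))   # about 7.56
```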

Caveat

In the above definition, the entries of the vector X are K probabilities $x_1, \ldots, x_K$ whose sum is less than or equal to 1:
$$\sum_{i=1}^{K} x_i \leq 1 \quad (1)$$

If we want a vector of probabilities that sums exactly to 1, we can define an additional probability
$$x_{K+1} = 1 - \sum_{i=1}^{K} x_i$$
so that
$$\sum_{i=1}^{K+1} x_i = 1 \quad (2)$$

However, there is no way to rigorously define a probability density for the vector
$$[x_1, \ldots, x_{K+1}]$$
because the constraint in equation (2) implies that the probability density should be zero everywhere on $\mathbb{R}^{K+1}$ except on a subset whose Lebesgue measure is equal to zero, and on the latter set the probability density should be infinite (something involving a Dirac delta function).

Therefore, the right way to deal with $K+1$ events whose probabilities sum up to 1 is to:

- assign a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{K+1}$ to the vector $X = [x_1, \ldots, x_K]$ formed by the first K probabilities;

- treat the remaining probability $x_{K+1}$ not as a random variable in its own right, but as a quantity fully determined by X through the identity $x_{K+1} = 1 - \sum_{i=1}^{K} x_i$ (see the short example below).

We notice that several sources (including the Wikipedia page on the Dirichlet distribution) are not entirely clear about this point.
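As a concrete illustration of this bookkeeping (a sketch; the parameter values and variable names are our own, and note that numpy's sampler returns the full $(K+1)$-dimensional simplex vector):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0])       # K + 1 = 3 parameters
x = rng.dirichlet(alpha)[:-1]           # the K = 2 entries modeled by the density
x_K_plus_1 = 1.0 - x.sum()              # the residual probability, deterministic given x
print(x, x_K_plus_1, x.sum() + x_K_plus_1)   # the K + 1 probabilities sum to 1
```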

How the distribution is derived

How do we come up with the above formula for the density of the Dirichlet distribution?

The next proposition provides some insights.

Proposition Let $Z_1, \ldots, Z_{K+1}$ be $K+1$ independent Gamma random variables having means $\alpha_1, \ldots, \alpha_{K+1}$ and degrees-of-freedom parameters $2\alpha_1, \ldots, 2\alpha_{K+1}$. Define
$$S = \sum_{k=1}^{K+1} Z_k$$
Then the $K \times 1$ random vector
$$X = \left[ \frac{Z_1}{S}, \ldots, \frac{Z_K}{S} \right]$$
has a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{K+1}$.

Proof

A Gamma random variable is supported on the set of positive real numbers. Moreover,
$$\frac{Z_k}{S} > 0 \quad \text{for } k = 1, \ldots, K$$
and
$$\sum_{k=1}^{K} \frac{Z_k}{S} = 1 - \frac{Z_{K+1}}{S} < 1$$
Therefore, the support of X coincides with that of a Dirichlet random vector. The probability density of a Gamma random variable $Z_k$ with mean parameter $\alpha_k$ and degrees-of-freedom parameter $2\alpha_k$ is
$$f_{Z_k}(z_k) = \frac{1}{\Gamma(\alpha_k)} z_k^{\alpha_k - 1} \exp(-z_k)$$
Since the variables $Z_1, \ldots, Z_{K+1}$ are independent, their joint probability density is
$$f_Z(z_1, \ldots, z_{K+1}) = \prod_{k=1}^{K+1} \frac{1}{\Gamma(\alpha_k)} z_k^{\alpha_k - 1} \exp(-z_k)$$
Consider the one-to-one transformation
$$(x_1, \ldots, x_K, s) = g(z_1, \ldots, z_{K+1}), \qquad x_k = \frac{z_k}{\sum_{j=1}^{K+1} z_j} \ \ (k = 1, \ldots, K), \qquad s = \sum_{j=1}^{K+1} z_j$$
whose inverse is
$$z_k = s x_k \ \ (k = 1, \ldots, K), \qquad z_{K+1} = s \left( 1 - \sum_{k=1}^{K} x_k \right)$$
The Jacobian matrix of $g^{-1}$ is
$$J = \begin{bmatrix} s & 0 & \cdots & 0 & x_1 \\ 0 & s & \cdots & 0 & x_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & s & x_K \\ -s & -s & \cdots & -s & 1 - \sum_{k=1}^{K} x_k \end{bmatrix}$$
The determinant of the Jacobian is
$$\det(J) = s^K$$
because: 1) the determinant does not change if we add the first K rows to the $(K+1)$-th row; 2) the determinant of a triangular matrix is equal to the product of its diagonal entries. The formula for the joint probability density of a one-to-one transformation gives us (on the support of X and $S$):
$$f_{X,S}(x_1, \ldots, x_K, s) = f_Z\left( g^{-1}(x_1, \ldots, x_K, s) \right) \left| \det(J) \right| = \left[ \prod_{k=1}^{K} \frac{(s x_k)^{\alpha_k - 1}}{\Gamma(\alpha_k)} \exp(-s x_k) \right] \cdot \frac{\left( s \left( 1 - \sum_{k=1}^{K} x_k \right) \right)^{\alpha_{K+1} - 1}}{\Gamma(\alpha_{K+1})} \exp\left( -s \left( 1 - \sum_{k=1}^{K} x_k \right) \right) \cdot s^K$$
$$= \frac{\prod_{k=1}^{K} x_k^{\alpha_k - 1} \cdot \left( 1 - \sum_{k=1}^{K} x_k \right)^{\alpha_{K+1} - 1}}{\prod_{k=1}^{K+1} \Gamma(\alpha_k)} \, s^{\bar{\alpha} - 1} \exp(-s)$$
where $\bar{\alpha} = \sum_{k=1}^{K+1} \alpha_k$. By integrating out $s$, we obtain
$$f_X(x_1, \ldots, x_K) = \int_0^{\infty} f_{X,S}(x_1, \ldots, x_K, s) \, ds \overset{A}{=} \frac{\Gamma(\bar{\alpha}) \prod_{k=1}^{K} x_k^{\alpha_k - 1} \cdot \left( 1 - \sum_{k=1}^{K} x_k \right)^{\alpha_{K+1} - 1}}{\prod_{k=1}^{K+1} \Gamma(\alpha_k)}$$
where in step A we have used the definition of the Gamma function, $\Gamma(\bar{\alpha}) = \int_0^{\infty} s^{\bar{\alpha} - 1} \exp(-s) \, ds$. The latter expression is the density of the Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{K+1}$.
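This construction doubles as a sampling recipe. Below is a small simulation sketch (ours; seed, sample size, and parameter values are arbitrary). Note that numpy's Gamma sampler uses the shape/scale convention, so shape $= \alpha_k$ and scale $= 1$ reproduces the density used in the proof.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = np.array([2.0, 3.0, 4.0])          # alpha_1, ..., alpha_{K+1}
n = 200_000

# Z_k ~ Gamma with shape alpha_k and scale 1, i.e. density z^(alpha_k - 1) e^(-z) / Gamma(alpha_k)
Z = rng.gamma(shape=alpha, scale=1.0, size=(n, alpha.size))
S = Z.sum(axis=1, keepdims=True)           # S = Z_1 + ... + Z_{K+1}
X = (Z / S)[:, :-1]                        # X_k = Z_k / S for k = 1, ..., K

print(X.mean(axis=0))                                     # empirical means
print(rng.dirichlet(alpha, size=n)[:, :-1].mean(axis=0))  # built-in sampler, same means
```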

Relation to the Beta distribution

The Beta distribution is a special case of the Dirichlet distribution.

If we set the dimension $K = 1$ in the definition above, the support becomes
$$R_X = [0, 1]$$
and the probability density function becomes
$$f_X(x) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1) \Gamma(\alpha_2)} x^{\alpha_1 - 1} (1 - x)^{\alpha_2 - 1}$$
for $x \in R_X$.

By using the definition of the Beta function
$$B(\alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1) \Gamma(\alpha_2)}{\Gamma(\alpha_1 + \alpha_2)}$$
we can re-write the density as
$$f_X(x) = \frac{1}{B(\alpha_1, \alpha_2)} x^{\alpha_1 - 1} (1 - x)^{\alpha_2 - 1}$$

But this is the density of a Beta random variable with parameters $\alpha_1$ and $\alpha_2$.
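A quick numerical check of this special case (a sketch; the parameter values are arbitrary): the $K = 1$ Dirichlet density written directly from the formula above should coincide with scipy's Beta density.

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import beta

a1, a2 = 2.0, 5.0                          # arbitrary positive parameters
x = np.linspace(0.05, 0.95, 5)

# K = 1 Dirichlet density, taken directly from the formula above
dirichlet_k1 = gamma(a1 + a2) / (gamma(a1) * gamma(a2)) * x**(a1 - 1) * (1 - x)**(a2 - 1)

print(np.allclose(dirichlet_k1, beta.pdf(x, a1, a2)))   # True
```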

Marginal distributions

The following proposition is often used to prove interesting results about the Dirichlet distribution.

Proposition Let X be a $K \times 1$ Dirichlet random vector with parameters $\alpha_1, \ldots, \alpha_{K+1}$. Let $L$ be any integer such that $1 \leq L < K$. Then, the marginal distribution of the $L \times 1$ subvector
$$[X_1, \ldots, X_L]$$
is a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_L, \alpha_{L+1} + \alpha_{L+2} + \cdots + \alpha_{K+1}$.

Proof

First of all, notice that if the proposition holds for $L = K-1$, then we can use it recursively to show that it holds for all the other possible values of $L$. So, we assume $L = K-1$. In order to derive the marginal distribution, we need to integrate $x_K$ out of the joint density of X:
$$f_{X_1, \ldots, X_{K-1}}(x_1, \ldots, x_{K-1}) = \int_{-\infty}^{+\infty} c \prod_{i=1}^{K} x_i^{\alpha_i - 1} \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} \left[ \prod_{i=1}^{K} 1_{\{x_i \geq 0\}} \right] 1_{\left\{ \sum_{i=1}^{K} x_i \leq 1 \right\}} \, dx_K$$
where
$$c = \frac{\Gamma(\bar{\alpha})}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)}, \qquad \bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$$
and we have used indicator functions to specify the support of X; for example, $1_{\left\{ \sum_{i=1}^{K} x_i \leq 1 \right\}}$ is equal to 1 if $\sum_{i=1}^{K} x_i \leq 1$ and to 0 otherwise. After defining
$$\bar{x} = 1 - \sum_{i=1}^{K-1} x_i$$
we can re-write the marginal density as
$$f_{X_1, \ldots, X_{K-1}}(x_1, \ldots, x_{K-1}) = c \prod_{i=1}^{K-1} x_i^{\alpha_i - 1} \left[ \prod_{i=1}^{K-1} 1_{\{x_i \geq 0\}} \right] 1_{\left\{ \sum_{i=1}^{K-1} x_i \leq 1 \right\}} \int_{0}^{\bar{x}} x_K^{\alpha_K - 1} \left( \bar{x} - x_K \right)^{\alpha_{K+1} - 1} dx_K$$
We can solve the integral as follows:
$$\int_0^{\bar{x}} x_K^{\alpha_K - 1} (\bar{x} - x_K)^{\alpha_{K+1} - 1} \, dx_K \overset{A}{=} \int_0^1 (\bar{x} t)^{\alpha_K - 1} (\bar{x} - \bar{x} t)^{\alpha_{K+1} - 1} \, \bar{x} \, dt = \bar{x}^{\alpha_K + \alpha_{K+1} - 1} \int_0^1 t^{\alpha_K - 1} (1 - t)^{\alpha_{K+1} - 1} \, dt \overset{B}{=} \bar{x}^{\alpha_K + \alpha_{K+1} - 1} B(\alpha_K, \alpha_{K+1}) \overset{C}{=} \bar{x}^{\alpha_K + \alpha_{K+1} - 1} \frac{\Gamma(\alpha_K) \Gamma(\alpha_{K+1})}{\Gamma(\alpha_K + \alpha_{K+1})}$$
where: in step A we made the change of variable $x_K = \bar{x} t$; in step B we used the integral representation of the Beta function; in step C we used the relation between the Beta and Gamma functions. Thus, we have
$$f_{X_1, \ldots, X_{K-1}}(x_1, \ldots, x_{K-1}) = \frac{\Gamma(\bar{\alpha})}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_{K-1}) \Gamma(\alpha_K + \alpha_{K+1})} \prod_{i=1}^{K-1} x_i^{\alpha_i - 1} \left( 1 - \sum_{i=1}^{K-1} x_i \right)^{\alpha_K + \alpha_{K+1} - 1}$$
on the support, which is the density of a $(K-1)$-dimensional Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_{K-1}, \alpha_K + \alpha_{K+1}$.

A corollary of the previous two propositions follows.

Proposition Let X be a $K \times 1$ Dirichlet random vector with parameters $\alpha_1, \ldots, \alpha_{K+1}$. Then, the marginal distribution of the $k$-th entry $X_k$ is a Beta distribution with parameters $\alpha_k$ and $\bar{\alpha} - \alpha_k$, where
$$\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$$
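The corollary can be checked by simulation. The sketch below (ours; seed and sample size are arbitrary) runs a Kolmogorov-Smirnov test of the first Dirichlet coordinate against $\text{Beta}(\alpha_1, \bar{\alpha} - \alpha_1)$:

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 4.0])                  # alpha_1, alpha_2, alpha_3
alpha_bar = alpha.sum()

# First coordinate of Dirichlet draws (numpy returns the full simplex vector)
x1 = rng.dirichlet(alpha, size=50_000)[:, 0]

# Compare with Beta(alpha_1, alpha_bar - alpha_1); expect a large p-value
print(kstest(x1, beta(alpha[0], alpha_bar - alpha[0]).cdf))
```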

Expected value

The expected value of a Dirichlet random vector X is given entry-by-entry by
$$\mathrm{E}[X_k] = \frac{\alpha_k}{\bar{\alpha}} \quad \text{for } k = 1, \ldots, K$$
where $\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$.

Proof

We know that the marginal distribution of each entry of X is a Beta distribution. Therefore, we can use, for each entry, the formula for the expected value of a Beta random variable:
$$\mathrm{E}[X_k] = \frac{\alpha_k}{\alpha_k + (\bar{\alpha} - \alpha_k)} = \frac{\alpha_k}{\bar{\alpha}}$$
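A brief Monte Carlo sanity check of the mean formula (a sketch with arbitrary parameters; the formula also applies to the residual coordinate returned by numpy's sampler):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = np.array([2.0, 3.0, 4.0])
X = rng.dirichlet(alpha, size=200_000)

print(X.mean(axis=0))           # empirical means of all K + 1 coordinates
print(alpha / alpha.sum())      # E[X_k] = alpha_k / alpha_bar
```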

Cross-moments

The cross-moments of a Dirichlet random vector are
$$\mathrm{E}\left[ \prod_{i=1}^{K} X_i^{n_i} \right] = \frac{\Gamma(\bar{\alpha})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} \prod_{i=1}^{K} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)}$$
where $n_1, \ldots, n_K$ are non-negative integers and $\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$.

Proof

The formula is derived as follows:
$$\mathrm{E}\left[ \prod_{i=1}^{K} X_i^{n_i} \right] = \frac{\Gamma(\bar{\alpha})}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \int_{R_X} \prod_{i=1}^{K} x_i^{\alpha_i + n_i - 1} \left( 1 - \sum_{i=1}^{K} x_i \right)^{\alpha_{K+1} - 1} dx_1 \cdots dx_K$$
$$= \frac{\Gamma(\bar{\alpha})}{\prod_{i=1}^{K+1} \Gamma(\alpha_i)} \cdot \frac{\prod_{i=1}^{K} \Gamma(\alpha_i + n_i) \cdot \Gamma(\alpha_{K+1})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} = \frac{\Gamma(\bar{\alpha})}{\Gamma\left( \bar{\alpha} + \sum_{i=1}^{K} n_i \right)} \prod_{i=1}^{K} \frac{\Gamma(\alpha_i + n_i)}{\Gamma(\alpha_i)}$$
In the last step we have used the fact that the expression inside the integral, once multiplied by the normalizing constant of a Dirichlet distribution with parameters $\alpha_1 + n_1, \ldots, \alpha_K + n_K, \alpha_{K+1}$, is the joint probability density of that distribution; hence the integral equals the reciprocal of that constant.
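The cross-moment formula is easy to verify numerically. The sketch below (ours; we compute the exact value in log space via scipy.special.gammaln to avoid overflow) compares it with a Monte Carlo estimate of $\mathrm{E}[X_1^2 X_2]$:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
alpha = np.array([2.0, 3.0, 4.0])          # alpha_1, ..., alpha_{K+1}
n_exp = np.array([2, 1])                   # exponents n_1 = 2, n_2 = 1

# Monte Carlo estimate of E[X_1^2 * X_2]
X = rng.dirichlet(alpha, size=500_000)[:, :-1]
mc = np.prod(X ** n_exp, axis=1).mean()

# Exact value from the cross-moment formula, computed in log space
alpha_bar = alpha.sum()
exact = np.exp(
    gammaln(alpha_bar) - gammaln(alpha_bar + n_exp.sum())
    + np.sum(gammaln(alpha[:-1] + n_exp) - gammaln(alpha[:-1]))
)
print(mc, exact)                           # both about 1/55 = 0.0182
```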

Covariance matrix

The entries of the covariance matrix of a Dirichlet random vector X are
$$\mathrm{Var}[X_k] = \frac{\alpha_k (\bar{\alpha} - \alpha_k)}{\bar{\alpha}^2 (\bar{\alpha} + 1)} \qquad \text{and} \qquad \mathrm{Cov}[X_j, X_k] = -\frac{\alpha_j \alpha_k}{\bar{\alpha}^2 (\bar{\alpha} + 1)} \ \text{ for } j \neq k$$
where
$$\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$$

Proof

We can use the covariance formula
$$\mathrm{Cov}[X_j, X_k] = \mathrm{E}[X_j X_k] - \mathrm{E}[X_j] \, \mathrm{E}[X_k]$$
and its special case
$$\mathrm{Var}[X_k] = \mathrm{E}[X_k^2] - \left( \mathrm{E}[X_k] \right)^2$$
together with the formulae for the expected value and the cross-moments derived previously. When $j \neq k$, we have
$$\mathrm{E}[X_j X_k] = \frac{\Gamma(\bar{\alpha})}{\Gamma(\bar{\alpha} + 2)} \cdot \frac{\Gamma(\alpha_j + 1)}{\Gamma(\alpha_j)} \cdot \frac{\Gamma(\alpha_k + 1)}{\Gamma(\alpha_k)} = \frac{\alpha_j \alpha_k}{\bar{\alpha} (\bar{\alpha} + 1)}$$
where we have used the property of the Gamma function
$$\Gamma(z + 1) = z \, \Gamma(z)$$
and we have defined
$$\bar{\alpha} = \sum_{i=1}^{K+1} \alpha_i$$
Therefore, for $j \neq k$, we have
$$\mathrm{Cov}[X_j, X_k] = \frac{\alpha_j \alpha_k}{\bar{\alpha} (\bar{\alpha} + 1)} - \frac{\alpha_j}{\bar{\alpha}} \cdot \frac{\alpha_k}{\bar{\alpha}} = \alpha_j \alpha_k \, \frac{\bar{\alpha} - (\bar{\alpha} + 1)}{\bar{\alpha}^2 (\bar{\alpha} + 1)} = -\frac{\alpha_j \alpha_k}{\bar{\alpha}^2 (\bar{\alpha} + 1)}$$
When $j = k$, we have
$$\mathrm{E}[X_k^2] = \frac{\Gamma(\bar{\alpha})}{\Gamma(\bar{\alpha} + 2)} \cdot \frac{\Gamma(\alpha_k + 2)}{\Gamma(\alpha_k)} = \frac{\alpha_k (\alpha_k + 1)}{\bar{\alpha} (\bar{\alpha} + 1)}$$
and
$$\mathrm{Var}[X_k] = \frac{\alpha_k (\alpha_k + 1)}{\bar{\alpha} (\bar{\alpha} + 1)} - \frac{\alpha_k^2}{\bar{\alpha}^2} = \frac{\alpha_k (\bar{\alpha} - \alpha_k)}{\bar{\alpha}^2 (\bar{\alpha} + 1)}$$
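Finally, the covariance formulas can be compared with sample covariances (a sketch; the single matrix expression below packs both the $j = k$ and $j \neq k$ cases):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([2.0, 3.0, 4.0])
X = rng.dirichlet(alpha, size=500_000)[:, :-1]     # the K = 2 modeled entries

alpha_bar = alpha.sum()
a = alpha[:-1]
# Diagonal: alpha_k (alpha_bar - alpha_k); off-diagonal: -alpha_j alpha_k
exact = (alpha_bar * np.diag(a) - np.outer(a, a)) / (alpha_bar**2 * (alpha_bar + 1))

print(np.cov(X, rowvar=False))                     # empirical covariance matrix
print(exact)                                       # exact covariance matrix
```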

How to cite

Please cite as:

Taboga, Marco (2021). "Dirichlet distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/probability-distributions/Dirichlet-distribution.
