
Dirichlet distribution

by Marco Taboga, PhD

The Dirichlet distribution is a multivariate continuous probability distribution often used to model the uncertainty about a vector of unknown probabilities.


Generalizing the Beta distribution

The Dirichlet distribution is a multivariate generalization of the Beta distribution.

Denote by X the probability of an event. If X is unknown, we can treat it as a random variable, and assign a Beta distribution to X.

If X is a vector of unknown probabilities of mutually exclusive events, we can treat X as a random vector and assign a Dirichlet distribution to it.


The Dirichlet distribution is characterized as follows.

Definition Let X be a $K\times 1$ continuous random vector. Let its support be [eq1] Let [eq2]. We say that X has a Dirichlet distribution with parameters [eq3] if and only if its joint probability density function is [eq4] where the normalizing constant $c$ is [eq5] and [eq6] is the Gamma function.


In the above definition, the entries of the vector X are K probabilities [eq7] whose sum is less than or equal to 1: [eq8]

If we want to have a vector of probabilities exactly summing up to 1, we can define an additional probability [eq9] so that [eq10]

However, there is no way to rigorously define a probability density for the vector [eq11] because the constraint in equation (2) implies that the probability density should be zero everywhere on $\mathbb{R}^{K+1}$ except on a subset whose Lebesgue measure is equal to zero, and on the latter set the probability density should be infinite (something involving a Dirac delta function).

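Therefore, the right way to deal with $K+1$ events whose probabilities sum up to 1 is to assign a Dirichlet density, as defined above, to the probabilities of the first K events, and to define the probability of the $(K+1)$-th event as the residual that makes the sum equal to 1.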

We notice that several sources (including the Wikipedia page on the Dirichlet distribution) are not entirely clear about this point.
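The density in the definition above can be evaluated numerically. The following is a minimal sketch, assuming the standard form of the density: K entries $x_{1},\ldots ,x_{K}$, $K+1$ positive parameters, and a last factor that depends on the residual $1-x_{1}-\ldots -x_{K}$. The function name `dirichlet_pdf` is hypothetical, and points on the boundary of the support are assigned zero density for simplicity.

```python
import math

def dirichlet_pdf(x, alpha):
    """Density of a K-dimensional Dirichlet vector x with K+1 parameters alpha."""
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have one more entry than x")
    residual = 1.0 - sum(x)  # probability left for the (K+1)-th event
    if any(xk <= 0.0 for xk in x) or residual <= 0.0:
        return 0.0  # simplification: boundary and exterior get zero density
    # log of the normalizing constant c = Gamma(sum of alpha) / prod Gamma(alpha_k)
    log_c = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of the kernel: the K entries, then the residual with the last parameter
    log_kernel = sum((a - 1.0) * math.log(xk) for a, xk in zip(alpha[:-1], x))
    log_kernel += (alpha[-1] - 1.0) * math.log(residual)
    return math.exp(log_c + log_kernel)

print(dirichlet_pdf([0.2, 0.3], [1.0, 1.0, 1.0]))
print(dirichlet_pdf([0.7, 0.5], [1.0, 1.0, 1.0]))
```

Working with logarithms and `math.lgamma` avoids overflow of the Gamma function when the parameters are large.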

How the distribution is derived

How do we come up with the above formula for the density of the Dirichlet distribution?

The next proposition provides some insights.

Proposition Let [eq13] be independent Gamma random variables having means [eq14] and degrees-of-freedom parameters [eq15]. Define [eq16] Then, the $K\times 1$ random vector [eq17] has a Dirichlet distribution with parameters [eq18].


A Gamma random variable is supported on the set of positive real numbers. Moreover, [eq19] and [eq20] Therefore, the support of X coincides with that of a Dirichlet random vector.

The probability density of a Gamma random variable $Z_{k}$ with mean parameter $\alpha _{k}$ and degrees-of-freedom parameter $2\alpha _{k}$ is [eq21] Since the variables [eq22] are independent, their joint probability density is [eq23]

Consider the one-to-one transformation [eq24] whose inverse is [eq25] The Jacobian matrix of $g^{-1}$ is [eq26] The determinant of the Jacobian is [eq27] because: 1) the determinant does not change if we add the first K rows to the $\left( K+1\right) $-th row; 2) the determinant of a triangular matrix is equal to the product of its diagonal entries.

The formula for the joint probability density of a one-to-one transformation gives us (on the support of [eq28]): [eq29] By integrating out $s$, we obtain [eq30] where in step $\frame{A}$ we have used the definition of the Gamma function. The latter expression is the density of the Dirichlet distribution with parameters [eq31].
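The proposition also suggests a way to simulate Dirichlet vectors: draw $K+1$ independent Gamma variables and divide the first K by their total. The sketch below assumes that a Gamma variable with mean $\alpha _{k}$ and $2\alpha _{k}$ degrees of freedom corresponds to shape $\alpha _{k}$ and unit scale in the parametrization of Python's `random.gammavariate`. The function name `sample_dirichlet` and the parameter values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one K-dimensional Dirichlet vector from K+1 independent Gamma draws."""
    # Assumption: Z_k has shape alpha_k and scale 1 (mean alpha_k)
    z = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(z)  # the sum of all K+1 Gamma draws
    # Keep only the first K ratios; the last one is the implied residual
    return [zk / s for zk in z[:-1]]

rng = random.Random(0)  # fixed seed so that the run is reproducible
alpha = [2.0, 3.0, 5.0]  # illustrative parameters, K = 2
draws = [sample_dirichlet(alpha, rng) for _ in range(20000)]
mean_x1 = sum(d[0] for d in draws) / len(draws)
print(mean_x1)  # should be close to alpha_1 / sum(alpha) = 0.2
```

Every simulated vector has strictly positive entries whose sum is strictly less than 1, in agreement with the support of the distribution.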

Relation to the Beta distribution

The Beta distribution is a special case of the Dirichlet distribution.

If we set the dimension $K=1$ in the definition above, the support becomes [eq32] and the probability density function becomes [eq33]

By using the definition of the Beta function [eq34] we can re-write the density as [eq35]

But this is the density of a Beta random variable with parameters $\alpha _{1}$ and $\alpha _{2}$.
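For reference, the reduction can be written out explicitly. The lines below are a sketch that assumes the standard form of the Dirichlet density with parameters $\alpha _{1}$ and $\alpha _{2}$; the numerical values in the last line are purely illustrative.

```latex
% K = 1: assumed standard Dirichlet density on 0 < x_1 < 1
\begin{aligned}
f_X(x_1) &= \frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}\,
            x_1^{\alpha_1-1}\,(1-x_1)^{\alpha_2-1} \\
         &= \frac{1}{B(\alpha_1,\alpha_2)}\, x_1^{\alpha_1-1}\,(1-x_1)^{\alpha_2-1},
\qquad B(\alpha_1,\alpha_2)=\frac{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}{\Gamma(\alpha_1+\alpha_2)} \\
% Illustration with \alpha_1 = 2, \alpha_2 = 3, x_1 = 1/2
B(2,3) &= \frac{1!\,2!}{4!} = \frac{1}{12},
\qquad f_X(1/2) = 12\cdot\frac{1}{2}\cdot\frac{1}{4} = \frac{3}{2}
\end{aligned}
```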

Marginal distributions

The following proposition is often used to prove interesting results about the Dirichlet distribution.

Proposition Let X be a $K\times 1$ Dirichlet random vector with parameters [eq36]. Let $L$ be any integer such that $1\leq L<K$. Then, the marginal distribution of the $L\times 1$ subvector [eq37] is a Dirichlet distribution with parameters [eq38].


First of all, notice that if the proposition holds for $L=K-1$, then we can use it recursively to show that it holds for all the other possible values of $L$. So, we assume $L=K-1$.

In order to derive the marginal distribution, we need to integrate $x_{K}$ out of the joint density of X: [eq39] where [eq40] and we have used indicator functions to specify the support of X; for example, [eq41] is equal to 1 if [eq42] and to 0 otherwise. We can re-write the marginal density as [eq43]

After defining [eq44] we can solve the integral as follows: [eq45] where: in step $\frame{A}$ we made the change of variable [eq46]; in step $\frame{B}$ we used the integral representation of the Beta function; in step $\frame{C}$ we used the relation between the Beta and Gamma functions.

Thus, we have [eq47] which is the density of a $\left( K-1\right) $-dimensional Dirichlet distribution with parameters [eq48].

A corollary of the previous two propositions follows.

Proposition Let X be a $K\times 1$ Dirichlet random vector with parameters [eq49]. Then, the marginal distribution of the k-th entry of X is a Beta distribution with parameters $\alpha _{k}$ and [eq50] where [eq51]
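The corollary can be checked numerically in a small case. The sketch below takes $K=2$, integrates $x_{2}$ out of the joint density with a midpoint rule, and compares the result with a Beta density. It assumes the standard form of the Dirichlet density and reads the second Beta parameter as the sum of all the other parameters; the function names and parameter values are hypothetical.

```python
import math

def dirichlet_pdf_2d(x1, x2, a1, a2, a3):
    """Joint density of (X_1, X_2) for K = 2, assuming the standard Dirichlet form."""
    rest = 1.0 - x1 - x2
    if x1 <= 0.0 or x2 <= 0.0 or rest <= 0.0:
        return 0.0
    log_c = (math.lgamma(a1 + a2 + a3)
             - math.lgamma(a1) - math.lgamma(a2) - math.lgamma(a3))
    return math.exp(log_c + (a1 - 1.0) * math.log(x1)
                    + (a2 - 1.0) * math.log(x2)
                    + (a3 - 1.0) * math.log(rest))

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) random variable at 0 < x < 1."""
    log_c = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_c + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))

def marginal_x1(x1, a1, a2, a3, steps=20000):
    """Integrate x2 out of the joint density over (0, 1 - x1) with the midpoint rule."""
    width = (1.0 - x1) / steps
    total = 0.0
    for i in range(steps):
        x2 = (i + 0.5) * width  # midpoint of the i-th subinterval
        total += dirichlet_pdf_2d(x1, x2, a1, a2, a3)
    return total * width

a1, a2, a3 = 2.0, 3.0, 5.0  # illustrative parameters
numeric = marginal_x1(0.3, a1, a2, a3)
closed_form = beta_pdf(0.3, a1, a2 + a3)
print(numeric, closed_form)  # the two values should agree to several decimals
```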

Expected value

The expected value of a Dirichlet random vector X is [eq52]


We know that the marginal distribution of each entry of X is a Beta distribution. Therefore, we can use, for each entry, the formula for the expected value of a Beta random variable: [eq53]
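As a worked illustration, the lines below spell out the argument for one entry. They assume that the k-th entry has a Beta distribution with parameters $\alpha _{k}$ and $\bar{\alpha}-\alpha _{k}$, where $\bar{\alpha}$ (a symbol introduced here for convenience) is the sum of all $K+1$ parameters; the numerical values are purely illustrative.

```latex
% Mean of a Beta(a, b) variable is a / (a + b); here a = \alpha_k, b = \bar{\alpha} - \alpha_k
\begin{aligned}
\bar{\alpha} &= \sum_{j=1}^{K+1}\alpha_j, \qquad
\mathrm{E}[X_k] = \frac{\alpha_k}{\alpha_k + (\bar{\alpha}-\alpha_k)}
                = \frac{\alpha_k}{\bar{\alpha}} \\
% Illustration with K = 2 and (\alpha_1, \alpha_2, \alpha_3) = (2, 3, 5)
\bar{\alpha} &= 10, \qquad
\mathrm{E}[X_1] = \frac{2}{10} = 0.2, \qquad
\mathrm{E}[X_2] = \frac{3}{10} = 0.3
\end{aligned}
```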


The cross-moments of a Dirichlet random vector are [eq54] where [eq55] are non-negative integers.


The formula is derived as follows: [eq56] In the last step we have used the fact that the expression inside the integral is the joint probability density of a Dirichlet distribution with parameters [eq57]
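The cross-moments are ratios of Gamma functions, so they are straightforward to compute. The sketch below assumes the standard closed form, in which each parameter $\alpha _{k}$ is shifted by its exponent $n_{k}$ and the sum of the parameters is shifted by the sum of the exponents. The function name `dirichlet_cross_moment` is hypothetical.

```python
import math

def dirichlet_cross_moment(alpha, n):
    """E[X_1^n_1 * ... * X_K^n_K] for K+1 parameters alpha and K integer powers n."""
    if len(alpha) != len(n) + 1:
        raise ValueError("alpha must have one more entry than n")
    total = sum(alpha)
    # Ratio Gamma(total) / Gamma(total + sum of n), computed in logs
    log_m = math.lgamma(total) - math.lgamma(total + sum(n))
    # One ratio Gamma(alpha_k + n_k) / Gamma(alpha_k) for each of the K entries
    for a, nk in zip(alpha[:-1], n):
        log_m += math.lgamma(a + nk) - math.lgamma(a)
    return math.exp(log_m)

alpha = [2.0, 3.0, 5.0]  # illustrative parameters, K = 2
print(dirichlet_cross_moment(alpha, [1, 0]))  # first moment of X_1
print(dirichlet_cross_moment(alpha, [1, 1]))  # E[X_1 * X_2]
```

Setting one exponent to 1 and the others to 0 recovers the expected value of the corresponding entry.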

Covariance matrix

The entries of the covariance matrix of a Dirichlet random vector X are [eq58] where [eq59]


We can use the covariance formula [eq60] and its special case [eq61] together with the formulae for the expected value and the cross-moments derived previously. When $j\neq k$, we have [eq62] where we have used the property of the Gamma function [eq63] and we have defined [eq64] Therefore, for $j\neq k$, we have [eq65] When $j=k$, we have [eq66] and [eq67]
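The resulting covariance matrix can be assembled entry by entry. The sketch below assumes the standard closed forms, with `total` denoting the sum of all $K+1$ parameters: the variance of the k-th entry is $\alpha _{k}(\text{total}-\alpha _{k})$ divided by $\text{total}^{2}(\text{total}+1)$, and the covariance between two distinct entries is $-\alpha _{j}\alpha _{k}$ divided by the same quantity. The function name `dirichlet_covariance` is hypothetical.

```python
def dirichlet_covariance(alpha):
    """K x K covariance matrix of a Dirichlet vector with K+1 parameters alpha."""
    total = sum(alpha)
    k = len(alpha) - 1  # dimension of the random vector
    denom = total ** 2 * (total + 1.0)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                # Variance of the i-th entry
                cov[i][j] = alpha[i] * (total - alpha[i]) / denom
            else:
                # Distinct entries are negatively correlated
                cov[i][j] = -alpha[i] * alpha[j] / denom
    return cov

for row in dirichlet_covariance([2.0, 3.0, 5.0]):  # illustrative parameters
    print(row)
```

The off-diagonal entries are always negative: since the entries compete for a total probability of at most 1, a larger value of one entry leaves less room for the others.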

How to cite

Please cite as:

Taboga, Marco (2021). "Dirichlet distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
