
Multinoulli distribution

by , PhD

The Multinoulli distribution (sometimes also called categorical distribution) is a generalization of the Bernoulli distribution. If you perform an experiment that can have only two outcomes (either success or failure), then a random variable that takes value 1 in case of success and value 0 in case of failure is a Bernoulli random variable. If you perform an experiment that can have K outcomes and you denote by $X_{i}$ a random variable that takes value 1 if you obtain the i-th outcome and 0 otherwise, then the random vector X defined as $X=[X_{1}\ X_{2}\ \ldots \ X_{K}]^{\top }$ is a Multinoulli random vector. In other words, when the i-th outcome is obtained, the i-th entry of the Multinoulli random vector X takes value 1, while all other entries take value 0.

In what follows, the probabilities of the K possible outcomes will be denoted by $p_{1}$, ..., $p_{K}$.


The distribution is characterized as follows.

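Definition Let X be a $K\times 1$ discrete random vector. Let the support of X be the set of $K\times 1$ vectors having one entry equal to 1 and all other entries equal to 0: $R_{X}=\{x\in \{0,1\}^{K}:\sum_{j=1}^{K}x_{j}=1\}$. Let $p_{1}$, ..., $p_{K}$ be K strictly positive numbers such that $\sum_{j=1}^{K}p_{j}=1$. We say that X has a Multinoulli distribution with probabilities $p_{1}$, ..., $p_{K}$ if its joint probability mass function is $p_{X}(x_{1},\ldots ,x_{K})=\begin{cases}\prod_{j=1}^{K}p_{j}^{x_{j}} & \text{if }(x_{1},\ldots ,x_{K})\in R_{X} \\ 0 & \text{otherwise}\end{cases}$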

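If you are puzzled by the above definition of the joint pmf, note that when $(x_{1},\ldots ,x_{K})\in R_{X}$ and $x_{i}=1$ because the i-th outcome has been obtained, then all other entries are equal to 0 and $\prod_{j=1}^{K}p_{j}^{x_{j}}=p_{1}^{0}\cdots p_{i-1}^{0}\,p_{i}^{1}\,p_{i+1}^{0}\cdots p_{K}^{0}=p_{i}$.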
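The way the joint pmf picks out a single probability can be checked numerically. The following is a minimal sketch, assuming the pmf is the product of the terms $p_{j}^{x_{j}}$ on the support and 0 elsewhere; the function name `multinoulli_pmf` and the probabilities 0.2, 0.5, 0.3 are illustrative choices, not part of the lecture.

```python
from itertools import product


def multinoulli_pmf(x, p):
    """Joint pmf of a Multinoulli vector: prod_j p_j**x_j on the support, 0 elsewhere."""
    # The support is the set of 0/1 vectors with exactly one entry equal to 1.
    in_support = (
        len(x) == len(p)
        and all(v in (0, 1) for v in x)
        and sum(x) == 1
    )
    if not in_support:
        return 0.0
    value = 1.0
    for p_j, x_j in zip(p, x):
        value *= p_j ** x_j  # factors with x_j = 0 contribute 1
    return value


p = [0.2, 0.5, 0.3]  # illustrative probabilities (an assumption)

print(multinoulli_pmf([0, 1, 0], p))  # second outcome obtained: equals p[1]
print(multinoulli_pmf([1, 1, 0], p))  # not in the support: 0.0

# Summing over all 0/1 vectors of length K should give 1,
# because only the K support points carry mass.
total = sum(multinoulli_pmf(list(x), p) for x in product((0, 1), repeat=len(p)))
print(total)
```

Only the factor whose exponent is 1 survives in the product, which is why the pmf evaluated at the i-th support point is $p_{i}$.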

Expected value

The expected value of X is $\mathrm{E}[X]=p$, where the $K\times 1$ vector p is defined as follows: $p=[p_{1}\ p_{2}\ \ldots \ p_{K}]^{\top }$


The i-th entry of X, denoted by $X_{i}$, is an indicator function of the event "the i-th outcome has happened". Therefore, its expected value is equal to the probability of the event it indicates: $\mathrm{E}[X_{i}]=\mathrm{P}(X_{i}=1)=p_{i}$
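A small simulation can illustrate that the entry-wise mean of X approaches the vector of outcome probabilities. This is only a sketch: the sampler `sample_multinoulli` (inverse-CDF sampling over the outcomes), the seed, the number of draws, and the probabilities are all assumptions made for the example.

```python
import random


def sample_multinoulli(p, rng):
    """Draw one Multinoulli vector (a one-hot list) with probabilities p."""
    u = rng.random()
    cumulative = 0.0
    outcome = len(p) - 1  # fallback in case rounding leaves the cumulative sum just below 1
    for j, p_j in enumerate(p):
        cumulative += p_j
        if u < cumulative:
            outcome = j
            break
    return [1 if j == outcome else 0 for j in range(len(p))]


p = [0.2, 0.5, 0.3]      # illustrative probabilities (an assumption)
rng = random.Random(0)   # fixed seed so the run is repeatable
n_draws = 20000
draws = [sample_multinoulli(p, rng) for _ in range(n_draws)]

# Entry-wise sample mean: the fraction of draws in which each outcome occurred.
sample_mean = [sum(x[i] for x in draws) / n_draws for i in range(len(p))]
print(sample_mean)
```

Each entry of `sample_mean` is the relative frequency of one outcome, so it should be close to the corresponding entry of `p`, up to sampling noise.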

Covariance matrix

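The covariance matrix of X is $\mathrm{Var}[X]=\Sigma$, where $\Sigma$ is a $K\times K$ matrix whose generic entry is $\Sigma _{ij}=\begin{cases}p_{i}(1-p_{i}) & \text{if }j=i \\ -p_{i}p_{j} & \text{if }j\neq i\end{cases}$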


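We need to use the formula (see the lecture entitled Covariance matrix): $\Sigma _{ij}=\mathrm{Cov}[X_{i},X_{j}]=\mathrm{E}[X_{i}X_{j}]-\mathrm{E}[X_{i}]\mathrm{E}[X_{j}]$. If $j=i$, then $\Sigma _{ii}=\mathrm{E}[X_{i}^{2}]-\mathrm{E}[X_{i}]^{2}=\mathrm{E}[X_{i}]-\mathrm{E}[X_{i}]^{2}=p_{i}-p_{i}^{2}=p_{i}(1-p_{i})$, where we have used the fact that $X_{i}^{2}=X_{i}$ because $X_{i}$ can take only values 0 and 1. If $j\neq i$, then $\Sigma _{ij}=\mathrm{E}[X_{i}X_{j}]-\mathrm{E}[X_{i}]\mathrm{E}[X_{j}]=0-p_{i}p_{j}=-p_{i}p_{j}$, where we have used the fact that $X_{i}X_{j}=0$, because $X_{i}$ and $X_{j}$ cannot both be equal to 1 at the same time.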
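The two cases of the derivation can be verified by brute force. The sketch below assumes the result is $p_{i}(1-p_{i})$ on the diagonal and $-p_{i}p_{j}$ off it, and compares that with covariances computed directly from the K support points; the function names and the probabilities are hypothetical.

```python
def covariance_from_formula(p):
    """Covariance matrix entries: p_i(1 - p_i) if i == j, -p_i p_j otherwise."""
    K = len(p)
    return [
        [p[i] * (1 - p[i]) if i == j else -p[i] * p[j] for j in range(K)]
        for i in range(K)
    ]


def covariance_by_enumeration(p):
    """Cov[X_i, X_j] = E[X_i X_j] - E[X_i] E[X_j], summing over the support."""
    K = len(p)
    # The k-th support point is the one-hot vector with a 1 in position k.
    support = [[1 if j == k else 0 for j in range(K)] for k in range(K)]
    mean = [sum(p[k] * support[k][i] for k in range(K)) for i in range(K)]
    return [
        [
            sum(p[k] * support[k][i] * support[k][j] for k in range(K))
            - mean[i] * mean[j]
            for j in range(K)
        ]
        for i in range(K)
    ]


p = [0.2, 0.5, 0.3]  # illustrative probabilities (an assumption)
sigma_formula = covariance_from_formula(p)
sigma_direct = covariance_by_enumeration(p)

for row in sigma_formula:
    print(row)
```

Each row of the matrix sums to zero (up to rounding), which is consistent with the entries of X always adding up to 1.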

Joint moment generating function

The joint moment generating function of X is defined for any $t\in \mathbb{R}^{K}$: $M_{X}(t)=\sum_{j=1}^{K}p_{j}\exp (t_{j})$


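If the $j$-th outcome is obtained, then $X_{i}=0$ for $i\neq j$ and $X_{i}=1$ for $i=j$. As a consequence, $\exp (t^{\top }X)=\exp \left( \sum_{i=1}^{K}t_{i}X_{i}\right) =\exp (t_{j})$ and the joint moment generating function is $M_{X}(t)=\mathrm{E}[\exp (t^{\top }X)]=\sum_{j=1}^{K}p_{j}\exp (t_{j})$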
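The argument above can be mirrored in code: on the j-th support point the inner product of t and X reduces to $t_{j}$, so the expectation is a weighted sum of $\exp (t_{j})$. The following sketch, with hypothetical function names and illustrative values of t and p, compares the closed form with the expectation computed point by point.

```python
import math


def mgf_formula(t, p):
    """Closed form: sum_j p_j * exp(t_j)."""
    return sum(p_j * math.exp(t_j) for p_j, t_j in zip(p, t))


def mgf_by_expectation(t, p):
    """E[exp(t' X)] computed by summing over the K one-hot support points."""
    K = len(p)
    total = 0.0
    for j in range(K):
        x = [1 if i == j else 0 for i in range(K)]  # j-th outcome obtained
        inner = sum(t_i * x_i for t_i, x_i in zip(t, x))  # reduces to t[j]
        total += p[j] * math.exp(inner)
    return total


p = [0.2, 0.5, 0.3]   # illustrative probabilities (an assumption)
t = [0.1, -0.4, 1.5]  # arbitrary point in R^K (an assumption)

print(mgf_formula(t, p))
print(mgf_by_expectation(t, p))
print(mgf_formula([0.0, 0.0, 0.0], p))  # a moment generating function equals 1 at t = 0
```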

Joint characteristic function

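The joint characteristic function of X is $\varphi _{X}(t)=\sum_{j=1}^{K}p_{j}\exp (\mathrm{i}\,t_{j})$, where $\mathrm{i}$ denotes the imaginary unit.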


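If the $j$-th outcome is obtained, then $X_{i}=0$ for $i\neq j$ and $X_{i}=1$ for $i=j$. As a consequence, $\exp (\mathrm{i}\,t^{\top }X)=\exp (\mathrm{i}\,t_{j})$ and the joint characteristic function is $\varphi _{X}(t)=\mathrm{E}[\exp (\mathrm{i}\,t^{\top }X)]=\sum_{j=1}^{K}p_{j}\exp (\mathrm{i}\,t_{j})$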
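The same reasoning gives a one-line numerical version of the characteristic function. This is a sketch under the assumption that it equals the probability-weighted sum of $\exp (\mathrm{i}\,t_{j})$, with $\mathrm{i}$ the imaginary unit; the function name and input values are illustrative.

```python
import cmath


def characteristic_function(t, p):
    """Probability-weighted sum of exp(1j * t_j), one term per outcome."""
    return sum(p_j * cmath.exp(1j * t_j) for p_j, t_j in zip(p, t))


p = [0.2, 0.5, 0.3]   # illustrative probabilities (an assumption)
t = [0.1, -0.4, 1.5]  # arbitrary point in R^K (an assumption)

phi = characteristic_function(t, p)
print(phi)
print(abs(phi))  # modulus of a characteristic function never exceeds 1
print(characteristic_function([0.0, 0.0, 0.0], p))  # equals 1 at t = 0
```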

More details

The following sections contain more details about the Multinoulli distribution.

Relation between the Multinoulli and the multinomial distribution

A sum of independent Multinoulli random vectors that all have the same probabilities $p_{1}$, ..., $p_{K}$ is a multinomial random vector. This is discussed and proved in the lecture entitled Multinomial distribution.
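The relation can be made concrete by adding up one-hot vectors: the sum records how many times each outcome occurred. The sketch below is only an illustration, assuming n independent draws with common probabilities; `sum_of_multinoulli_draws` is a hypothetical helper that relies on the standard-library `random.Random.choices` to pick outcomes.

```python
import random


def sum_of_multinoulli_draws(p, n, rng):
    """Add up n independent one-hot draws; the result is a vector of outcome counts."""
    K = len(p)
    outcomes = rng.choices(range(K), weights=p, k=n)  # index of the outcome in each trial
    counts = [0] * K
    for j in outcomes:
        one_hot = [1 if i == j else 0 for i in range(K)]  # the Multinoulli vector for this trial
        counts = [c + x for c, x in zip(counts, one_hot)]
    return counts


p = [0.2, 0.5, 0.3]      # illustrative probabilities (an assumption)
rng = random.Random(0)   # fixed seed so the run is repeatable
n = 10
counts = sum_of_multinoulli_draws(p, n, rng)

print(counts)       # number of times each outcome was obtained
print(sum(counts))  # always equal to n
```

The entries of the summed vector are non-negative integers that add up to n, which is the support of a multinomial random vector with n trials.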
