
Multinoulli distribution

by Marco Taboga, PhD

The Multinoulli distribution (sometimes also called the categorical distribution) is a multivariate discrete distribution that generalizes the Bernoulli distribution.

How the distribution is used

If you perform an experiment that can have only two outcomes (either success or failure), then a random variable that takes value 1 in case of success and value 0 in case of failure is a Bernoulli random variable.

If you perform an experiment that can have K outcomes and you denote by $X_{i}$ a random variable that takes value 1 if you obtain the i-th outcome and 0 otherwise, then the random vector X defined as $X=\left[ X_{1}\ X_{2}\ \ldots \ X_{K}\right] ^{\top }$ is a Multinoulli random vector.

In other words, when the i-th outcome is obtained, the i-th entry of the Multinoulli random vector X takes value 1, while all the other entries are equal to 0.

In what follows the probabilities of the K possible outcomes will be denoted by $p_{1}$, ..., $p_{K}$.
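
The construction described above can be sketched in a few lines of Python. This is only an illustration: the helper name `sample_multinoulli`, the example probabilities and the seed are assumptions made for the sketch, not part of the lecture.

```python
import random

def sample_multinoulli(p, rng):
    """Draw one Multinoulli vector (hypothetical helper, for illustration only)."""
    K = len(p)
    # Choose which of the K outcomes is obtained, with probabilities p.
    i = rng.choices(range(K), weights=p, k=1)[0]
    # The i-th entry is 1 and all the other entries are 0.
    return [1 if j == i else 0 for j in range(K)]

p = [0.2, 0.5, 0.3]      # assumed example probabilities
rng = random.Random(0)   # fixed seed so that the run is repeatable
x = sample_multinoulli(p, rng)
print(x)
```

Whatever outcome is drawn, the returned list has exactly one entry equal to 1.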


The distribution is characterized as follows.

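Definition Let X be a $K\times 1$ discrete random vector. Let the support of X be the set of $K\times 1$ vectors having one entry equal to 1 and all other entries equal to 0: $R_{X}=\left\{ x\in \{0,1\}^{K}:\sum_{j=1}^{K}x_{j}=1\right\}$. Let $p_{1}$, ..., $p_{K}$ be K strictly positive numbers such that $\sum_{j=1}^{K}p_{j}=1$. We say that X has a Multinoulli distribution with probabilities $p_{1}$, ..., $p_{K}$ if its joint probability mass function is $p_{X}(x_{1},\ldots ,x_{K})=\begin{cases}\prod_{j=1}^{K}p_{j}^{x_{j}} & \text{if }(x_{1},\ldots ,x_{K})\in R_{X}\\ 0 & \text{otherwise}\end{cases}$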

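If you are puzzled by the above definition of the joint pmf, note that when $(x_{1},\ldots ,x_{K})$ belongs to the support and $x_{i}=1$ because the i-th outcome has been obtained, then all other entries are equal to 0 and $\prod_{j=1}^{K}p_{j}^{x_{j}}=p_{1}^{0}\cdots p_{i-1}^{0}\,p_{i}^{1}\,p_{i+1}^{0}\cdots p_{K}^{0}=p_{i}$.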
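
As a rough illustration of the definition, the joint pmf can be evaluated as follows. The function name `multinoulli_pmf` and the example probabilities are assumptions made for this sketch.

```python
def multinoulli_pmf(x, p):
    """Joint pmf: product of p_j ** x_j on the support, 0 elsewhere (illustrative)."""
    # x is on the support if it has the right length, only 0/1 entries,
    # and exactly one entry equal to 1.
    on_support = (
        len(x) == len(p)
        and all(v in (0, 1) for v in x)
        and sum(x) == 1
    )
    if not on_support:
        return 0.0
    value = 1.0
    for x_j, p_j in zip(x, p):
        value *= p_j ** x_j   # factors with x_j = 0 are equal to 1
    return value

p = [0.2, 0.5, 0.3]                          # assumed example probabilities
on_second = multinoulli_pmf([0, 1, 0], p)    # second outcome obtained
off_support = multinoulli_pmf([1, 1, 0], p)  # not a valid realization
```

Only the factor corresponding to the obtained outcome differs from 1, so the product collapses to the probability of that outcome.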

Expected value

The expected value of X is $\operatorname{E}[X]=p$, where the $K\times 1$ vector $p$ is defined as follows: $p=\left[ p_{1}\ p_{2}\ \ldots \ p_{K}\right] ^{\top }$.


The i-th entry of X, denoted by $X_{i}$, is an indicator function of the event "the i-th outcome has happened". Therefore, its expected value is equal to the probability of the event it indicates: $\operatorname{E}[X_{i}]=\operatorname{P}(X_{i}=1)=p_{i}$.

Covariance matrix

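The covariance matrix of X is $\operatorname{Var}[X]=\Sigma$, where $\Sigma$ is a $K\times K$ matrix whose generic entry is $\Sigma_{ij}=\begin{cases}p_{i}(1-p_{i}) & \text{if }j=i\\ -p_{i}p_{j} & \text{if }j\neq i\end{cases}$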


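We need to use the formula (see the lecture entitled Covariance matrix): $\Sigma_{ij}=\operatorname{Cov}[X_{i},X_{j}]=\operatorname{E}[X_{i}X_{j}]-\operatorname{E}[X_{i}]\operatorname{E}[X_{j}]$. If $j=i$, then $\Sigma_{ii}=\operatorname{E}[X_{i}^{2}]-\operatorname{E}[X_{i}]^{2}=\operatorname{E}[X_{i}]-\operatorname{E}[X_{i}]^{2}=p_{i}-p_{i}^{2}=p_{i}(1-p_{i})$, where we have used the fact that $X_{i}^{2}=X_{i}$ because $X_{i}$ can take only values 0 and 1. If $j\neq i$, then $\Sigma_{ij}=\operatorname{E}[X_{i}X_{j}]-\operatorname{E}[X_{i}]\operatorname{E}[X_{j}]=0-p_{i}p_{j}=-p_{i}p_{j}$, where we have used the fact that $X_{i}X_{j}=0$, because $X_{i}$ and $X_{j}$ cannot both be equal to 1 at the same time.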
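
The covariance argument above can be checked numerically by summing over the K points of the support. The sketch below is illustrative only: the function names and the example probabilities are assumptions, and plain Python lists stand in for matrices.

```python
def covariance_closed_form(p):
    """Entries p_i (1 - p_i) on the diagonal and -p_i p_j off the diagonal."""
    K = len(p)
    return [[p[i] * (1 - p[i]) if i == j else -p[i] * p[j] for j in range(K)]
            for i in range(K)]

def covariance_by_enumeration(p):
    """E[X_i X_j] - E[X_i] E[X_j], computed by summing over the support."""
    K = len(p)
    # The k-th support point has a 1 in position k and occurs with probability p[k].
    support = [[1 if j == k else 0 for j in range(K)] for k in range(K)]
    mean = [sum(p[k] * support[k][i] for k in range(K)) for i in range(K)]
    second = [[sum(p[k] * support[k][i] * support[k][j] for k in range(K))
               for j in range(K)] for i in range(K)]
    return [[second[i][j] - mean[i] * mean[j] for j in range(K)]
            for i in range(K)]

p = [0.2, 0.5, 0.3]   # assumed example probabilities
sigma = covariance_closed_form(p)
check = covariance_by_enumeration(p)
```

The two matrices agree up to floating-point rounding. Each row of `sigma` also sums to zero, because the entries of X always add up to 1.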

Joint moment generating function

The joint moment generating function of X is defined for any $t\in \mathbb{R}^{K}$: $M_{X}(t)=\sum_{j=1}^{K}p_{j}\exp (t_{j})$


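If the $j$-th outcome is obtained, then $X_{i}=0$ for $i\neq j$ and $X_{i}=1$ for $i=j$. As a consequence, $\exp (t^{\top }X)=\exp \left( \sum_{i=1}^{K}t_{i}X_{i}\right) =\exp (t_{j})$ and the joint moment generating function is $M_{X}(t)=\operatorname{E}[\exp (t^{\top }X)]=\sum_{j=1}^{K}p_{j}\exp (t_{j})$.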
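
A small numerical sketch of this argument, assuming example values for the probabilities and for $t$ (the function names are hypothetical): the expectation of $\exp (t^{\top }X)$ is computed by enumerating the support and compared with a weighted sum of exponentials.

```python
import math

def mgf_weighted_sum(t, p):
    """Sum over j of p_j * exp(t_j)."""
    return sum(p_j * math.exp(t_j) for p_j, t_j in zip(p, t))

def mgf_by_enumeration(t, p):
    """E[exp(t'X)], summing over the K one-hot support points."""
    K = len(p)
    total = 0.0
    for k in range(K):
        x = [1 if j == k else 0 for j in range(K)]   # k-th outcome obtained
        inner = sum(t_j * x_j for t_j, x_j in zip(t, x))
        total += p[k] * math.exp(inner)
    return total

p = [0.2, 0.5, 0.3]    # assumed example probabilities
t = [0.1, -0.4, 1.0]   # assumed evaluation point
m_sum = mgf_weighted_sum(t, p)
m_enum = mgf_by_enumeration(t, p)
m_zero = mgf_weighted_sum([0.0, 0.0, 0.0], p)
```

The two computations coincide up to rounding, and at $t=0$ the value is 1, as for any moment generating function.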

Joint characteristic function

The joint characteristic function of X is $\varphi _{X}(t)=\sum_{j=1}^{K}p_{j}\exp (\mathrm{i}t_{j})$, where $\mathrm{i}$ is the imaginary unit.


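If the $j$-th outcome is obtained, then $X_{i}=0$ for $i\neq j$ and $X_{i}=1$ for $i=j$. As a consequence, $\exp (\mathrm{i}t^{\top }X)=\exp (\mathrm{i}t_{j})$ (here $\mathrm{i}$ is the imaginary unit, not the index $i$) and the joint characteristic function is $\varphi _{X}(t)=\operatorname{E}[\exp (\mathrm{i}t^{\top }X)]=\sum_{j=1}^{K}p_{j}\exp (\mathrm{i}t_{j})$.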
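
The same kind of check can be sketched with complex arithmetic from Python's standard `cmath` module. The function name and the example values are assumptions made for illustration.

```python
import cmath
import math

def multinoulli_cf(t, p):
    """Sum over j of p_j * exp(i * t_j), with i the imaginary unit (1j in Python)."""
    return sum(p_j * cmath.exp(1j * t_j) for p_j, t_j in zip(p, t))

p = [0.2, 0.5, 0.3]    # assumed example probabilities
t = [0.1, -0.4, 1.0]   # assumed evaluation point
phi = multinoulli_cf(t, p)
phi_zero = multinoulli_cf([0.0, 0.0, 0.0], p)

# By Euler's formula, the real part is the weighted sum of cos(t_j).
real_part = sum(p_j * math.cos(t_j) for p_j, t_j in zip(p, t))
```

At $t=0$ the function equals 1, and its modulus never exceeds 1, as for any characteristic function.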

Relation between the Multinoulli and the multinomial distribution

A sum of independent Multinoulli random vectors that all have the same probabilities $p_{1}$, ..., $p_{K}$ is a multinomial random vector. This is discussed and proved in the lecture entitled Multinomial distribution.
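
The relation can be illustrated with a short simulation. This is a sketch under stated assumptions: the helper name, the number of draws, the probabilities and the seed are all chosen only for the example.

```python
import random

def sum_of_multinoulli_draws(n, p, rng):
    """Add up n independent Multinoulli vectors with common probabilities p."""
    K = len(p)
    counts = [0] * K
    for _ in range(n):
        # One Multinoulli draw: a one-hot vector marking the obtained outcome.
        i = rng.choices(range(K), weights=p, k=1)[0]
        x = [1 if j == i else 0 for j in range(K)]
        # Entry-by-entry sum of the vectors drawn so far.
        counts = [c + x_j for c, x_j in zip(counts, x)]
    return counts

rng = random.Random(42)   # fixed seed so that the run is repeatable
n = 1000
counts = sum_of_multinoulli_draws(n, [0.2, 0.5, 0.3], rng)
print(counts)
```

The j-th entry of `counts` records how many times the j-th outcome was obtained, so the entries always add up to `n`.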

How to cite

Please cite as:

Taboga, Marco (2021). "Multinoulli distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
