A conditional distribution is the probability distribution of a random variable, calculated according to the rules of conditional probability after observing the realization of another random variable.
We will discuss how to update the probability distribution of a random variable $X$ after receiving the information that another random variable $Y$ has taken a specific value $y$. The updated probability distribution of $X$ will be called the conditional distribution of $X$ given $Y = y$.
The two random variables $X$ and $Y$, considered together, form a random vector $(X,Y)$. Depending on the characteristics of the random vector $(X,Y)$, different procedures need to be followed in order to compute the conditional probability distribution of $X$ given $Y = y$.
In the remainder of this lecture, these procedures are presented in the following order: first, we tackle the case in which $(X,Y)$ is a discrete random vector; then, we tackle the case in which $(X,Y)$ is a continuous random vector; finally, we briefly discuss the case in which $(X,Y)$ is neither discrete nor continuous.
Note that if we are able to update the probability distribution of $X$ when we receive the information that $Y = y$, then we can also revise the distribution of $X$ when we get to know that a generic event $E$ has happened. It suffices to set $Y = 1_E$, where $1_E$ is the indicator function of the event $E$, and compute the distribution of $X$ conditional on the realization $Y = 1$.
In the case in which $(X,Y)$ is a discrete random vector, the probability mass function (pmf) of $X$ conditional on the information that $Y = y$ is called the conditional probability mass function.
Definition
Let $(X,Y)$ be a discrete random vector. We say that a function $p_{X\mid Y=y} : \mathbb{R} \to [0,1]$ is the conditional probability mass function of $X$ given $Y = y$ if, for any $x \in \mathbb{R}$,
$$p_{X\mid Y=y}(x) = P(X = x \mid Y = y),$$
where $P(X = x \mid Y = y)$ is the conditional probability that $X = x$ given that $Y = y$.
How do we derive the conditional pmf from the joint pmf of $X$ and $Y$?
The following proposition provides an answer to this question.
Proposition
Let $(X,Y)$ be a discrete random vector. Let $p_{XY}(x,y)$ be its joint pmf, and $p_Y(y)$ the marginal pmf of $Y$. The conditional pmf of $X$ given $Y = y$ is
$$p_{X\mid Y=y}(x) = \frac{p_{XY}(x,y)}{p_Y(y)},$$
provided $p_Y(y) > 0$.
This is just the usual formula for computing conditional probabilities (conditional probability equals joint probability divided by marginal probability):
$$p_{X\mid Y=y}(x) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{XY}(x,y)}{p_Y(y)}.$$
In the proposition above, we assume that the marginal pmf $p_Y(y)$ is known. If it is not, it can be derived from the joint pmf by marginalization:
$$p_Y(y) = \sum_{x \in R_X} p_{XY}(x,y),$$
where $R_X$ is the support of $X$.
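To make the formula concrete, here is a minimal Python sketch (not part of the original lecture; the dictionary representation of a pmf is an illustrative choice) that computes a conditional pmf from a joint pmf, obtaining the marginal by marginalization:

```python
# Sketch: conditional pmf via p_{X|Y=y}(x) = p_XY(x, y) / p_Y(y).
# The joint pmf is represented as a dict mapping pairs (x, y) to probabilities.

def conditional_pmf(joint, y):
    # Marginal pmf of Y at y: sum the joint pmf over all values of x.
    p_y = sum(p for (_, y2), p in joint.items() if y2 == y)
    if p_y == 0:
        raise ValueError("p_Y(y) = 0: the formula does not apply")
    # Divide the joint pmf by the marginal, keeping the pairs with the observed y.
    return {x: p / p_y for (x, y2), p in joint.items() if y2 == y}

# Joint pmf of the example that follows: mass 1/3 on each of (1,1), (2,0), (0,0).
joint = {(1, 1): 1/3, (2, 0): 1/3, (0, 0): 1/3}
print(conditional_pmf(joint, 0))  # {2: 0.5, 0: 0.5}
```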
Example
Let the support of $(X,Y)$ be
$$R_{XY} = \{(1,1), (2,0), (0,0)\}$$
and its joint pmf be
$$p_{XY}(x,y) = \begin{cases} 1/3 & \text{if } (x,y) = (1,1) \\ 1/3 & \text{if } (x,y) = (2,0) \\ 1/3 & \text{if } (x,y) = (0,0) \\ 0 & \text{otherwise.} \end{cases}$$
Let us compute the conditional pmf of $X$ given $Y = 0$. The support of $Y$ is
$$R_Y = \{0, 1\}.$$
The marginal pmf of $Y$ evaluated at $y = 0$ is
$$p_Y(0) = p_{XY}(0,0) + p_{XY}(2,0) = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$
The support of $X$ is
$$R_X = \{0, 1, 2\}.$$
Thus, the conditional pmf of $X$ given $Y = 0$ is
$$p_{X\mid Y=0}(x) = \frac{p_{XY}(x,0)}{p_Y(0)} = \begin{cases} 1/2 & \text{if } x = 0 \\ 0 & \text{if } x = 1 \\ 1/2 & \text{if } x = 2 \\ 0 & \text{otherwise.} \end{cases}$$
When $p_Y(y) = 0$, it is in general not possible to unambiguously derive the conditional pmf of $X$, as we show below with an example.
This impossibility (known as the Borel-Kolmogorov paradox) is not particularly worrying, as it is seldom relevant in applications.
Example
The example is a bit involved. You might safely skip it on a first reading. Suppose that the sample space $\Omega$ is the set of all real numbers between $0$ and $1$:
$$\Omega = [0,1].$$
It is possible to build a probability measure $P$ on $\Omega$, such that $P$ assigns to each sub-interval of $[0,1]$ a probability equal to its length, that is,
$$P([a,b]) = b - a \quad \text{for } 0 \le a \le b \le 1.$$
This is the same sample space discussed in the lecture on zero-probability events. Define a random variable $X$ as follows:
$$X(\omega) = \begin{cases} 1 & \text{if } \omega \le 1/2 \\ 0 & \text{if } \omega > 1/2 \end{cases}$$
and another random variable $Y$ as follows:
$$Y(\omega) = \begin{cases} 1 & \text{if } \omega = 1/2 \\ 0 & \text{if } \omega \ne 1/2. \end{cases}$$
Both $X$ and $Y$ are discrete random variables and, considered together, they constitute a discrete random vector $(X,Y)$.
Suppose that we want to compute the conditional pmf of $X$ conditional on $Y = 1$. It is easy to see that
$$P(Y = 1) = P(\{1/2\}) = 0.$$
As a consequence, we cannot use the formula
$$p_{X\mid Y=1}(x) = \frac{p_{XY}(x,1)}{p_Y(1)}$$
because division by zero is not possible. Also the technique of deriving a conditional probability implicitly, as a realization of a conditional probability with respect to a sigma-algebra, does not allow us to unambiguously derive $p_{X\mid Y=1}(x)$.
In this case, the partition of interest is $\{E_0, E_1\}$, where
$$E_0 = \{\omega \in \Omega : Y(\omega) = 0\} \quad \text{and} \quad E_1 = \{\omega \in \Omega : Y(\omega) = 1\} = \{1/2\},$$
and $p_{X\mid Y=1}(x)$ can be viewed as the realization of the conditional probability $P(X = x \mid \{E_0, E_1\})$ when $\omega \in E_1$. The fundamental property of conditional probability
$$P(\{X = x\} \cap E_i) = \int_{E_i} P(X = x \mid \{E_0, E_1\})\, dP \quad (i = 0, 1)$$
is satisfied in this case if and only if, for a given $x$, the following system of equations is satisfied:
$$\begin{cases} P(\{X = x\} \cap E_1) = p_{X\mid Y=1}(x)\, P(E_1) \\ P(\{X = x\} \cap E_0) = p_{X\mid Y=0}(x)\, P(E_0), \end{cases}$$
which implies
$$\begin{cases} 0 = p_{X\mid Y=1}(x) \cdot 0 \\ P(\{X = x\} \cap E_0) = p_{X\mid Y=0}(x) \cdot 1. \end{cases}$$
The second equation does not help to determine $p_{X\mid Y=1}(x)$. So, from the first equation, it is evident that $p_{X\mid Y=1}(x)$ is undetermined (any number, when multiplied by zero, gives zero). One can show that also the requirement that $P(X = x \mid \{E_0, E_1\})$ be a regular conditional probability does not help to pin down $p_{X\mid Y=1}(x)$.
What does it mean that $p_{X\mid Y=1}(x)$ is undetermined? It means that any choice of $p_{X\mid Y=1}(x)$ is legitimate, provided the requirement
$$\sum_{x \in R_X} p_{X\mid Y=1}(x) = 1$$
is satisfied. Is this really a paradox? No, because conditional probability with respect to a partition is defined up to almost sure equality, and $E_1$ is a zero-probability event. As a consequence, the value that $P(X = x \mid \{E_0, E_1\})$ takes on $E_1$ does not matter. Roughly speaking, we do not really need to care about zero-probability events, provided there is only a countable number of them.
In the case in which $(X,Y)$ is a continuous random vector, the probability density function (pdf) of $X$ conditional on the information that $Y = y$ is called the conditional probability density function.
Definition
Let $(X,Y)$ be a continuous random vector. We say that a function $f_{X\mid Y=y} : \mathbb{R} \to [0,\infty)$ is the conditional probability density function of $X$ given $Y = y$ if, for any interval $[a,b]$,
$$P(X \in [a,b] \mid Y = y) = \int_a^b f_{X\mid Y=y}(x)\, dx,$$
and $f_{X\mid Y=y}(x)$ is such that the above integral is well defined.
How do we derive the conditional pdf from the joint pdf of $X$ and $Y$?
The following proposition provides an answer to this question.
Proposition
Let $(X,Y)$ be a continuous random vector. Let $f_{XY}(x,y)$ be its joint pdf, and $f_Y(y)$ be the marginal pdf of $Y$. The conditional pdf of $X$ given $Y = y$ is
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)},$$
provided $f_Y(y) > 0$.
Deriving the conditional distribution of $X$ given $Y = y$ is far from obvious. As explained in the lecture on random variables, whatever value of $y$ we choose, we are conditioning on a zero-probability event:
$$P(Y = y) = 0.$$
Therefore, the standard formula (conditional probability equals joint probability divided by marginal probability) cannot be used. However, it turns out that the definition of conditional probability with respect to a partition can be fruitfully applied in this case to derive the conditional pdf of $X$ given $Y = y$.
In order to prove that
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)}$$
is a legitimate choice, we need to prove that conditional probabilities calculated by using this conditional pdf satisfy the fundamental property of conditional probability:
$$P(\{X \in A\} \cap \{Y \in B\}) = \mathrm{E}\left[1_{\{Y \in B\}}\, P(X \in A \mid Y)\right]$$
for any sets $A$ and $B$. Thanks to some basic results in measure theory, we can confine our attention to the events $A$ and $B$ that can be written as follows:
$$A = [a_1, a_2], \quad B = [b_1, b_2].$$
For these events, it is immediate to verify that the fundamental property of conditional probability holds. First, by the very definition of a conditional pdf, we have that
$$P(X \in A \mid Y = y) = \int_{a_1}^{a_2} f_{X\mid Y=y}(x)\, dx = \int_{a_1}^{a_2} \frac{f_{XY}(x,y)}{f_Y(y)}\, dx.$$
Furthermore, the indicator function $1_{\{Y \in B\}}$ is also a function of $Y$. Therefore, the product $1_{\{Y \in B\}}\, P(X \in A \mid Y)$ is a function of $Y$, and we can use the transformation theorem to compute its expected value:
$$\mathrm{E}\left[1_{\{Y \in B\}}\, P(X \in A \mid Y)\right] = \int_{b_1}^{b_2} \left( \int_{a_1}^{a_2} \frac{f_{XY}(x,y)}{f_Y(y)}\, dx \right) f_Y(y)\, dy = \int_{b_1}^{b_2} \int_{a_1}^{a_2} f_{XY}(x,y)\, dx\, dy = P(\{X \in A\} \cap \{Y \in B\}).$$
The last equality proves the proposition.
Example
Let the support of $(X,Y)$ be
$$R_{XY} = [0,1] \times [0,1]$$
and its joint pdf be
$$f_{XY}(x,y) = \begin{cases} x + y & \text{if } (x,y) \in [0,1] \times [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Let us compute the conditional pdf of $X$ given $Y = y$. The support of $Y$ is
$$R_Y = [0,1].$$
When $y \in [0,1]$, the marginal pdf of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx = \int_0^1 (x + y)\, dx = \frac{1}{2} + y;$$
when $y \notin [0,1]$, the marginal pdf is $f_Y(y) = 0$. Thus, the marginal pdf of $Y$ is
$$f_Y(y) = \begin{cases} \frac{1}{2} + y & \text{if } y \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
When evaluated at $y = 1/2$, it is
$$f_Y(1/2) = \frac{1}{2} + \frac{1}{2} = 1.$$
The support of $X$ is
$$R_X = [0,1].$$
Thus, the conditional pdf of $X$ given $Y = 1/2$ is
$$f_{X\mid Y=1/2}(x) = \frac{f_{XY}(x, 1/2)}{f_Y(1/2)} = \begin{cases} x + \frac{1}{2} & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
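As a sanity check, the computations in the example can be verified numerically. The following sketch (assuming SciPy is available; the quadrature approach is illustrative, not part of the lecture) recovers the marginal by integrating the joint pdf and confirms that the conditional pdf integrates to one:

```python
# Numerical check of the example: f_XY(x, y) = x + y on [0,1] x [0,1].
from scipy.integrate import quad

def f_xy(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

y0 = 0.5
# Marginal pdf of Y at y0, obtained by integrating x out of the joint pdf.
f_y, _ = quad(lambda x: f_xy(x, y0), 0, 1)
print(f_y)  # 1.0, matching f_Y(1/2) = 1/2 + 1/2

# Conditional pdf of X given Y = y0; it should integrate to 1 over [0, 1].
cond = lambda x: f_xy(x, y0) / f_y
print(cond(0.25), quad(cond, 0, 1)[0])  # 0.75 (= 0.25 + 1/2) and 1.0
```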
In general, when $(X,Y)$ is neither discrete nor continuous, we can characterize the distribution function of $X$ conditional on the information that $Y = y$.
Definition
We say that a function $F_{X\mid Y=y}(x)$ is the conditional distribution function of $X$ given $Y = y$ if and only if
$$F_{X\mid Y=y}(x) = P(X \le x \mid Y = y),$$
where $P(X \le x \mid Y = y)$ is the conditional probability that $X \le x$ given that $Y = y$.
There is no immediate way of deriving the conditional distribution of $X$ given $Y = y$. However, we can characterize it by using the concept of conditional probability with respect to a partition, as follows. Define the events $E_y$ as follows:
$$E_y = \{\omega \in \Omega : Y(\omega) = y\},$$
and a partition $G$ of events $E_y$ as
$$G = \{E_y : y \in R_Y\},$$
where, as usual, $R_Y$ is the support of $Y$. Then, for any $x$ we have
$$F_{X\mid Y=y}(x) = P(X \le x \mid G)(\omega) \quad \text{for } \omega \in E_y,$$
where $P(X \le x \mid G)$ is the probability that $X \le x$ conditional on the partition $G$.
As we know, $P(X \le x \mid G)$ is guaranteed to exist and is unique up to almost sure equality. Of course, this does not mean that we are able to compute it. Nonetheless, this characterization is extremely useful because it allows us to speak of the conditional distribution of $X$ given $Y = y$ in general, without the need to specify whether $X$ and $Y$ are discrete or continuous.
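Although the characterization is abstract, it suggests a practical approximation: when $P(X \le x \mid G)$ cannot be computed analytically, the conditional distribution function can be estimated by simulation, conditioning on draws of $Y$ that fall near $y$. A rough sketch (the bivariate normal model and the window width $h$ are illustrative assumptions, not part of the lecture):

```python
# Sketch: estimate F_{X|Y=y}(x) empirically, keeping draws with Y close to y.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Illustrative model: (X, Y) bivariate standard normal with correlation 0.8.
y_draws = rng.standard_normal(n)
x_draws = 0.8 * y_draws + np.sqrt(1 - 0.8**2) * rng.standard_normal(n)

def cond_cdf(x, y, h=0.02):
    """Empirical estimate of P(X <= x | Y in [y-h, y+h])."""
    near = np.abs(y_draws - y) < h
    return np.mean(x_draws[near] <= x)

# For this model, X | Y=1 is N(0.8, 0.36), so the value below should be near 0.5.
print(cond_cdf(0.8, 1.0))
```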
The following sections contain more details about conditional distributions.
We have discussed how to update the probability distribution of a random variable $X$ after observing the realization of another random variable $Y$, that is, after receiving the information that $Y = y$. What happens when $X$ and $Y$ are random vectors rather than random variables? Basically, everything we said above still applies with straightforward modifications.
Thus, if $X$ and $Y$ are discrete random vectors, then the conditional probability mass function of $X$ given $Y = y$ is
$$p_{X\mid Y=y}(x) = \frac{p_{XY}(x,y)}{p_Y(y)},$$
provided $p_Y(y) > 0$. If $X$ and $Y$ are continuous random vectors, then the conditional probability density function of $X$ given $Y = y$ is
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)},$$
provided $f_Y(y) > 0$. In general, the conditional distribution function of $X$ given $Y = y$ is
$$F_{X\mid Y=y}(x) = P(X \le x \mid Y = y).$$
As we have explained above, the joint distribution of $X$ and $Y$ can be used to derive the marginal distribution of $Y$ and the conditional distribution of $X$ given $Y = y$. This process can also go in the reverse direction: if we know the marginal distribution of $Y$ and the conditional distribution of $X$ given $Y = y$, then we can derive the joint distribution of $X$ and $Y$. For discrete random variables, we have that
$$p_{XY}(x,y) = p_{X\mid Y=y}(x)\, p_Y(y).$$
For continuous random variables, we have that
$$f_{XY}(x,y) = f_{X\mid Y=y}(x)\, f_Y(y).$$
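These factorizations also have a direct computational counterpart: to simulate a draw from the joint distribution, it suffices to draw $y$ from the marginal of $Y$ and then $x$ from the conditional distribution of $X$ given $Y = y$. A minimal sketch, reusing the pmfs of the first discrete example (the dictionary representation is an illustrative choice):

```python
# Sketch: sample from the joint via p_XY(x, y) = p_{X|Y=y}(x) * p_Y(y).
import random

p_y = {0: 2/3, 1: 1/3}               # marginal pmf of Y (first example)
p_x_given_y = {0: {0: 1/2, 2: 1/2},  # conditional pmf of X given Y = 0
               1: {1: 1.0}}          # conditional pmf of X given Y = 1

def draw(pmf):
    return random.choices(list(pmf), weights=list(pmf.values()))[0]

def sample_joint():
    y = draw(p_y)              # first draw Y from its marginal
    x = draw(p_x_given_y[y])   # then draw X from the conditional given that y
    return x, y

samples = [sample_joint() for _ in range(30_000)]
print(samples.count((2, 0)) / len(samples))  # close to p_XY(2,0) = 1/3
```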
Below you can find some exercises with explained solutions.
Exercise 1
Let $(X,Y)$ be a discrete random vector with support
$$R_{XY} = \{(0,0), (0,1), (1,1)\}$$
and joint probability mass function
$$p_{XY}(x,y) = \begin{cases} 1/4 & \text{if } (x,y) = (0,0) \\ 1/4 & \text{if } (x,y) = (0,1) \\ 1/2 & \text{if } (x,y) = (1,1) \\ 0 & \text{otherwise.} \end{cases}$$
Compute the conditional probability mass function of $X$ given $Y = 1$.
Solution
The marginal probability mass function of $Y$ evaluated at $y = 1$ is
$$p_Y(1) = p_{XY}(0,1) + p_{XY}(1,1) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}.$$
The support of $X$ is
$$R_X = \{0, 1\}.$$
Thus, the conditional probability mass function of $X$ given $Y = 1$ is
$$p_{X\mid Y=1}(x) = \frac{p_{XY}(x,1)}{p_Y(1)} = \begin{cases} 1/3 & \text{if } x = 0 \\ 2/3 & \text{if } x = 1 \\ 0 & \text{otherwise.} \end{cases}$$
Exercise 2
Let $(X,Y)$ be a continuous random vector with support
$$R_{XY} = [0,1] \times [0,1]$$
and its joint probability density function be
$$f_{XY}(x,y) = \begin{cases} \frac{2}{3}(x + 2y) & \text{if } (x,y) \in [0,1] \times [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Compute the conditional probability density function of $X$ given $Y = 1/2$.
Solution
The support of $Y$ is
$$R_Y = [0,1].$$
When $y \in [0,1]$, the marginal probability density function of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx = \int_0^1 \frac{2}{3}(x + 2y)\, dx = \frac{1}{3} + \frac{4}{3} y;$$
when $y \notin [0,1]$, the marginal probability density function of $Y$ is $f_Y(y) = 0$. Thus, the marginal probability density function of $Y$ is
$$f_Y(y) = \begin{cases} \frac{1}{3} + \frac{4}{3} y & \text{if } y \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
When evaluated at the point $y = 1/2$, it becomes
$$f_Y(1/2) = \frac{1}{3} + \frac{2}{3} = 1.$$
The support of $X$ is
$$R_X = [0,1].$$
Thus, the conditional probability density function of $X$ given $Y = 1/2$ is
$$f_{X\mid Y=1/2}(x) = \frac{f_{XY}(x, 1/2)}{f_Y(1/2)} = \begin{cases} \frac{2}{3}(x + 1) & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
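As with the earlier continuous example, a one-line numerical integration (assuming SciPy is available) confirms that the computed conditional pdf is a valid density:

```python
# f_{X|Y=1/2}(x) = (2/3)(x + 1) should integrate to 1 over [0, 1].
from scipy.integrate import quad
print(quad(lambda x: (2 / 3) * (x + 1), 0, 1)[0])  # 1.0
```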
Exercise 3
Let $X$ be a continuous random variable with support
$$R_X = [0,1]$$
and probability density function
$$f_X(x) = \begin{cases} 1 & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Let $Y$ be another continuous random variable with support
$$R_Y = [0,1]$$
and conditional probability density function
$$f_{Y\mid X=x}(y) = \begin{cases} \frac{1}{x} & \text{if } y \in [0,x] \\ 0 & \text{otherwise.} \end{cases}$$
Find the marginal probability density function of $Y$.
Solution
The support of the vector $(X,Y)$ is
$$R_{XY} = \{(x,y) : 0 \le y \le x \le 1\}$$
and the joint probability density function of $X$ and $Y$ is
$$f_{XY}(x,y) = f_{Y\mid X=x}(y)\, f_X(x) = \begin{cases} \frac{1}{x} & \text{if } 0 \le y \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
The marginal probability density function of $Y$ is obtained by marginalization, integrating $x$ out of the joint probability density function:
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx.$$
Thus, for $y \notin [0,1]$ we trivially have $f_Y(y) = 0$ (because $f_{XY}(x,y) = 0$ for every $x$), while for $y \in (0,1]$ we have
$$f_Y(y) = \int_y^1 \frac{1}{x}\, dx = \ln(1) - \ln(y) = -\ln(y).$$
Thus, the marginal probability density function of $Y$ is
$$f_Y(y) = \begin{cases} -\ln(y) & \text{if } y \in (0,1] \\ 0 & \text{otherwise.} \end{cases}$$
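A small Monte Carlo experiment can corroborate this result: draw $X$ uniformly on $[0,1]$, then $Y$ uniformly on $[0,X]$, and compare the observed frequency near a point $y_0$ with the density $-\ln(y_0)$. A sketch (the sample size and bin width are arbitrary choices):

```python
# Monte Carlo check of f_Y(y) = -ln(y): X ~ U(0,1), then Y | X=x ~ U(0,x).
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, x)  # elementwise: each Y_i is uniform on [0, X_i]

# The fraction of draws in a small bin around y0, divided by the bin width,
# approximates the density at y0.
y0, h = 0.2, 0.005
est = np.mean((y > y0 - h) & (y < y0 + h)) / (2 * h)
print(est, -np.log(y0))  # both close to 1.609
```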
Please cite as:
Taboga, Marco (2021). "Conditional probability distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/conditional-probability-distributions.