
Conditional probability distributions

To understand conditional probability distributions, you need to be familiar with the concept of conditional probability, which has been introduced in the lecture entitled Conditional probability.

We discuss here how to update the probability distribution of a random variable X after observing the realization of another random variable Y, i.e., after receiving the information that Y has taken a specific value $y$. The updated probability distribution of X will be called the conditional probability distribution of X given Y=y.

The two random variables X and Y, considered together, form a random vector $(X,Y)$. Depending on the characteristics of $(X,Y)$, different procedures need to be adopted in order to compute the conditional probability distribution of X given Y=y. In the remainder of this lecture, these procedures are presented in the following order:

  1. first, we tackle the case in which the random vector $(X,Y)$ is a discrete random vector;

  2. then, we tackle the case in which $(X,Y)$ is an absolutely continuous random vector;

  3. finally, we briefly discuss the case in which $(X,Y)$ is neither discrete nor absolutely continuous.

Important: note that if we are able to update the probability distribution of X when we observe the realization of Y (i.e., when we receive the information that Y=y), then we are also able to update the probability distribution of X when we receive the information that a generic event E has happened: it suffices to set $Y=1_{E}$, where $1_{E}$ is the indicator function of the event E, and update the distribution of X based on the information $Y=1_{E}=1$.
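To fix ideas, here is a minimal numerical sketch of this indicator trick; the joint probabilities below are hypothetical and chosen only for illustration, and the computation is just the usual conditional probability formula applied with $Y=1_{E}$.

```python
# Hypothetical sketch: conditioning a discrete X on an event E by setting Y = 1_E.
# The joint probabilities are made-up numbers used only for illustration.

# P(X = x, 1_E = i) for x in {0, 1, 2} and i in {0, 1}
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.25, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.25,
}

# Receiving the information "E has happened" means observing Y = 1_E = 1, so we
# apply the usual formula: P(X = x | E) = P(X = x and E) / P(E).
p_E = sum(p for (x, i), p in joint.items() if i == 1)            # P(E) = P(Y = 1)
updated = {x: p / p_E for (x, i), p in joint.items() if i == 1}  # P(X = x | E)

print(updated)  # approximately {0: 0.333, 1: 0.25, 2: 0.417}
```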

Discrete random vectors - Conditional probability mass function

In the case in which $(X,Y)$ is a discrete random vector (as a consequence, X is a discrete random variable), the probability mass function of X conditional on the information that Y=y is called the conditional probability mass function.

Definition Let $(X,Y)$ be a discrete random vector. We say that a function $p_{X\mid Y=y}(x)$ is the conditional probability mass function of X given Y=y if, for any $x\in\mathbb{R}$, $$p_{X\mid Y=y}(x)=\mathrm{P}\left(X=x\mid Y=y\right),$$ where $\mathrm{P}\left(X=x\mid Y=y\right)$ is the conditional probability that $X=x$ given that Y=y.

How do we derive the conditional probability mass function from the joint probability mass function $p_{XY}(x,y)$? The following proposition provides an answer to this question.

Proposition Let $(X,Y)$ be a discrete random vector. Let $p_{XY}(x,y)$ be its joint probability mass function, and $p_{Y}(y)$ the marginal probability mass function of Y. The conditional probability mass function of X given $Y=y$ is $$p_{X\mid Y=y}(x)=\frac{p_{XY}(x,y)}{p_{Y}(y)},$$ provided $p_{Y}(y)\neq 0$.

Proof

This is just the usual formula for computing conditional probabilities (conditional probability equals joint probability divided by marginal probability): $$p_{X\mid Y=y}(x)=\mathrm{P}\left(X=x\mid Y=y\right)=\frac{\mathrm{P}\left(X=x,Y=y\right)}{\mathrm{P}\left(Y=y\right)}=\frac{p_{XY}(x,y)}{p_{Y}(y)}.$$

Note that the above proposition assumes knowledge of the marginal probability mass function $p_{Y}(y)$, which can be derived from the joint probability mass function $p_{XY}(x,y)$ by marginalization.
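Before turning to the example, here is a minimal computational sketch of the proposition; the joint probability mass function used below is hypothetical and serves only to illustrate the recipe (marginalize to obtain $p_{Y}(y)$, then divide the joint pmf by it).

```python
# A sketch of the proposition: conditional pmf = joint pmf / marginal pmf of Y.
# The joint pmf below is a hypothetical example.

joint_pmf = {
    (1, 0): 0.25, (2, 0): 0.25,   # (x, y): p_XY(x, y)
    (1, 1): 0.30, (2, 1): 0.20,
}

def conditional_pmf_of_X_given(joint, y):
    """Return the conditional pmf x -> P(X = x | Y = y)."""
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)   # marginal pmf of Y at y
    if p_y == 0:
        raise ValueError("p_Y(y) = 0: the formula does not apply")
    return {x: p / p_y for (x, yy), p in joint.items() if yy == y}

print(conditional_pmf_of_X_given(joint_pmf, 0))   # {1: 0.5, 2: 0.5}
```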

Example Let the support of $(X,Y)$ be [eq21] and its joint probability mass function be [eq22]. Let us compute the conditional probability mass function of X given $Y=0$. The support of Y is [eq23]. The marginal probability mass function of Y evaluated at $y=0$ is [eq24]. The support of X is [eq25]. Thus, the conditional probability mass function of X given $Y=0$ is [eq26].

In the case in which $p_{Y}(y)=0$, there is, in general, no way to unambiguously derive the conditional probability mass function of X, as we show below with an example. The impossibility of deriving the conditional probability mass function unambiguously in this case (called by some authors the Borel-Kolmogorov paradox) is not particularly worrying, as this case is seldom relevant in applications. The following is an example of a case in which the conditional probability mass function cannot be derived unambiguously (the example is a bit involved; the reader may safely skip it on a first reading).

Example Suppose we are given the following sample space: $\Omega=\left[0,1\right]$, i.e., the sample space $\Omega$ is the set of all real numbers between 0 and 1. It is possible to build a probability measure $\mathrm{P}$ on $\Omega$ such that $\mathrm{P}$ assigns to each sub-interval of $\left[0,1\right]$ a probability equal to its length, that is, $\mathrm{P}\left(\left[a,b\right]\right)=b-a$ for any $0\leq a\leq b\leq 1$. This is the same sample space discussed in the lecture on zero-probability events. Define a random variable X as follows: [eq30], and another random variable Y as follows: [eq31]. Both X and Y are discrete random variables and, considered together, they constitute a discrete random vector $(X,Y)$. Suppose we want to compute the conditional probability mass function of X given $Y=1$. It is easy to see that $\mathrm{P}\left(Y=1\right)=0$. As a consequence, we cannot use the formula $$p_{X\mid Y=1}(x)=\frac{p_{XY}(x,1)}{p_{Y}(1)},$$ because division by zero is not possible. It turns out that the technique of implicitly deriving a conditional probability as the realization of a random variable satisfying the definition of a conditional probability with respect to a partition (see the lecture entitled Conditional probability as a random variable) also does not allow us to derive [eq35] unambiguously. In this case, the partition of interest is [eq36], where [eq37], and [eq38] can be viewed as the realization of the conditional probability [eq39] when $\omega\in G_{1}$. The fundamental property of conditional probability [eq40] is satisfied in this case if and only if, for a given x, the following system of equations is satisfied: [eq41], which implies [eq42]. The second equation does not help to determine [eq43]. So, from the first equation it is evident that [eq44] is undetermined (any number, when multiplied by zero, gives zero). One can show that the requirement that [eq45] be a regular conditional probability also does not help to pin down [eq38]. What does it mean that [eq38] is undetermined? It means that any choice of [eq35] is legitimate, provided the requirement [eq49] is satisfied. Is this really a paradox? No: conditional probability with respect to a partition is defined only up to almost sure equality, and $G_{1}$ is a zero-probability event, so the value that [eq50] takes on $G_{1}$ does not matter (roughly speaking, we do not really need to care about zero-probability events, provided there is only a countable number of them).

Absolutely continuous random vectors - Conditional probability density function

In the case in which $(X,Y)$ is an absolutely continuous random vector (as a consequence, X is an absolutely continuous random variable), the probability density function of X conditional on the information that Y=y is called the conditional probability density function.

Definition Let $(X,Y)$ be an absolutely continuous random vector. We say that a function $f_{X\mid Y=y}(x)$ is the conditional probability density function of X given Y=y if, for any interval $\left[a,b\right]$, $$\mathrm{P}\left(X\in\left[a,b\right]\mid Y=y\right)=\int_{a}^{b}f_{X\mid Y=y}(x)\,\mathrm{d}x,$$ and $f_{X\mid Y=y}(x)$ is such that the above integral is well defined.

How do we derive the conditional probability density function from the joint probability density function $f_{XY}(x,y)$?

Deriving the conditional distribution of X given Y=y is far from obvious: whatever value of $y$ we choose, we are conditioning on a zero-probability event, because $\mathrm{P}\left(Y=y\right)=0$ for every $y$ when Y is an absolutely continuous random variable; therefore, the standard formula (conditional probability equals joint probability divided by marginal probability) cannot be used. However, it turns out that the definition of conditional probability with respect to a partition can be fruitfully applied in this case to derive the conditional probability density function of X given Y=y:

Proposition Let $(X,Y)$ be an absolutely continuous random vector. Let $f_{XY}(x,y)$ be its joint probability density function, and $f_{Y}(y)$ the marginal probability density function of Y. The conditional probability density function of X given Y=y is $$f_{X\mid Y=y}(x)=\frac{f_{XY}(x,y)}{f_{Y}(y)},$$ provided $f_{Y}(y)>0$.

Proof

To prove that $$f_{X\mid Y=y}(x)=\frac{f_{XY}(x,y)}{f_{Y}(y)}$$ is a legitimate choice, we need to prove that conditional probabilities calculated by using the above conditional density function satisfy the fundamental property of conditional probability: [eq65] for any H and E. Thanks to some basic results in measure theory, we can confine our attention to events H and E that can be written as follows: [eq66]. For these events, it is immediate to verify that the fundamental property of conditional probability holds. First, by the very definition of a conditional probability density function, we have that [eq67]. Furthermore, [eq68] is also a function of Y. Therefore, the product [eq69] is a function of Y, so we can use the transformation theorem to compute its expected value: [eq70]. The last equality proves the proposition.
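As an illustration of the proposition, the following sketch carries out the two steps it suggests (marginalize the joint density, then divide) with a symbolic algebra package; the joint density $f_{XY}(x,y)=x+y$ on the unit square is a hypothetical choice, unrelated to the example that follows.

```python
# A sketch of the proposition: conditional pdf = joint pdf / marginal pdf of Y.
import sympy as sp

x, y = sp.symbols("x y", real=True)
f_XY = x + y                          # hypothetical joint pdf on 0 <= x <= 1, 0 <= y <= 1

# Marginal pdf of Y: integrate x out of the joint pdf over the support of X.
f_Y = sp.integrate(f_XY, (x, 0, 1))   # equals y + 1/2

# Conditional pdf of X given Y = y (valid wherever f_Y(y) > 0, i.e. for 0 <= y <= 1).
f_X_given_Y = sp.simplify(f_XY / f_Y)

print(f_X_given_Y)                                        # equivalent to (x + y) / (y + 1/2)
print(sp.simplify(sp.integrate(f_X_given_Y, (x, 0, 1))))  # 1: it integrates to one for each y
```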

Example Let the support of $(X,Y)$ be [eq72] and its joint probability density function be [eq73]. Let us compute the conditional probability density function of X given $Y=1$. The support of Y is [eq74]. When [eq75], the marginal probability density function of Y is 0; when [eq76], the marginal probability density function is [eq77]. Thus, the marginal probability density function of Y is [eq78]. When evaluated at $y=1$, it is [eq79]. The support of X is [eq80]. Thus, the conditional probability density function of X given $Y=1$ is [eq81].

The general case - Conditional distribution function

In general, when $(X,Y)$ is neither discrete nor absolutely continuous, we can characterize the distribution function of X conditional on the information that Y=y. We define the conditional distribution function of X given Y=y as follows.

Definition We say that a function $F_{X\mid Y=y}(x)$ is the conditional distribution function of X given Y=y if and only if $$F_{X\mid Y=y}(x)=\mathrm{P}\left(X\leq x\mid Y=y\right),$$ where $\mathrm{P}\left(X\leq x\mid Y=y\right)$ is the conditional probability that $X\leq x$ given that Y=y.

There is no immediate way of deriving the conditional distribution of X given Y=y. However, we can characterize it using the concept of conditional probability with respect to a partition, as follows.

Define the events $G_{y}$ as follows: $$G_{y}=\left\{\omega\in\Omega:Y(\omega)=y\right\},$$ and a partition G of events as $$G=\left\{G_{y}:y\in R_{Y}\right\},$$ where, as usual, $R_{Y}$ is the support of Y.

Then, for any $\omega\in G_{y}$ we have $$F_{X\mid Y=y}(x)=\mathrm{P}\left(X\leq x\mid G\right)(\omega),$$ where $\mathrm{P}\left(X\leq x\mid G\right)$ is the probability that $X\leq x$ conditional on the partition G. As we know, $\mathrm{P}\left(X\leq x\mid G\right)$ is guaranteed to exist and is unique up to almost sure equality. Of course, this does not mean that we are able to compute it. Nonetheless, this characterization is extremely useful, because it allows us to speak of the conditional distribution of X given Y=y in general, without needing to specify whether X and Y are discrete or continuous.
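Although no general computational recipe is available, it may help to record how the conditional distribution function relates to the objects derived in the previous sections whenever they exist (a standard relation, stated here for completeness rather than taken from the text above):

```latex
F_{X\mid Y=y}(x)
  = \mathrm{P}\left( X\leq x \mid Y=y \right)
  = \begin{cases}
      \sum_{t\in R_{X},\; t\leq x} p_{X\mid Y=y}(t)
        & \text{if } (X,Y) \text{ is discrete,}\\[1ex]
      \int_{-\infty}^{x} f_{X\mid Y=y}(t)\,\mathrm{d}t
        & \text{if } (X,Y) \text{ is absolutely continuous.}
    \end{cases}
```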

More details

The following sections contain more details about conditional distributions.

Conditional distribution of a random vector

We have discussed how to update the probability distribution of a random variable X after observing the realization of another random variable Y, that is, after receiving the information that Y=y. What happens when X and Y are random vectors rather than random variables? Basically, everything we said above still applies with straightforward modifications.

Thus, if X and Y are discrete random vectors, then the conditional probability mass function of X given Y=y is $$p_{X\mid Y=y}(x)=\frac{p_{XY}(x,y)}{p_{Y}(y)},$$ provided $p_{Y}(y)\neq 0$.

If X and Y are absolutely continuous random vectors, then the conditional probability density function of X given Y=y is $$f_{X\mid Y=y}(x)=\frac{f_{XY}(x,y)}{f_{Y}(y)},$$ provided $f_{Y}(y)>0$. In general, the conditional distribution function of X given Y=y is $$F_{X\mid Y=y}(x)=\mathrm{P}\left(X\leq x\mid Y=y\right).$$

The joint distribution as a product of marginal and conditional

As we have explained above, the joint distribution of X and Y can be used to derive the marginal distribution of Y and the conditional distribution of X given Y=y. This process can also go in the reverse direction: if we know the marginal distribution of Y and the conditional distribution of X given Y=y, then we can derive the joint distribution of X and Y. For discrete random variables, we have that $$p_{XY}(x,y)=p_{X\mid Y=y}(x)\,p_{Y}(y).$$ For absolutely continuous random variables, we have that $$f_{XY}(x,y)=f_{X\mid Y=y}(x)\,f_{Y}(y).$$
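The following is a minimal sketch of the discrete version of this factorization; the marginal and conditional probability mass functions below are hypothetical and chosen only for illustration.

```python
# A sketch of the factorization: joint pmf = conditional pmf of X given Y = y
# times marginal pmf of Y. All numbers are hypothetical.

marginal_Y = {0: 0.5, 1: 0.5}                    # p_Y(y)
conditional_X_given_Y = {                        # p_{X|Y=y}(x), indexed by y
    0: {1: 0.5, 2: 0.5},
    1: {1: 0.25, 2: 0.75},
}

# p_XY(x, y) = p_{X|Y=y}(x) * p_Y(y)
joint_pmf = {
    (x, y): p_x * p_y
    for y, p_y in marginal_Y.items()
    for x, p_x in conditional_X_given_Y[y].items()
}

print(joint_pmf)                # {(1, 0): 0.25, (2, 0): 0.25, (1, 1): 0.125, (2, 1): 0.375}
print(sum(joint_pmf.values()))  # 1.0: a valid joint pmf
```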

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Let $(X,Y)$ be a discrete random vector with support [eq99] and joint probability mass function [eq100]. Compute the conditional probability mass function of X given $Y=0$.

Solution

The marginal probability mass function of Y evaluated at $y=0$ is [eq101]. The support of X is [eq102]. Thus, the conditional probability mass function of X given $Y=0$ is [eq103].

Exercise 2

Let $(X,Y)$ be an absolutely continuous random vector with support [eq105] and joint probability density function [eq106]. Compute the conditional probability density function of X given $Y=2$.

Solution

The support of Y is [eq107]. When [eq108], the marginal probability density function of Y is [eq109]; when [eq110], the marginal probability density function of Y is [eq111]. Thus, the marginal probability density function of Y is [eq112]. When evaluated at the point $y=2$, it becomes [eq113]. The support of X is [eq114]. Thus, the conditional probability density function of X given $Y=2$ is [eq115].

Exercise 3

Let X be an absolutely continuous random variable with support [eq116] and probability density function [eq117]. Let Y be another absolutely continuous random variable with support [eq118] and conditional probability density function [eq119]. Find the marginal probability density function of Y.

Solution

The support of the vector $(X,Y)$ is [eq121] and the joint probability density function of X and Y is [eq122]. The marginal probability density function of Y is obtained by marginalization, integrating x out of the joint probability density function: [eq123]. Thus, for [eq124] we trivially have [eq125] (because [eq126]), while for [eq127] we have [eq128]. Thus, the marginal probability density function of Y is [eq129].
