A conditional distribution is the probability distribution of a random variable, calculated according to the rules of conditional probability after observing the realization of another random variable.
We will discuss how to update the probability distribution of a random variable $X$ after receiving the information that another random variable $Y$ has taken a specific value $y$. The updated probability distribution of $X$ will be called the conditional distribution of $X$ given $Y = y$.
The two random variables $X$ and $Y$, considered together, form a random vector $(X,Y)$. Depending on the characteristics of the random vector $(X,Y)$, different procedures need to be followed in order to compute the conditional probability distribution of $X$ given $Y = y$.
In the remainder of this lecture, these procedures are presented in the following order: first, we tackle the case in which $(X,Y)$ is a discrete random vector; then, we tackle the case in which $(X,Y)$ is a continuous random vector; finally, we briefly discuss the case in which $(X,Y)$ is neither discrete nor continuous.
Note that if we are able to update the probability distribution of $X$ when we receive the information that $Y = y$, then we can also revise the distribution of $X$ when we get to know that a generic event $E$ has happened. It suffices to set $Y = 1_E$, where $1_E$ is the indicator function of the event $E$, and compute the distribution of $X$ conditional on the realization $Y = 1$.
In the case in which $(X,Y)$ is a discrete random vector, the probability mass function (pmf) of $X$ conditional on the information that $Y = y$ is called the conditional probability mass function.
Definition
Let $(X,Y)$ be a discrete random vector. We say that a function $p_{X\mid Y=y} : \mathbb{R} \to [0,1]$ is the conditional probability mass function of $X$ given $Y = y$ if, for any $x \in \mathbb{R}$,
$$p_{X\mid Y=y}(x) = P(X = x \mid Y = y),$$
where $P(X = x \mid Y = y)$ is the conditional probability that $X = x$ given that $Y = y$.
How do we derive the conditional pmf from the joint pmf of $X$ and $Y$?
The following proposition provides an answer to this question.
Proposition
Let $(X,Y)$ be a discrete random vector. Let $p_{XY}(x,y)$ be its joint pmf, and $p_Y(y)$ the marginal pmf of $Y$. The conditional pmf of $X$ given $Y = y$ is
$$p_{X\mid Y=y}(x) = \frac{p_{XY}(x,y)}{p_Y(y)},$$
provided $p_Y(y) > 0$.
This is just the usual formula for computing conditional probabilities (conditional probability equals joint probability divided by marginal probability):
$$p_{X\mid Y=y}(x) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{XY}(x,y)}{p_Y(y)}.$$
In the proposition above, we assume that the marginal pmf $p_Y(y)$ is known. If it is not, it can be derived from the joint pmf by marginalization:
$$p_Y(y) = \sum_{x \in R_X} p_{XY}(x,y),$$
where $R_X$ is the support of $X$.
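To make the formula concrete, here is a minimal Python sketch (not part of the original lecture; the dictionary representation of a pmf is an illustrative choice) that computes a conditional pmf from a joint pmf, obtaining the marginal by marginalization:

```python
# Sketch: conditional pmf via p_{X|Y=y}(x) = p_XY(x, y) / p_Y(y).
# The joint pmf is represented as a dict mapping pairs (x, y) to probabilities.

def conditional_pmf(joint, y):
    # Marginal pmf of Y at y: sum the joint pmf over all values of x.
    p_y = sum(p for (_, y2), p in joint.items() if y2 == y)
    if p_y == 0:
        raise ValueError("p_Y(y) = 0: the formula does not apply")
    # Divide the joint pmf by the marginal, keeping the pairs with the observed y.
    return {x: p / p_y for (x, y2), p in joint.items() if y2 == y}

# Joint pmf of the example that follows: mass 1/3 on each of (1,1), (2,0), (0,0).
joint = {(1, 1): 1/3, (2, 0): 1/3, (0, 0): 1/3}
print(conditional_pmf(joint, 0))  # {2: 0.5, 0: 0.5}
```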
Example
Let the support of $(X,Y)$ be
$$R_{XY} = \{(1,1), (2,0), (0,0)\}$$
and its joint pmf be
$$p_{XY}(x,y) = \begin{cases} 1/3 & \text{if } (x,y) = (1,1) \\ 1/3 & \text{if } (x,y) = (2,0) \\ 1/3 & \text{if } (x,y) = (0,0) \\ 0 & \text{otherwise.} \end{cases}$$
Let us compute the conditional pmf of $X$ given $Y = 0$. The support of $Y$ is
$$R_Y = \{0, 1\}.$$
The marginal pmf of $Y$ evaluated at $y = 0$ is
$$p_Y(0) = p_{XY}(0,0) + p_{XY}(2,0) = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$
The support of $X$ is
$$R_X = \{0, 1, 2\}.$$
Thus, the conditional pmf of $X$ given $Y = 0$ is
$$p_{X\mid Y=0}(x) = \frac{p_{XY}(x,0)}{p_Y(0)} = \begin{cases} 1/2 & \text{if } x = 0 \\ 0 & \text{if } x = 1 \\ 1/2 & \text{if } x = 2 \\ 0 & \text{otherwise.} \end{cases}$$
When $p_Y(y) = 0$, it is in general not possible to unambiguously derive the conditional pmf of $X$, as we show below with an example.
This impossibility (known as the Borel-Kolmogorov paradox) is not particularly worrying, as it is seldom relevant in applications.
Example
The example is a bit involved. You might safely skip it on a first reading. Suppose that the sample space $\Omega$ is the set of all real numbers between $0$ and $1$:
$$\Omega = [0,1].$$
It is possible to build a probability measure $P$ on $\Omega$, such that $P$ assigns to each sub-interval of $[0,1]$ a probability equal to its length, that is,
$$P([a,b]) = b - a \quad \text{for } 0 \le a \le b \le 1.$$
This is the same sample space discussed in the lecture on zero-probability events. Define a random variable $X$ as follows:
$$X(\omega) = \begin{cases} 1 & \text{if } \omega \le 1/2 \\ 0 & \text{if } \omega > 1/2 \end{cases}$$
and another random variable $Y$ as follows:
$$Y(\omega) = \begin{cases} 1 & \text{if } \omega = 1/2 \\ 0 & \text{if } \omega \ne 1/2. \end{cases}$$
Both $X$ and $Y$ are discrete random variables and, considered together, they constitute a discrete random vector $(X,Y)$.
Suppose that we want to compute the conditional pmf of $X$ conditional on $Y = 1$. It is easy to see that
$$P(Y = 1) = P(\{1/2\}) = 0.$$
As a consequence, we cannot use the formula
$$p_{X\mid Y=1}(x) = \frac{p_{XY}(x,1)}{p_Y(1)}$$
because division by zero is not possible. Also the technique of deriving a conditional probability implicitly, as a realization of a conditional probability with respect to a sigma-algebra, does not allow us to unambiguously derive $p_{X\mid Y=1}(x)$.
In this case, the partition of interest is $\{E_0, E_1\}$, where
$$E_0 = \{\omega \in \Omega : Y(\omega) = 0\} \quad \text{and} \quad E_1 = \{\omega \in \Omega : Y(\omega) = 1\} = \{1/2\},$$
and $p_{X\mid Y=1}(x)$ can be viewed as the realization of the conditional probability $P(X = x \mid \{E_0, E_1\})$ when $\omega \in E_1$. The fundamental property of conditional probability
$$P(\{X = x\} \cap E_i) = \int_{E_i} P(X = x \mid \{E_0, E_1\})\, dP \quad (i = 0, 1)$$
is satisfied in this case if and only if, for a given $x$, the following system of equations is satisfied:
$$\begin{cases} P(\{X = x\} \cap E_1) = p_{X\mid Y=1}(x)\, P(E_1) \\ P(\{X = x\} \cap E_0) = p_{X\mid Y=0}(x)\, P(E_0), \end{cases}$$
which implies
$$\begin{cases} 0 = p_{X\mid Y=1}(x) \cdot 0 \\ P(\{X = x\} \cap E_0) = p_{X\mid Y=0}(x) \cdot 1. \end{cases}$$
The second equation does not help to determine $p_{X\mid Y=1}(x)$. So, from the first equation, it is evident that $p_{X\mid Y=1}(x)$ is undetermined (any number, when multiplied by zero, gives zero). One can show that also the requirement that $P(X = x \mid \{E_0, E_1\})$ be a regular conditional probability does not help to pin down $p_{X\mid Y=1}(x)$.
What does it mean that $p_{X\mid Y=1}(x)$ is undetermined? It means that any choice of $p_{X\mid Y=1}(x)$ is legitimate, provided the requirement
$$\sum_{x \in R_X} p_{X\mid Y=1}(x) = 1$$
is satisfied. Is this really a paradox? No, because conditional probability with respect to a partition is defined up to almost sure equality, and $E_1$ is a zero-probability event. As a consequence, the value that $P(X = x \mid \{E_0, E_1\})$ takes on $E_1$ does not matter. Roughly speaking, we do not really need to care about zero-probability events, provided there is only a countable number of them.
In the case in which $(X,Y)$ is a continuous random vector, the probability density function (pdf) of $X$ conditional on the information that $Y = y$ is called the conditional probability density function.
Definition
Let $(X,Y)$ be a continuous random vector. We say that a function $f_{X\mid Y=y} : \mathbb{R} \to [0,\infty)$ is the conditional probability density function of $X$ given $Y = y$ if, for any interval $[a,b]$,
$$P(X \in [a,b] \mid Y = y) = \int_a^b f_{X\mid Y=y}(x)\, dx,$$
and $f_{X\mid Y=y}(x)$ is such that the above integral is well defined.
How do we derive the conditional pdf from the joint pdf of $X$ and $Y$?
The following proposition provides an answer to this question.
Proposition
Let $(X,Y)$ be a continuous random vector. Let $f_{XY}(x,y)$ be its joint pdf, and $f_Y(y)$ be the marginal pdf of $Y$. The conditional pdf of $X$ given $Y = y$ is
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)},$$
provided $f_Y(y) > 0$.
Deriving the conditional distribution of $X$ given $Y = y$ is far from obvious. As explained in the lecture on random variables, whatever value of $y$ we choose, we are conditioning on a zero-probability event:
$$P(Y = y) = 0.$$
Therefore, the standard formula (conditional probability equals joint probability divided by marginal probability) cannot be used. However, it turns out that the definition of conditional probability with respect to a partition can be fruitfully applied in this case to derive the conditional pdf of $X$ given $Y = y$.
In order to prove that
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)}$$
is a legitimate choice, we need to prove that conditional probabilities calculated by using this conditional pdf satisfy the fundamental property of conditional probability:
$$P(\{X \in A\} \cap \{Y \in B\}) = \mathrm{E}\left[1_{\{Y \in B\}}\, P(X \in A \mid Y)\right]$$
for any sets $A$ and $B$. Thanks to some basic results in measure theory, we can confine our attention to the events $A$ and $B$ that can be written as follows:
$$A = [a_1, a_2], \quad B = [b_1, b_2].$$
For these events, it is immediate to verify that the fundamental property of conditional probability holds. First, by the very definition of a conditional pdf, we have that
$$P(X \in A \mid Y = y) = \int_{a_1}^{a_2} f_{X\mid Y=y}(x)\, dx = \int_{a_1}^{a_2} \frac{f_{XY}(x,y)}{f_Y(y)}\, dx.$$
Furthermore, the indicator function $1_{\{Y \in B\}}$ is also a function of $Y$. Therefore, the product $1_{\{Y \in B\}}\, P(X \in A \mid Y)$ is a function of $Y$, and we can use the transformation theorem to compute its expected value:
$$\mathrm{E}\left[1_{\{Y \in B\}}\, P(X \in A \mid Y)\right] = \int_{b_1}^{b_2} \left( \int_{a_1}^{a_2} \frac{f_{XY}(x,y)}{f_Y(y)}\, dx \right) f_Y(y)\, dy = \int_{b_1}^{b_2} \int_{a_1}^{a_2} f_{XY}(x,y)\, dx\, dy = P(\{X \in A\} \cap \{Y \in B\}).$$
The last equality proves the proposition.
Example
Let the support of $(X,Y)$ be
$$R_{XY} = [0,1] \times [0,1]$$
and its joint pdf be
$$f_{XY}(x,y) = \begin{cases} x + y & \text{if } (x,y) \in [0,1] \times [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Let us compute the conditional pdf of $X$ given $Y = y$. The support of $Y$ is
$$R_Y = [0,1].$$
When $y \in [0,1]$, the marginal pdf of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx = \int_0^1 (x + y)\, dx = \frac{1}{2} + y;$$
when $y \notin [0,1]$, the marginal pdf is $f_Y(y) = 0$. Thus, the marginal pdf of $Y$ is
$$f_Y(y) = \begin{cases} \frac{1}{2} + y & \text{if } y \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
When evaluated at $y = 1/2$, it is
$$f_Y(1/2) = \frac{1}{2} + \frac{1}{2} = 1.$$
The support of $X$ is
$$R_X = [0,1].$$
Thus, the conditional pdf of $X$ given $Y = 1/2$ is
$$f_{X\mid Y=1/2}(x) = \frac{f_{XY}(x, 1/2)}{f_Y(1/2)} = \begin{cases} x + \frac{1}{2} & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
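As a sanity check, the computations in the example can be verified numerically. The following sketch (assuming SciPy is available; the quadrature approach is illustrative, not part of the lecture) recovers the marginal by integrating the joint pdf and confirms that the conditional pdf integrates to one:

```python
# Numerical check of the example: f_XY(x, y) = x + y on [0,1] x [0,1].
from scipy.integrate import quad

def f_xy(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

y0 = 0.5
# Marginal pdf of Y at y0, obtained by integrating x out of the joint pdf.
f_y, _ = quad(lambda x: f_xy(x, y0), 0, 1)
print(f_y)  # 1.0, matching f_Y(1/2) = 1/2 + 1/2

# Conditional pdf of X given Y = y0; it should integrate to 1 over [0, 1].
cond = lambda x: f_xy(x, y0) / f_y
print(cond(0.25), quad(cond, 0, 1)[0])  # 0.75 (= 0.25 + 1/2) and 1.0
```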
In general, when $(X,Y)$ is neither discrete nor continuous, we can characterize the distribution function of $X$ conditional on the information that $Y = y$.
Definition
We say that a function $F_{X\mid Y=y}(x)$ is the conditional distribution function of $X$ given $Y = y$ if and only if
$$F_{X\mid Y=y}(x) = P(X \le x \mid Y = y),$$
where $P(X \le x \mid Y = y)$ is the conditional probability that $X \le x$ given that $Y = y$.
There is no immediate way of deriving the conditional distribution of $X$ given $Y = y$. However, we can characterize it by using the concept of conditional probability with respect to a partition, as follows. Define the events $E_y$ as follows:
$$E_y = \{\omega \in \Omega : Y(\omega) = y\},$$
and a partition $G$ of events $E_y$ as
$$G = \{E_y : y \in R_Y\},$$
where, as usual, $R_Y$ is the support of $Y$. Then, for any $x$ we have
$$F_{X\mid Y=y}(x) = P(X \le x \mid G)(\omega) \quad \text{for } \omega \in E_y,$$
where $P(X \le x \mid G)$ is the probability that $X \le x$ conditional on the partition $G$.
As we know, $P(X \le x \mid G)$ is guaranteed to exist and is unique up to almost sure equality. Of course, this does not mean that we are able to compute it. Nonetheless, this characterization is extremely useful because it allows us to speak of the conditional distribution of $X$ given $Y = y$ in general, without the need to specify whether $X$ and $Y$ are discrete or continuous.
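Although the characterization is abstract, it suggests a practical approximation: when $P(X \le x \mid G)$ cannot be computed analytically, the conditional distribution function can be estimated by simulation, conditioning on draws of $Y$ that fall near $y$. A rough sketch (the bivariate normal model and the window width $h$ are illustrative assumptions, not part of the lecture):

```python
# Sketch: estimate F_{X|Y=y}(x) empirically, keeping draws with Y close to y.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Illustrative model: (X, Y) bivariate standard normal with correlation 0.8.
y_draws = rng.standard_normal(n)
x_draws = 0.8 * y_draws + np.sqrt(1 - 0.8**2) * rng.standard_normal(n)

def cond_cdf(x, y, h=0.02):
    """Empirical estimate of P(X <= x | Y in [y-h, y+h])."""
    near = np.abs(y_draws - y) < h
    return np.mean(x_draws[near] <= x)

# For this model, X | Y=1 is N(0.8, 0.36), so the value below should be near 0.5.
print(cond_cdf(0.8, 1.0))
```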
The following sections contain more details about conditional distributions.
We have discussed how to update the probability distribution of a random variable $X$ after observing the realization of another random variable $Y$, that is, after receiving the information that $Y = y$. What happens when $X$ and $Y$ are random vectors rather than random variables? Basically, everything we said above still applies with straightforward modifications.
Thus, if $X$ and $Y$ are discrete random vectors, then the conditional probability mass function of $X$ given $Y = y$ is
$$p_{X\mid Y=y}(x) = \frac{p_{XY}(x,y)}{p_Y(y)},$$
provided $p_Y(y) > 0$. If $X$ and $Y$ are continuous random vectors, then the conditional probability density function of $X$ given $Y = y$ is
$$f_{X\mid Y=y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)},$$
provided $f_Y(y) > 0$. In general, the conditional distribution function of $X$ given $Y = y$ is
$$F_{X\mid Y=y}(x) = P(X \le x \mid Y = y).$$
As we have explained above, the joint distribution of $X$ and $Y$ can be used to derive the marginal distribution of $Y$ and the conditional distribution of $X$ given $Y = y$. This process can also go in the reverse direction: if we know the marginal distribution of $Y$ and the conditional distribution of $X$ given $Y = y$, then we can derive the joint distribution of $X$ and $Y$. For discrete random variables, we have that
$$p_{XY}(x,y) = p_{X\mid Y=y}(x)\, p_Y(y).$$
For continuous random variables, we have that
$$f_{XY}(x,y) = f_{X\mid Y=y}(x)\, f_Y(y).$$
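These factorizations also have a direct computational counterpart: to simulate a draw from the joint distribution, it suffices to draw $y$ from the marginal of $Y$ and then $x$ from the conditional distribution of $X$ given $Y = y$. A minimal sketch, reusing the pmfs of the first discrete example (the dictionary representation is an illustrative choice):

```python
# Sketch: sample from the joint via p_XY(x, y) = p_{X|Y=y}(x) * p_Y(y).
import random

p_y = {0: 2/3, 1: 1/3}               # marginal pmf of Y (first example)
p_x_given_y = {0: {0: 1/2, 2: 1/2},  # conditional pmf of X given Y = 0
               1: {1: 1.0}}          # conditional pmf of X given Y = 1

def draw(pmf):
    return random.choices(list(pmf), weights=list(pmf.values()))[0]

def sample_joint():
    y = draw(p_y)              # first draw Y from its marginal
    x = draw(p_x_given_y[y])   # then draw X from the conditional given that y
    return x, y

samples = [sample_joint() for _ in range(30_000)]
print(samples.count((2, 0)) / len(samples))  # close to p_XY(2,0) = 1/3
```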
Below you can find some exercises with explained solutions.
Exercise 1
Let $(X,Y)$ be a discrete random vector with support
$$R_{XY} = \{(0,0), (0,1), (1,1)\}$$
and joint probability mass function
$$p_{XY}(x,y) = \begin{cases} 1/4 & \text{if } (x,y) = (0,0) \\ 1/4 & \text{if } (x,y) = (0,1) \\ 1/2 & \text{if } (x,y) = (1,1) \\ 0 & \text{otherwise.} \end{cases}$$
Compute the conditional probability mass function of $X$ given $Y = 1$.
Solution
The marginal probability mass function of $Y$ evaluated at $y = 1$ is
$$p_Y(1) = p_{XY}(0,1) + p_{XY}(1,1) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}.$$
The support of $X$ is
$$R_X = \{0, 1\}.$$
Thus, the conditional probability mass function of $X$ given $Y = 1$ is
$$p_{X\mid Y=1}(x) = \frac{p_{XY}(x,1)}{p_Y(1)} = \begin{cases} 1/3 & \text{if } x = 0 \\ 2/3 & \text{if } x = 1 \\ 0 & \text{otherwise.} \end{cases}$$
Exercise 2
Let $(X,Y)$ be a continuous random vector with support
$$R_{XY} = [0,1] \times [0,1]$$
and its joint probability density function be
$$f_{XY}(x,y) = \begin{cases} \frac{2}{3}(x + 2y) & \text{if } (x,y) \in [0,1] \times [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Compute the conditional probability density function of $X$ given $Y = 1/2$.
Solution
The support of $Y$ is
$$R_Y = [0,1].$$
When $y \in [0,1]$, the marginal probability density function of $Y$ is
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx = \int_0^1 \frac{2}{3}(x + 2y)\, dx = \frac{1}{3} + \frac{4}{3} y;$$
when $y \notin [0,1]$, the marginal probability density function of $Y$ is $f_Y(y) = 0$. Thus, the marginal probability density function of $Y$ is
$$f_Y(y) = \begin{cases} \frac{1}{3} + \frac{4}{3} y & \text{if } y \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
When evaluated at the point $y = 1/2$, it becomes
$$f_Y(1/2) = \frac{1}{3} + \frac{2}{3} = 1.$$
The support of $X$ is
$$R_X = [0,1].$$
Thus, the conditional probability density function of $X$ given $Y = 1/2$ is
$$f_{X\mid Y=1/2}(x) = \frac{f_{XY}(x, 1/2)}{f_Y(1/2)} = \begin{cases} \frac{2}{3}(x + 1) & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
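As with the earlier continuous example, a one-line numerical integration (assuming SciPy is available) confirms that the computed conditional pdf is a valid density:

```python
# f_{X|Y=1/2}(x) = (2/3)(x + 1) should integrate to 1 over [0, 1].
from scipy.integrate import quad
print(quad(lambda x: (2 / 3) * (x + 1), 0, 1)[0])  # 1.0
```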
Exercise 3
Let $X$ be a continuous random variable with support
$$R_X = [0,1]$$
and probability density function
$$f_X(x) = \begin{cases} 1 & \text{if } x \in [0,1] \\ 0 & \text{otherwise.} \end{cases}$$
Let $Y$ be another continuous random variable with support
$$R_Y = [0,1]$$
and conditional probability density function
$$f_{Y\mid X=x}(y) = \begin{cases} \frac{1}{x} & \text{if } y \in [0,x] \\ 0 & \text{otherwise.} \end{cases}$$
Find the marginal probability density function of $Y$.
Solution
The support of the vector $(X,Y)$ is
$$R_{XY} = \{(x,y) : 0 \le y \le x \le 1\}$$
and the joint probability density function of $X$ and $Y$ is
$$f_{XY}(x,y) = f_{Y\mid X=x}(y)\, f_X(x) = \begin{cases} \frac{1}{x} & \text{if } 0 \le y \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
The marginal probability density function of $Y$ is obtained by marginalization, integrating $x$ out of the joint probability density function:
$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\, dx.$$
Thus, for $y \notin [0,1]$ we trivially have $f_Y(y) = 0$ (because $f_{XY}(x,y) = 0$ for every $x$), while for $y \in (0,1]$ we have
$$f_Y(y) = \int_y^1 \frac{1}{x}\, dx = \ln(1) - \ln(y) = -\ln(y).$$
Thus, the marginal probability density function of $Y$ is
$$f_Y(y) = \begin{cases} -\ln(y) & \text{if } y \in (0,1] \\ 0 & \text{otherwise.} \end{cases}$$
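A small Monte Carlo experiment can corroborate this result: draw $X$ uniformly on $[0,1]$, then $Y$ uniformly on $[0,X]$, and compare the observed frequency near a point $y_0$ with the density $-\ln(y_0)$. A sketch (the sample size and bin width are arbitrary choices):

```python
# Monte Carlo check of f_Y(y) = -ln(y): X ~ U(0,1), then Y | X=x ~ U(0,x).
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(0.0, x)  # elementwise: each Y_i is uniform on [0, X_i]

# The fraction of draws in a small bin around y0, divided by the bin width,
# approximates the density at y0.
y0, h = 0.2, 0.005
est = np.mean((y > y0 - h) & (y < y0 + h)) / (2 * h)
print(est, -np.log(y0))  # both close to 1.609
```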
Please cite as:
Taboga, Marco (2021). "Conditional probability distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/conditional-probability-distributions.