
Expected value and the Lebesgue integral

by Marco Taboga, PhD

The Lebesgue integral is used to give a completely general definition of expected value. This lecture introduces the Lebesgue integral, first in an intuitive manner and then in a more rigorous manner.

The Lebesgue integral - Intuition

Let us recall the informal definition of expected value we have given in the lecture entitled Expected Value:

Definition The expected value of a random variable X is the weighted average of the values that X can take on, where each possible value is weighted by its respective probability.

When X is discrete and can take on only finitely many values, it is straightforward to compute the expected value of X by applying the above definition directly. Denote by $x_{1}$, ..., $x_{n}$ the n values that X can take on (the n elements of its support) and define the following events: $E_{i}=\{\omega \in \Omega :X(\omega )=x_{i}\}$, i.e. when the event $E_{i}$ happens, X equals $x_{i}$.

We can write the expected value of X as $\mathrm{E}[X]=\sum_{i=1}^{n}x_{i}\,\mathrm{P}(E_{i})$, i.e. the expected value of X is the weighted average of the values that X can take on ($x_{1}$, ..., $x_{n}$), where each possible value $x_{i}$ is weighted by its respective probability $\mathrm{P}(E_{i})$.
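As a concrete illustration of the weighted-average formula $\mathrm{E}[X]=\sum_{i}x_{i}\,\mathrm{P}(E_{i})$, here is a minimal Python sketch; the fair six-sided die is an assumed example, not taken from the text:

```python
# Expected value of a discrete random variable with finite support,
# computed as the probability-weighted sum of its values.
# Assumed example: a fair six-sided die.

values = [1, 2, 3, 4, 5, 6]      # x_1, ..., x_n (the support of X)
probabilities = [1 / 6] * 6      # P(E_1), ..., P(E_n)

expected_value = sum(x * p for x, p in zip(values, probabilities))
print(expected_value)  # 3.5
```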

Note that this way of expressing the expected value uses neither $F_{X}(x)$, the distribution function of X, nor its probability mass function $p_{X}(x)$. Instead, it uses only the probabilities $\mathrm{P}(E)$ defined on the events $E\subseteq \Omega $. In many applications, this turns out to be a very convenient way of expressing (and calculating) the expected value: for example, when the distribution function $F_{X}(x)$ is not directly known and is difficult to derive, it is sometimes easier to compute the probabilities $\mathrm{P}(E)$ directly. This will be illustrated with an example below.

When X is discrete, but can take on infinitely many values, in a similar fashion we can write $\mathrm{E}[X]=\sum_{i=1}^{\infty }x_{i}\,\mathrm{P}(E_{i})$

In this case, however, there is a possibility that $\mathrm{E}[X]$ is not well-defined: this happens when the infinite series above does not converge, that is, when the limit $\lim_{n\rightarrow \infty }\sum_{i=1}^{n}x_{i}\,\mathrm{P}(E_{i})$ does not exist. In the next section we will show how to take care of this possibility.
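The convergence requirement can be checked numerically on the partial sums. A minimal sketch, with an assumed example not from the text: X takes the value i with probability $(1/2)^{i}$, so the series converges and $\mathrm{E}[X]=\sum_{i=1}^{\infty }i\,(1/2)^{i}=2$.

```python
# Partial sums of the series sum_i x_i * P(E_i) for a countable support.
# Assumed example: P(X = i) = (1/2)**i, whose series converges to 2.

def partial_sum(n):
    """Sum of the first n terms i * (1/2)**i."""
    return sum(i * 0.5 ** i for i in range(1, n + 1))

print(partial_sum(10), partial_sum(50))  # the partial sums approach 2
```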

In the case in which X is not discrete (its support has the power of the continuum), things are much more complicated. In this case, the above summation does not make any sense (the support of X cannot be arranged into a sequence, so there is no sequence over which we can sum). Thus, we have to find a workaround. The workaround is similar to the one we have discussed in the presentation of the Stieltjes integral: we build a simpler random variable Y that is a good approximation of X and whose expected value can easily be computed; then we make the approximation better and better; finally, we define the expected value of X to be equal to the limit of the expected value of Y as the approximation tends to become perfect.

How does the approximation work, intuitively? We illustrate it in three steps:

  1. in the first step, we partition the sample space $\Omega $ into n events $E_{1}$, ..., $E_{n}$, such that $E_{i}\cap E_{j}=\emptyset $ for $i\neq j$ and $\bigcup_{i=1}^{n}E_{i}=\Omega $

  2. in the second step we find, for each event $E_{i}$, the smallest value that X can take on when the event $E_{i}$ happens: $y_{i}=\min_{\omega \in E_{i}}X(\omega )$

  3. in the third step, we define the random variable Y (which approximates X) as follows: $Y(\omega )=y_{i}$ whenever $\omega \in E_{i}$

In this way, we have built a random variable Y such that $Y(\omega )\leq X(\omega )$ for any $\omega $. The finer the partition $E_{1}$, ..., $E_{n}$ is, the better the approximation is: intuitively, when the sets $E_{i}$ become smaller, then $y_{i}$ becomes closer to the values that X takes on when $E_{i}$ happens.

The expected value of Y is, of course, easy to compute: $\mathrm{E}[Y]=\sum_{i=1}^{n}y_{i}\,\mathrm{P}(E_{i})$

The expected value of X is defined as follows: $\mathrm{E}[X]=\lim_{Y\rightarrow X}\mathrm{E}[Y]$, where the notation $Y\rightarrow X$ means that Y becomes a better and better approximation of X (because the partition $E_{1}$, ..., $E_{n}$ is made finer).

Several equivalent integral notations are used to denote the above limit: $\mathrm{E}[X]=\int_{\Omega }X\,d\mathrm{P}=\int_{\Omega }X(\omega )\,d\mathrm{P}(\omega )$, and the integral is called the Lebesgue integral of X with respect to the probability measure $\mathrm{P}$. The notation $d\mathrm{P}$ (or $d\mathrm{P}(\omega )$) indicates that the sets $E_{i}$ become very small as the approximation improves (as the partition $E_{1}$, ..., $E_{n}$ is made finer); the integral sign $\int $ can be thought of as a shorthand for $\sum $; X appears in place of Y in the integral because the two tend to coincide as the approximation becomes better and better.
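The three-step approximation above can be sketched numerically. A minimal sketch, under assumed choices not made in the text: take $\Omega =[0,1]$ with the uniform probability measure and $X(\omega )=\omega ^{2}$, partition $\Omega $ into n equal intervals $E_{i}$, set $y_{i}$ to the smallest value of X on $E_{i}$, and compute $\mathrm{E}[Y]=\sum_{i}y_{i}\,\mathrm{P}(E_{i})$; as the partition is made finer, this tends to $\mathrm{E}[X]=1/3$.

```python
# Lebesgue-style approximation of E[X] for X(omega) = omega**2 on
# Omega = [0, 1] with the uniform measure (assumed example).
# Each E_i is an interval of width 1/n, so P(E_i) = 1/n, and y_i is
# the minimum of X on E_i (its left endpoint squared, X being increasing).

def lebesgue_approximation(n):
    width = 1.0 / n                   # P(E_i) for every interval E_i
    total = 0.0
    for i in range(n):
        left = i * width              # E_i = [left, left + width)
        y_i = left ** 2               # smallest value of X on E_i
        total += y_i * width          # y_i * P(E_i)
    return total

for n in (10, 100, 10000):
    print(n, lebesgue_approximation(n))   # tends to E[X] = 1/3 as n grows
```

Because Y is built from the minima of X on each set, every approximation undershoots: $\mathrm{E}[Y]\leq \mathrm{E}[X]$, with equality in the limit.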

Linearity of the Lebesgue integral

An important property enjoyed by the Lebesgue integral is linearity.

Proposition Let X_1 and X_2 be two random variables and let $c_{1}\in \mathbb{R} $ and $c_{2}\in \mathbb{R} $ be two constants. Then, $\int_{\Omega }\left( c_{1}X_{1}+c_{2}X_{2}\right) d\mathrm{P}=c_{1}\int_{\Omega }X_{1}\,d\mathrm{P}+c_{2}\int_{\Omega }X_{2}\,d\mathrm{P}$

The next example shows an important application of the linearity of the Lebesgue integral. The example also shows how the Lebesgue integral can, in certain situations, be much simpler to use than the Stieltjes integral when computing the expected value of a random variable.

Example Let X_1 and X_2 be two random variables. We want to define (and compute) the expected value of the sum $X_{1}+X_{2}$. Define a new random variable $Z=X_{1}+X_{2}$. Using the Stieltjes integral, the expected value is defined as $\mathrm{E}[Z]=\int_{-\infty }^{+\infty }z\,dF_{Z}(z)$, where $F_{Z}(z)$ is the distribution function of Z. Hence, to compute this integral, we first need to know the distribution function of Z (which might be extremely difficult to derive). Using the Lebesgue integral, the expected value is instead defined as $\mathrm{E}[Z]=\int_{\Omega }\left( X_{1}+X_{2}\right) d\mathrm{P}$. By linearity of the Lebesgue integral, we obtain $\mathrm{E}[Z]=\int_{\Omega }X_{1}\,d\mathrm{P}+\int_{\Omega }X_{2}\,d\mathrm{P}=\mathrm{E}[X_{1}]+\mathrm{E}[X_{2}]$. Thus, to compute the expected value of $Z=X_{1}+X_{2}$, we do not need to know the distribution function of Z; we only need to know the expected values of X_1 and X_2.

The example thus shows that linearity of the Lebesgue integral trivially translates into linearity of the expected value.
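The identity $\mathrm{E}[X_{1}+X_{2}]=\mathrm{E}[X_{1}]+\mathrm{E}[X_{2}]$ can be checked by simulation without ever deriving the distribution of the sum. A minimal sketch; the exponential and uniform distributions are assumed choices, not from the text:

```python
# Monte Carlo check of linearity: the sample mean of X1 + X2 equals the
# sum of the sample means of X1 and X2, whatever the distribution of the
# sum looks like. Assumed example: X1 exponential (mean 1), X2 uniform
# on [0, 1] (mean 0.5), so E[X1 + X2] should be close to 1.5.
import random

random.seed(0)
n = 200_000
x1 = [random.expovariate(1.0) for _ in range(n)]   # E[X1] = 1
x2 = [random.uniform(0, 1) for _ in range(n)]      # E[X2] = 0.5

def mean(xs):
    return sum(xs) / len(xs)

print(mean([a + b for a, b in zip(x1, x2)]))  # close to 1.5
print(mean(x1) + mean(x2))                    # same value, by linearity
```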

Proposition Let X_1 and X_2 be two random variables and let $c_{1}\in \mathbb{R} $ and $c_{2}\in \mathbb{R} $ be two constants. Then, $\mathrm{E}\left[ c_{1}X_{1}+c_{2}X_{2}\right] =c_{1}\,\mathrm{E}[X_{1}]+c_{2}\,\mathrm{E}[X_{2}]$

The Lebesgue integral - A more rigorous definition

A more rigorous definition of the Lebesgue integral requires that we introduce the notion of a simple random variable. A random variable Y is called simple if it takes on finitely many non-negative values, that is, there exist n events $E_{1}$, ..., $E_{n}$ such that $E_{i}\cap E_{j}=\emptyset $ for $i\neq j$, $\bigcup_{i=1}^{n}E_{i}=\Omega $ and $Y(\omega )=y_{i}$ whenever $\omega \in E_{i}$, where $y_{i}\geq 0$ for all i.

Note that a simple random variable is also a discrete random variable. Hence, the expected value of a simple random variable is easy to compute (it is just the weighted sum of the elements of its support).

The Lebesgue integral of a simple random variable Y is defined to be equal to its expected value: $\int_{\Omega }Y\,d\mathrm{P}=\mathrm{E}[Y]=\sum_{i=1}^{n}y_{i}\,\mathrm{P}(E_{i})$

Let X be the random variable whose integral we want to compute. Let $X^{+}$ and $X^{-}$ be the positive and negative parts of X respectively: $X^{+}(\omega )=\max \left( X(\omega ),0\right) $ and $X^{-}(\omega )=\max \left( -X(\omega ),0\right) $. Note that $X^{+}(\omega )\geq 0$ and $X^{-}(\omega )\geq 0$ for any $\omega $ and $X(\omega )=X^{+}(\omega )-X^{-}(\omega )$.

The Lebesgue integral of $X^{+}$ is defined as follows: $\int_{\Omega }X^{+}\,d\mathrm{P}=\sup \left\{ \int_{\Omega }Y\,d\mathrm{P}:Y\text{ simple},\ Y(\omega )\leq X^{+}(\omega )\text{ for all }\omega \right\} $. In words, the Lebesgue integral of $X^{+}$ is obtained by taking the supremum of the Lebesgue integrals of all the simple random variables Y that are less than or equal to $X^{+}$.

The Lebesgue integral of $X^{-}$ is defined analogously: $\int_{\Omega }X^{-}\,d\mathrm{P}=\sup \left\{ \int_{\Omega }Y\,d\mathrm{P}:Y\text{ simple},\ Y(\omega )\leq X^{-}(\omega )\text{ for all }\omega \right\} $. Finally, the Lebesgue integral of X is defined as the difference between the integrals of its positive and negative parts: $\int_{\Omega }X\,d\mathrm{P}=\int_{\Omega }X^{+}\,d\mathrm{P}-\int_{\Omega }X^{-}\,d\mathrm{P}$, provided the difference makes sense; in case $\int_{\Omega }X^{+}\,d\mathrm{P}$ and $\int_{\Omega }X^{-}\,d\mathrm{P}$ are both equal to infinity, the difference is not well-defined and we say that X is not integrable.
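The decomposition into positive and negative parts can be sketched directly. A minimal sketch of the pointwise definitions $X^{+}(\omega )=\max (X(\omega ),0)$ and $X^{-}(\omega )=\max (-X(\omega ),0)$; the sample values are arbitrary assumed inputs:

```python
# Positive and negative parts of a random variable, evaluated pointwise.
# Both parts are non-negative, and their difference recovers X exactly.

def positive_part(x):
    """X+(omega) = max(X(omega), 0)."""
    return max(x, 0.0)

def negative_part(x):
    """X-(omega) = max(-X(omega), 0)."""
    return max(-x, 0.0)

for x in (-2.5, 0.0, 1.75):           # assumed sample values of X(omega)
    xp, xm = positive_part(x), negative_part(x)
    assert xp >= 0 and xm >= 0        # both parts are non-negative
    assert xp - xm == x               # X = X+ - X-
    print(x, xp, xm)
```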

How to cite

Please cite as:

Taboga, Marco (2021). "Expected value and the Lebesgue integral", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/expected-value-and-Lebesgue-integral.
