
Expected value and the Lebesgue integral

by Marco Taboga, PhD

The Lebesgue integral is used to give a completely general definition of expected value. This lecture introduces the Lebesgue integral, first in an intuitive manner and then in a more rigorous manner.

The Lebesgue integral - Intuition

Let us recall the informal definition of expected value we have given in the lecture entitled Expected Value:

Definition The expected value of a random variable X is the weighted average of the values that X can take on, where each possible value is weighted by its respective probability.

When X is discrete and can take on only finitely many values, it is straightforward to compute the expected value of X by applying the above definition directly. Denote by $x_{1}$, ..., $x_{n}$ the n values that X can take on (the n elements of its support) and define the following events: $E_{i}=\{\omega \in \Omega :X(\omega )=x_{i}\}$, i.e. when the event $E_{i}$ happens, X equals $x_{i}$.

We can write the expected value of X as $\mathrm{E}[X]=\sum_{i=1}^{n}x_{i}\,\mathrm{P}(E_{i})$, i.e. the expected value of X is the weighted average of the values that X can take on ($x_{1}$, ..., $x_{n}$), where each possible value $x_{i}$ is weighted by its respective probability $\mathrm{P}(E_{i})$.
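As a concrete illustration of the weighted-average formula $\mathrm{E}[X]=\sum_{i}x_{i}\,\mathrm{P}(E_{i})$, here is a minimal Python sketch; the fair six-sided die is an assumed example, not taken from the text:

```python
# Expected value of a discrete random variable with finite support,
# computed as the probability-weighted sum of its values.
# Assumed example: a fair six-sided die.

values = [1, 2, 3, 4, 5, 6]      # x_1, ..., x_n (the support of X)
probabilities = [1 / 6] * 6      # P(E_1), ..., P(E_n)

expected_value = sum(x * p for x, p in zip(values, probabilities))
print(expected_value)  # 3.5
```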

Note that this way of expressing the expected value uses neither $F_{X}(x)$, the distribution function of X, nor its probability mass function $p_{X}(x)$. Instead, it uses only the probabilities $\mathrm{P}(E)$ defined on the events $E\subseteq \Omega $. In many applications, this turns out to be a very convenient way of expressing (and calculating) the expected value: for example, when the distribution function $F_{X}(x)$ is not directly known and is difficult to derive, it is sometimes easier to compute the probabilities $\mathrm{P}(E)$ directly. This will be illustrated with an example below.

When X is discrete, but can take on infinitely many values, in a similar fashion we can write $\mathrm{E}[X]=\sum_{i=1}^{\infty }x_{i}\,\mathrm{P}(E_{i})$

In this case, however, there is a possibility that $\mathrm{E}[X]$ is not well-defined: this happens when the infinite series above does not converge, that is, when the limit $\lim_{n\rightarrow \infty }\sum_{i=1}^{n}x_{i}\,\mathrm{P}(E_{i})$ does not exist. In the next section we will show how to take care of this possibility.
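The convergence requirement can be checked numerically on the partial sums. A minimal sketch, with an assumed example not from the text: X takes the value i with probability $(1/2)^{i}$, so the series converges and $\mathrm{E}[X]=\sum_{i=1}^{\infty }i\,(1/2)^{i}=2$.

```python
# Partial sums of the series sum_i x_i * P(E_i) for a countable support.
# Assumed example: P(X = i) = (1/2)**i, whose series converges to 2.

def partial_sum(n):
    """Sum of the first n terms i * (1/2)**i."""
    return sum(i * 0.5 ** i for i in range(1, n + 1))

print(partial_sum(10), partial_sum(50))  # the partial sums approach 2
```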

In the case in which X is not discrete (its support has the power of the continuum), things are much more complicated. In this case, the above summation does not make any sense (the support of X cannot be arranged into a sequence, so there is no sequence over which we can sum). Thus, we have to find a workaround. The workaround is similar to the one we have discussed in the presentation of the Stieltjes integral: we build a simpler random variable Y that is a good approximation of X and whose expected value can easily be computed; then we make the approximation better and better; finally, we define the expected value of X to be equal to the limit of the expected value of Y as the approximation tends to become perfect.

How does the approximation work, intuitively? We illustrate it in three steps:

  1. in the first step, we partition the sample space $\Omega $ into n events $E_{1}$, ..., $E_{n}$, such that $E_{i}\cap E_{j}=\emptyset $ for $i\neq j$ and $\bigcup_{i=1}^{n}E_{i}=\Omega $

  2. in the second step we find, for each event $E_{i}$, the smallest value that X can take on when the event $E_{i}$ happens: $y_{i}=\min_{\omega \in E_{i}}X(\omega )$

  3. in the third step, we define the random variable Y (which approximates X) as follows: $Y(\omega )=y_{i}$ whenever $\omega \in E_{i}$

In this way, we have built a random variable Y such that $Y(\omega )\leq X(\omega )$ for any $\omega $. The finer the partition $E_{1}$, ..., $E_{n}$ is, the better the approximation is: intuitively, when the sets $E_{i}$ become smaller, then $y_{i}$ becomes closer to the values that X takes on when $E_{i}$ happens.

The expected value of Y is, of course, easy to compute: $\mathrm{E}[Y]=\sum_{i=1}^{n}y_{i}\,\mathrm{P}(E_{i})$

The expected value of X is defined as follows: $\mathrm{E}[X]=\lim_{Y\rightarrow X}\mathrm{E}[Y]$, where the notation $Y\rightarrow X$ means that Y becomes a better and better approximation of X (because the partition $E_{1}$, ..., $E_{n}$ is made finer).

Several equivalent integral notations are used to denote the above limit: $\mathrm{E}[X]=\int_{\Omega }X\,d\mathrm{P}=\int_{\Omega }X(\omega )\,d\mathrm{P}(\omega )$, and the integral is called the Lebesgue integral of X with respect to the probability measure $\mathrm{P}$. The notation $d\mathrm{P}$ (or $d\mathrm{P}(\omega )$) indicates that the sets $E_{i}$ become very small as the approximation improves (as the partition $E_{1}$, ..., $E_{n}$ is made finer); the integral sign $\int $ can be thought of as a shorthand for $\sum $; X appears in place of Y in the integral because the two tend to coincide as the approximation becomes better and better.
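The three-step approximation above can be sketched numerically. A minimal sketch, under assumed choices not made in the text: take $\Omega =[0,1]$ with the uniform probability measure and $X(\omega )=\omega ^{2}$, partition $\Omega $ into n equal intervals $E_{i}$, set $y_{i}$ to the smallest value of X on $E_{i}$, and compute $\mathrm{E}[Y]=\sum_{i}y_{i}\,\mathrm{P}(E_{i})$; as the partition is made finer, this tends to $\mathrm{E}[X]=1/3$.

```python
# Lebesgue-style approximation of E[X] for X(omega) = omega**2 on
# Omega = [0, 1] with the uniform measure (assumed example).
# Each E_i is an interval of width 1/n, so P(E_i) = 1/n, and y_i is
# the minimum of X on E_i (its left endpoint squared, X being increasing).

def lebesgue_approximation(n):
    width = 1.0 / n                   # P(E_i) for every interval E_i
    total = 0.0
    for i in range(n):
        left = i * width              # E_i = [left, left + width)
        y_i = left ** 2               # smallest value of X on E_i
        total += y_i * width          # y_i * P(E_i)
    return total

for n in (10, 100, 10000):
    print(n, lebesgue_approximation(n))   # tends to E[X] = 1/3 as n grows
```

Because Y is built from the minima of X on each set, every approximation undershoots: $\mathrm{E}[Y]\leq \mathrm{E}[X]$, with equality in the limit.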

Linearity of the Lebesgue integral

An important property enjoyed by the Lebesgue integral is linearity.

Proposition Let X_1 and X_2 be two random variables and let $c_{1}\in \mathbb{R} $ and $c_{2}\in \mathbb{R} $ be two constants. Then, $\int_{\Omega }\left( c_{1}X_{1}+c_{2}X_{2}\right) d\mathrm{P}=c_{1}\int_{\Omega }X_{1}\,d\mathrm{P}+c_{2}\int_{\Omega }X_{2}\,d\mathrm{P}$

The next example shows an important application of the linearity of the Lebesgue integral. The example also shows how the Lebesgue integral can, in certain situations, be much simpler to use than the Stieltjes integral when computing the expected value of a random variable.

Example Let X_1 and X_2 be two random variables. We want to define (and compute) the expected value of the sum $X_{1}+X_{2}$. Define a new random variable $Z=X_{1}+X_{2}$. Using the Stieltjes integral, the expected value is defined as $\mathrm{E}[Z]=\int_{-\infty }^{+\infty }z\,dF_{Z}(z)$, where $F_{Z}(z)$ is the distribution function of Z. Hence, to compute this integral, we first need to know the distribution function of Z (which might be extremely difficult to derive). Using the Lebesgue integral, the expected value is instead defined as $\mathrm{E}[Z]=\int_{\Omega }\left( X_{1}+X_{2}\right) d\mathrm{P}$. By linearity of the Lebesgue integral, we obtain $\mathrm{E}[Z]=\int_{\Omega }X_{1}\,d\mathrm{P}+\int_{\Omega }X_{2}\,d\mathrm{P}=\mathrm{E}[X_{1}]+\mathrm{E}[X_{2}]$. Thus, to compute the expected value of $Z=X_{1}+X_{2}$, we do not need to know the distribution function of Z; we only need to know the expected values of X_1 and X_2.

The example thus shows that linearity of the Lebesgue integral trivially translates into linearity of the expected value.
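The identity $\mathrm{E}[X_{1}+X_{2}]=\mathrm{E}[X_{1}]+\mathrm{E}[X_{2}]$ can be checked by simulation without ever deriving the distribution of the sum. A minimal sketch; the exponential and uniform distributions are assumed choices, not from the text:

```python
# Monte Carlo check of linearity: the sample mean of X1 + X2 equals the
# sum of the sample means of X1 and X2, whatever the distribution of the
# sum looks like. Assumed example: X1 exponential (mean 1), X2 uniform
# on [0, 1] (mean 0.5), so E[X1 + X2] should be close to 1.5.
import random

random.seed(0)
n = 200_000
x1 = [random.expovariate(1.0) for _ in range(n)]   # E[X1] = 1
x2 = [random.uniform(0, 1) for _ in range(n)]      # E[X2] = 0.5

def mean(xs):
    return sum(xs) / len(xs)

print(mean([a + b for a, b in zip(x1, x2)]))  # close to 1.5
print(mean(x1) + mean(x2))                    # same value, by linearity
```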

Proposition Let X_1 and X_2 be two random variables and let $c_{1}\in \mathbb{R} $ and $c_{2}\in \mathbb{R} $ be two constants. Then, $\mathrm{E}\left[ c_{1}X_{1}+c_{2}X_{2}\right] =c_{1}\,\mathrm{E}[X_{1}]+c_{2}\,\mathrm{E}[X_{2}]$

The Lebesgue integral - A more rigorous definition

A more rigorous definition of the Lebesgue integral requires that we introduce the notion of a simple random variable. A random variable Y is called simple if it takes on finitely many non-negative values, that is, there exist n events $E_{1}$, ..., $E_{n}$ such that $E_{i}\cap E_{j}=\emptyset $ for $i\neq j$, $\bigcup_{i=1}^{n}E_{i}=\Omega $ and $Y(\omega )=y_{i}$ whenever $\omega \in E_{i}$, where $y_{i}\geq 0$ for all i.

Note that a simple random variable is also a discrete random variable. Hence, the expected value of a simple random variable is easy to compute (it is just the weighted sum of the elements of its support).

The Lebesgue integral of a simple random variable Y is defined to be equal to its expected value: $\int_{\Omega }Y\,d\mathrm{P}=\mathrm{E}[Y]=\sum_{i=1}^{n}y_{i}\,\mathrm{P}(E_{i})$

Let X be the random variable whose integral we want to compute. Let $X^{+}$ and $X^{-}$ be the positive and negative parts of X respectively: $X^{+}(\omega )=\max \left( X(\omega ),0\right) $ and $X^{-}(\omega )=\max \left( -X(\omega ),0\right) $. Note that $X^{+}(\omega )\geq 0$ and $X^{-}(\omega )\geq 0$ for any $\omega $ and $X(\omega )=X^{+}(\omega )-X^{-}(\omega )$.

The Lebesgue integral of $X^{+}$ is defined as follows: $\int_{\Omega }X^{+}\,d\mathrm{P}=\sup \left\{ \int_{\Omega }Y\,d\mathrm{P}:Y\text{ simple},\ Y(\omega )\leq X^{+}(\omega )\text{ for all }\omega \right\} $. In words, the Lebesgue integral of $X^{+}$ is obtained by taking the supremum of the Lebesgue integrals of all the simple random variables Y that are less than or equal to $X^{+}$.

The Lebesgue integral of $X^{-}$ is defined analogously: $\int_{\Omega }X^{-}\,d\mathrm{P}=\sup \left\{ \int_{\Omega }Y\,d\mathrm{P}:Y\text{ simple},\ Y(\omega )\leq X^{-}(\omega )\text{ for all }\omega \right\} $. Finally, the Lebesgue integral of X is defined as the difference between the integrals of its positive and negative parts: $\int_{\Omega }X\,d\mathrm{P}=\int_{\Omega }X^{+}\,d\mathrm{P}-\int_{\Omega }X^{-}\,d\mathrm{P}$, provided the difference makes sense; in case $\int_{\Omega }X^{+}\,d\mathrm{P}$ and $\int_{\Omega }X^{-}\,d\mathrm{P}$ are both equal to infinity, the difference is not well-defined and we say that X is not integrable.
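The decomposition into positive and negative parts can be sketched directly. A minimal sketch of the pointwise definitions $X^{+}(\omega )=\max (X(\omega ),0)$ and $X^{-}(\omega )=\max (-X(\omega ),0)$; the sample values are arbitrary assumed inputs:

```python
# Positive and negative parts of a random variable, evaluated pointwise.
# Both parts are non-negative, and their difference recovers X exactly.

def positive_part(x):
    """X+(omega) = max(X(omega), 0)."""
    return max(x, 0.0)

def negative_part(x):
    """X-(omega) = max(-X(omega), 0)."""
    return max(-x, 0.0)

for x in (-2.5, 0.0, 1.75):           # assumed sample values of X(omega)
    xp, xm = positive_part(x), negative_part(x)
    assert xp >= 0 and xm >= 0        # both parts are non-negative
    assert xp - xm == x               # X = X+ - X-
    print(x, xp, xm)
```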

How to cite

Please cite as:

Taboga, Marco (2021). "Expected value and the Lebesgue integral", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/expected-value-and-Lebesgue-integral.
