A random variable is a variable whose value depends on the outcome of a probabilistic experiment. Its value is a priori unknown, but it becomes known once the outcome of the experiment is realized.
Denote by $\Omega$ the set of all possible outcomes of a probabilistic experiment, called a sample space. A random variable associates a real number to each element of $\Omega$, as stated by the following definition.
Definition
A random variable $X$ is a function from the sample space $\Omega$ to the set of real numbers $\mathbb{R}$:
$$X : \Omega \rightarrow \mathbb{R}$$
In rigorous (measure-theoretic) probability theory, the function $X$ is also required to be measurable (see the more rigorous definition of random variable given below).
The real number $X(\omega)$ associated to a sample point $\omega \in \Omega$ is called a realization of the random variable. The set of all possible realizations is called the support and is denoted by $R_X$.
This example shows how the realizations of a random variable are associated with the outcomes of a probabilistic experiment.
Suppose that we flip a coin. The possible outcomes are either tail ($T$) or head ($H$), that is,
$$\Omega = \{T, H\}$$
The two outcomes are assigned equal probabilities:
$$P(\{T\}) = P(\{H\}) = \frac{1}{2}$$
If tail ($T$) is the outcome, we win one dollar; if head ($H$) is the outcome, we lose one dollar. The amount $X$ we win (or lose) is a random variable, defined as follows:
$$X(\omega) = \begin{cases} 1 & \text{if } \omega = T \\ -1 & \text{if } \omega = H \end{cases}$$
The probability of winning one dollar is
$$P(X = 1) = P(\{\omega \in \Omega : X(\omega) = 1\}) = P(\{T\}) = \frac{1}{2}$$
The probability of losing one dollar is
$$P(X = -1) = P(\{\omega \in \Omega : X(\omega) = -1\}) = P(\{H\}) = \frac{1}{2}$$
The probability of losing two dollars is
$$P(X = -2) = P(\{\omega \in \Omega : X(\omega) = -2\}) = P(\varnothing) = 0$$
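To see how a realization arises from an outcome, here is a minimal Python sketch of the coin-flip experiment above; the names sample_space and X are our own, chosen only for illustration.

```python
import random

# Minimal sketch of the coin-flip example (illustrative names, not part of the lecture).
sample_space = ["T", "H"]          # Omega: tail or head, equally likely
X = {"T": 1, "H": -1}              # the random variable: win 1 dollar on tail, lose 1 on head

outcome = random.choice(sample_space)   # the experiment is performed...
realization = X[outcome]                # ...and the realization of X becomes known
print(outcome, realization)             # e.g. T 1  or  H -1
```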
Some remarks on notation are in order:
The dependence of $X$ on $\omega$ is often omitted, that is, we simply write $X$ instead of $X(\omega)$.
If $A \subseteq \mathbb{R}$, the exact meaning of the notation $P(X \in A)$ is the following:
$$P(X \in A) = P(\{\omega \in \Omega : X(\omega) \in A\})$$
If $A \subseteq \mathbb{R}$, we sometimes use the notation $P_X(A)$ with the following meaning:
$$P_X(A) = P(X \in A)$$
In this case, $P_X$ is to be interpreted as a probability measure on the set of real numbers, induced by the random variable $X$.
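As a small illustration of this notation, the following Python sketch computes $P(X \in A)$ for the coin-flip variable defined above by summing the probabilities of the outcomes mapped into $A$; the sets passed to the function are arbitrary choices of ours.

```python
# Sketch of the meaning of P(X in A), reusing the coin-flip example.
sample_space = {"T", "H"}
prob = {"T": 0.5, "H": 0.5}
X = {"T": 1, "H": -1}

def prob_X_in(A):
    """P(X in A) = P({omega in Omega : X(omega) in A})."""
    event = {omega for omega in sample_space if X[omega] in A}  # the event induced by A
    return sum(prob[omega] for omega in event)

print(prob_X_in({1}))        # 0.5
print(prob_X_in({-1, 1}))    # 1.0
print(prob_X_in({5, 7}))     # 0.0
```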
Often, statisticians construct probabilistic models where a random variable $X$ is defined by directly specifying $P_X$, without specifying the sample space $\Omega$.
Most of the time, statisticians deal with two special kinds of random variables:
discrete random variables;
continuous random variables.
These two types are described in the next sections.
Here is the first kind.
Definition
A random variable $X$ is discrete if
its support $R_X$ is a countable set;
there is a function $p_X : \mathbb{R} \rightarrow [0, 1]$, called the probability mass function (or pmf or probability function) of $X$, such that, for any $x \in \mathbb{R}$:
$$p_X(x) = P(X = x)$$
The following is an example of a discrete random variable.
Example
A Bernoulli random variable is an example of a discrete random variable. It can take only two values: $1$ with probability $p$ and $0$ with probability $1 - p$, where $0 < p < 1$. Its support is $R_X = \{0, 1\}$. Its probability mass function is
$$p_X(x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \\ 0 & \text{otherwise} \end{cases}$$
Probability mass functions are characterized by two fundamental properties.
Non-negativity: $p_X(x) \geq 0$ for any $x \in \mathbb{R}$;
Sum over the support equals $1$: $\sum_{x \in R_X} p_X(x) = 1$.
Any probability mass function must satisfy these two properties.
Moreover, any function satisfying these two properties is a legitimate probability mass function.
These and other properties of probability mass functions are discussed in more detail in the lecture on Legitimate probability mass functions.
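As a quick illustration, here is a small Python sketch of a Bernoulli pmf together with a check of the two properties; the value p = 0.3 is an arbitrary choice of ours, not something prescribed by the lecture.

```python
# Sketch: a Bernoulli pmf (with an arbitrary p) and a check of the two pmf properties.
def bernoulli_pmf(x, p=0.3):
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0                    # zero outside the support {0, 1}

support = [0, 1]
assert all(bernoulli_pmf(x) >= 0 for x in support + [-1, 0.5, 2])   # non-negativity
assert abs(sum(bernoulli_pmf(x) for x in support) - 1) < 1e-12      # sums to 1 over the support
```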
Continuous variables are defined as follows.
Definition
A random variable $X$ is continuous (or absolutely continuous) if and only if
its support $R_X$ is not countable;
there is a function $f_X : \mathbb{R} \rightarrow [0, \infty)$, called the probability density function (or pdf or density function) of $X$, such that, for any interval $[a, b] \subseteq \mathbb{R}$:
$$P(X \in [a, b]) = \int_a^b f_X(x) \, dx$$
The page on the probability density function explains why we need integrals to deal with continuous variables.
We now illustrate the definition with an example.
Example
A uniform random variable (on the interval $[0, 1]$) is an example of a continuous variable. It can take any value in the interval $[0, 1]$. All sub-intervals of equal length are equally likely. Its support is $R_X = [0, 1]$. Its probability density function is
$$f_X(x) = \begin{cases} 1 & \text{if } x \in [0, 1] \\ 0 & \text{otherwise} \end{cases}$$
The probability that the realization of $X$ belongs, for example, to a sub-interval $[a, b] \subseteq [0, 1]$ is
$$P(X \in [a, b]) = \int_a^b f_X(x) \, dx = \int_a^b 1 \, dx = b - a$$
Probability density functions are characterized by two fundamental properties:
Non-negativity: $f_X(x) \geq 0$ for any $x \in \mathbb{R}$;
Integral over $\mathbb{R}$ equals $1$: $\int_{-\infty}^{+\infty} f_X(x) \, dx = 1$.
Any probability density function must satisfy these two properties.
Moreover, any function satisfying these two properties is a legitimate probability density function.
The lecture on Legitimate probability density functions contains a detailed discussion of these facts.
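The two properties can be checked numerically for the uniform density introduced above; the following Python sketch uses a crude midpoint Riemann sum, and the interval [0.25, 0.75] is an arbitrary choice made for illustration.

```python
# Sketch: the uniform density on [0, 1], an interval probability, and the total integral.
def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(uniform_pdf, 0.25, 0.75))   # ~0.5 = P(X in [0.25, 0.75]) = b - a
print(integrate(uniform_pdf, -5, 5))        # ~1.0: the density integrates to 1 over R
```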
Random variables, including those that are neither discrete nor continuous, are often characterized in terms of their distribution function.
Definition
Let $X$ be a random variable. The distribution function (or cumulative distribution function or cdf) of $X$ is a function $F_X : \mathbb{R} \rightarrow [0, 1]$ such that
$$F_X(x) = P(X \leq x)$$
If we know the distribution function of a random variable $X$, then we can easily compute the probability that $X$ belongs to an interval $(a, b]$ as
$$P(a < X \leq b) = F_X(b) - F_X(a)$$
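For instance, for the uniform variable on $[0, 1]$ discussed above, the cdf can be written down explicitly and the formula applied directly; this Python sketch does so for an interval chosen arbitrarily for illustration.

```python
# Sketch: interval probability from the cdf of the uniform variable on [0, 1].
def uniform_cdf(x):
    # F_X(x) = P(X <= x): 0 below the support, x on [0, 1], 1 above it
    return min(max(x, 0.0), 1.0)

a, b = 0.25, 0.75
print(uniform_cdf(b) - uniform_cdf(a))   # 0.5 = P(a < X <= b)
```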
Want to learn more about the cdf? See the lecture on the cumulative distribution function.
In the following subsections you can find more details on random variables and univariate probability distributions.
Note that, if $X$ is continuous, then
$$F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$$
Hence, by taking the derivative with respect to $x$ of both sides of the above equation, we obtain
$$f_X(x) = \frac{d F_X(x)}{d x}$$
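A quick numerical check of this relation, again for the uniform variable on $[0, 1]$: differentiating its cdf at an interior point of the support recovers the value of its density there. The point x = 0.3 and the step size are arbitrary choices of ours.

```python
# Sketch: recovering the density as the derivative of the cdf (uniform variable on [0, 1]).
def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)

def numerical_derivative(F, x, h=1e-6):
    return (F(x + h) - F(x - h)) / (2 * h)

print(numerical_derivative(uniform_cdf, 0.3))   # ~1.0, which equals f_X(0.3)
```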
Note that, if $X$ is a continuous random variable, the probability that $X$ takes on any specific value $x$ is equal to zero:
$$P(X = x) = 0$$
Thus, the event $\{X = x\}$ is a zero-probability event for any $x$.
The lecture on Zero-probability events contains a thorough discussion of this apparently paradoxical fact: although it can happen that $X = x$, the event $\{X = x\}$ has zero probability of happening.
Random variables can be defined in a more rigorous manner by using the terminology of measure theory, and in particular the concepts of sigma-algebra, measurable set and probability space introduced at the end of the lecture on probability.
Definition
Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is a sample space, $\mathcal{F}$ is a sigma-algebra of events (subsets of $\Omega$) and $P$ is a probability measure on $\mathcal{F}$. Let $\mathcal{B}(\mathbb{R})$ be the Borel sigma-algebra of the set of real numbers $\mathbb{R}$ (i.e., the smallest sigma-algebra containing all the open subsets of $\mathbb{R}$). A function $X : \Omega \rightarrow \mathbb{R}$ such that
$$\{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F} \quad \text{for any } B \in \mathcal{B}(\mathbb{R})$$
is said to be a random variable on $\Omega$.
This definition ensures that the probability that the realization of the random variable $X$ will belong to a set $B \in \mathcal{B}(\mathbb{R})$ can be defined as
$$P(X \in B) = P(\{\omega \in \Omega : X(\omega) \in B\})$$
where the probability on the right-hand side is well defined because the set $\{\omega \in \Omega : X(\omega) \in B\}$ is measurable.
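For a finite sample space, where the sigma-algebra can be taken to be the whole power set, the measurability requirement is automatically satisfied; the Python sketch below makes the preimage construction explicit. All names and numerical values are ours, chosen only for illustration.

```python
from itertools import chain, combinations

# Sketch: for a finite Omega with the power set as sigma-algebra, every X is measurable.
sample_space = frozenset({"a", "b", "c"})
F = {frozenset(s) for s in chain.from_iterable(
        combinations(sample_space, r) for r in range(len(sample_space) + 1))}
X = {"a": -1.0, "b": 0.0, "c": 2.5}      # a function from Omega to the real numbers

def preimage(B):
    """{omega in Omega : X(omega) in B} for a set of real numbers B."""
    return frozenset(omega for omega in sample_space if X[omega] in B)

B = {0.0, 2.5}                 # stand-in for a Borel set
print(preimage(B) in F)        # True: the preimage is an event, so its probability is defined
```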
One question remains to be answered: why did we introduce the exotic concept of the Borel sigma-algebra?
Clearly, if we want to assign probabilities to the subsets of $\mathbb{R}$ to which the realizations of the random variable $X$ could belong, then we need to define a sigma-algebra of subsets of $\mathbb{R}$ (remember that we need a sigma-algebra in order to define probability rigorously). But why can't we use the simpler-to-understand set of all possible subsets of $\mathbb{R}$, which is itself a sigma-algebra?
The short answer is that we are not able to define a probability measure on sigma-algebras larger than the Borel sigma-algebra (i.e., containing more subsets of $\mathbb{R}$): whenever we try to do so, we end up finding some uncountable sets for which the sigma-additivity property of probability does not hold (i.e., their probability is different from the sum of the probabilities of their parts), or whose probability is not equal to one minus the probability of their complements.
Below you can find some exercises with explained solutions.
Exercise 1
Let $X$ be a discrete random variable with a given support $R_X$ and a given probability mass function $p_X$. Calculate the probability that $X$ belongs to a set $A \subseteq R_X$.
Solution
By the additivity of probability, we have that
$$P(X \in A) = \sum_{x \in A} p_X(x)$$
that is, the requested probability is obtained by summing the values of the pmf at all the points of $A$.
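As a concrete (and purely hypothetical) numerical illustration of the additivity step, with values that are ours and not taken from the exercise:

```python
# Hypothetical pmf on the support {1, 2, 3}; the values are illustrative and sum to 1.
pmf = {1: 0.25, 2: 0.5, 3: 0.25}
A = {2, 3}                                        # the set whose probability we want

print(sum(p for x, p in pmf.items() if x in A))   # 0.75 = p_X(2) + p_X(3)
```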
Exercise 2
Let $X$ be a discrete random variable. Let its support $R_X$ be the set of the first $n$ natural numbers:
$$R_X = \{1, 2, \ldots, n\}$$
and let its probability mass function $p_X$ be given. Compute the probability that $X$ belongs to a subset $A$ of the support.
Solution
By using the additivity of probability, we obtain
$$P(X \in A) = \sum_{x \in A} p_X(x)$$
Exercise 3
Let $X$ be a discrete random variable with a given support $R_X$ and a probability mass function $p_X$ that is written in terms of a binomial coefficient $\binom{n}{x}$. Calculate the probability that $X$ belongs to a set $A$ made up of three points of the support.
Solution
First note that, by additivity,
$$P(X \in A) = \sum_{x \in A} p_X(x)$$
Therefore, in order to compute $P(X \in A)$, we need to evaluate the probability mass function at the three points of $A$ and then sum the three values thus obtained.
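The same evaluate-and-sum step can be illustrated with a pmf written in terms of a binomial coefficient; the parameters below (3 trials, success probability 0.5) are hypothetical and not the ones used in the exercise.

```python
from math import comb

# Hypothetical binomial pmf: n = 3 trials, success probability q = 0.5.
n, q = 3, 0.5

def pmf(x):
    return comb(n, x) * q**x * (1 - q)**(n - x) if x in range(n + 1) else 0.0

A = {0, 1, 2}                            # three points of the support
print(sum(pmf(x) for x in A))            # 0.875 = 1/8 + 3/8 + 3/8
```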
Exercise 4
Let $X$ be a continuous random variable with a given support $R_X$ and a given probability density function $f_X$. Compute the probability that $X$ belongs to an interval $[a, b]$.
Solution
The probability that a continuous variable takes a value in a given interval is equal to the integral of the probability density function over that interval:
$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
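To make the integration step concrete, here is a sketch with a hypothetical density (not the one from the exercise), integrated numerically over an interval chosen for illustration.

```python
# Hypothetical density f_X(x) = 2x on the support [0, 1], zero elsewhere.
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(integrate(f, 0.5, 1.0))    # ~0.75 = 1**2 - 0.5**2
```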
Exercise 5
Let $X$ be a continuous variable with a given support $R_X$ and a given probability density function $f_X$. Compute the probability that $X$ belongs to an interval $[a, b]$.
Solution
As in the previous exercise, the probability that $X$ takes a value in a given interval is equal to the integral of its density function over that interval:
$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
Exercise 6
Let $X$ be a continuous variable with a given support $R_X$ and a probability density function $f_X$ that depends on a parameter. Compute the probability that $X$ belongs to an interval $[a, b]$.
Solution
As in the previous exercise, we need to compute an integral:
$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$
Looking for more exercises? Try StatLect's probability exercises page.
Please cite as:
Taboga, Marco (2021). "Random variable", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/random-variables.