Statistical inference is the act of using observed data to infer unknown properties and characteristics of the probability distribution from which the data have been extracted.
In the simplest possible case of statistical inference, we observe the
realizations
of some independent random variables
all having the same distribution.
We then use the observed realizations to infer some characteristics of the distribution.
Example
The lifetime of a certain type of electronic device is a random variable $X$ whose probability distribution is unknown. Suppose that we independently observe the lifetimes of $n$ components. Denote these realizations by $x_1$, $x_2$, ..., $x_n$. We are interested in the expected value of $X$, which is an unknown characteristic of its distribution. We use the sample mean $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ as our estimate (best guess) of the expected value. In this simple example, the sample $x_1$, $x_2$, ..., $x_n$ is used to make a statistical inference about a characteristic (the expected value) of the distribution that generated the sample (the probability distribution of $X$).
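As a quick numerical illustration (ours, not part of the original example; the lifetime values below are hypothetical), the sample mean can be computed in a few lines of Python:

    # Hypothetical observed lifetimes (in hours) of n = 5 components.
    lifetimes = [1340.0, 980.5, 1502.3, 1210.0, 1105.7]

    # The sample mean is our estimate (best guess) of the expected lifetime.
    sample_mean = sum(lifetimes) / len(lifetimes)
    print(sample_mean)  # 1227.7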
The previous example shows that three fundamental elements are required to make a statistical inference:
a sample (the observed data);
a probability distribution that generates the data;
a characteristic of the distribution about which inferences are drawn.
In the next sections we define these three fundamental elements in a mathematically meaningful way.
In the above example of statistical inference, $X_1$, ..., $X_n$ are independent random variables. However, more complicated cases are possible. For instance:

$X_1$, ..., $X_n$ are not independent;

$X_1$, ..., $X_n$ are random vectors having a common probability distribution;

$X_1$, ..., $X_n$ do not have a common probability distribution.
Is there a definition of sample that generalizes all of the above special cases?
The definition is extremely simple.
Definition
A sample $\xi$ is the realization of a random vector $\Xi$.

As we will see in the following examples, $\xi$ is a vector that collects the observed data. The vector $\xi$ is regarded as a realization of a random vector $\Xi$. The object of the statistical inferences is the probability distribution of $\Xi$.
The following examples show how this general definition accommodates the special cases mentioned above.
Note that, from now on, in order to be more precise, we will use the term distribution function instead of speaking generically of probability distributions.
Example
We observe the realizations $x_1$, ..., $x_n$ of some independent random variables $X_1$, ..., $X_n$ having a common distribution function $F_X(x)$. The sample is the $n$-dimensional vector $$\xi = (x_1, \dots, x_n),$$ which is a realization of the random vector $$\Xi = (X_1, \dots, X_n).$$ Since the observations are independent, the joint distribution function of the vector $\Xi$ is equal to the product of the marginal distributions of its entries: $$F_\Xi(x_1, \dots, x_n) = \prod_{i=1}^{n} F_X(x_i).$$ The distribution function of the variables $X_i$, denoted by $F_X(x)$, is the unknown distribution function that constitutes the object of inference.
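As an optional numerical check (our illustration, assuming NumPy is available), the factorization of the joint distribution function under independence can be verified by simulation:

    import numpy as np

    rng = np.random.default_rng(42)
    n_draws = 1_000_000

    # Two independent random variables with a common distribution function.
    x1 = rng.standard_normal(n_draws)
    x2 = rng.standard_normal(n_draws)

    a, b = 0.5, -0.2
    joint = np.mean((x1 <= a) & (x2 <= b))         # estimates F_Xi(a, b)
    product = np.mean(x1 <= a) * np.mean(x2 <= b)  # estimates F_X(a) * F_X(b)
    print(joint, product)  # approximately equal under independence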
Example
If we take the previous example and drop the assumption of independence, the sample $\xi$ and the vector $\Xi$ are still defined in the same way, but the joint distribution function $F_\Xi$ can no longer be written as the product of the distribution functions of $X_1$, ..., $X_n$.
In the next example, the individual observations are no longer scalars but vectors.
Example
If $X_1$, ..., $X_n$ are independent $K$-dimensional random vectors having a common joint distribution function $F_X(x)$, then the sample $\xi$ and the vector $\Xi$ are $Kn$-dimensional. We can still write the joint distribution function of $\Xi$ as $$F_\Xi(x_1, \dots, x_n) = \prod_{i=1}^{n} F_X(x_i).$$
In the following example we relax the assumption that all the observations come from a unique distribution.
Example
If the $K$-dimensional random vectors $X_1$, ..., $X_n$ are mutually independent but have different joint distribution functions $F_{X_1}$, ..., $F_{X_n}$, then $\xi$ and $\Xi$ are defined as before, but the joint distribution function of $\Xi$ is $$F_\Xi(x_1, \dots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i).$$
When the sample is made of the realizations $x_1$, ..., $x_n$ of $n$ random variables (or random vectors), we say that the sample size is $n$. An individual realization $x_i$ is referred to as an observation from the sample.
We now shift our attention to the probability distribution that generates the sample, which is another one of the fundamental elements of a statistical inference problem.
In the previous section we defined a sample $\xi$ as a realization of a random vector $\Xi$ having joint distribution function $F_\Xi(\xi)$. The sample $\xi$ is used to infer some characteristics of $F_\Xi$ that are not fully known by the statistician. The properties and characteristics of $F_\Xi$ that are already known (or are assumed to be known) before observing the sample are called a model for $\Xi$. In mathematical terms, a model for $\Xi$ is a set of joint distribution functions to which $F_\Xi$ is assumed to belong.
Definition
Let the sample $\xi$ be a realization of an $n$-dimensional random vector $\Xi$ having joint distribution function $F_\Xi(\xi)$. Let $\Gamma$ be the set of all $n$-dimensional joint distribution functions: $$\Gamma = \left\{ F : F \text{ is an } n\text{-dimensional joint distribution function} \right\}.$$ A subset $\Phi \subseteq \Gamma$ is called a statistical model (or a model specification or, simply, a model) for $\Xi$.
In this definition,

the set $\Gamma$ is a large set containing all the possible data-generating distributions;

the set $\Phi$ is a smaller subset of data-generating distributions on which we focus our attention.

The smaller set $\Phi$ is called a statistical model.
The following examples continue the examples presented in the previous section.
Example
Suppose that our sample is made of the realizations $x_1$, ..., $x_n$ of the random variables $X_1$, ..., $X_n$. Assume that the random variables are mutually independent and that they have a common distribution function $F_X(x)$. The sample is the $n$-dimensional vector $$\xi = (x_1, \dots, x_n).$$ $\Gamma$ is the set of all possible distribution functions of the random vector $\Xi$. Recalling the definition of marginal distribution function and the characterization of mutual independence, we can define the statistical model $\Phi$ as follows: $$\Phi = \Big\{ F \in \Gamma : F(x_1, \dots, x_n) = \prod_{i=1}^{n} G(x_i) \text{ for some distribution function } G \Big\}.$$
Example
Take the example above and drop the assumption that the random variables are mutually independent. The statistical model $\Phi$ is now $$\Phi = \left\{ F \in \Gamma : \text{the } n \text{ marginal distributions of } F \text{ are all equal} \right\}.$$
If $F_\Xi \in \Phi$, the model is said to be correctly specified (or well specified). Otherwise, if $F_\Xi \notin \Phi$, the model is said to be mis-specified.
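As a concrete sketch (our illustration, not the lecture's; it assumes NumPy and SciPy), suppose the data are generated by an exponential distribution while the model $\Phi$ contains only normal distributions, so that $F_\Xi \notin \Phi$. A goodness-of-fit test can reveal the mis-specification, although the Kolmogorov-Smirnov p-value is only approximate when the parameters are estimated from the data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    xi = rng.exponential(scale=2.0, size=500)  # true distribution: exponential

    # Model Phi: the family of normal distributions (mis-specified here).
    mu_hat, sigma_hat = xi.mean(), xi.std(ddof=1)

    # Test the best-fitting member of Phi against the data.
    ks_stat, p_value = stats.kstest(xi, 'norm', args=(mu_hat, sigma_hat))
    print(ks_stat, p_value)  # tiny p-value: evidence of mis-specification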
A model $\Phi$ for $\Xi$ is called a parametric model if the joint distribution functions belonging to $\Phi$ are put into correspondence with a set $\Theta$ of real vectors.
Definition
Let $\Phi$ be a model for $\Xi$. Let $\Theta$ be a set of $p$-dimensional real vectors. Let $\gamma$ be a correspondence that associates a subset of $\Phi$ to each $\theta \in \Theta$. The triple $(\Phi, \Theta, \gamma)$ is a parametric model if and only if $$\Phi = \bigcup_{\theta \in \Theta} \gamma(\theta).$$ The set $\Theta$ is called the parameter space. A vector $\theta \in \Theta$ is called a parameter.
Therefore, in a parametric model every element of $\Phi$ is put into correspondence with at least one parameter $\theta \in \Theta$. When $\gamma$ associates to each parameter a unique joint distribution function (i.e., when $\gamma$ is a function), the parametric model is called a parametric family.
Definition
Let $(\Phi, \Theta, \gamma)$ be a parametric model. If $\gamma$ is a function from $\Theta$ to $\Phi$, then the parametric model is called a parametric family. In this case, the joint distribution function associated to a parameter $\theta$ is denoted by $F(\xi; \theta)$.
Here is a classical example of a parametric family.
Example
Suppose that $\Xi$ is assumed to have a multivariate normal distribution. Then the model $\Phi$ is the set of all multivariate normal distributions, which are completely described by two parameters (the mean vector $\mu$ and the covariance matrix $\Sigma$). Each parameter $\theta = (\mu, \Sigma)$ is associated with a unique distribution function in the set $\Phi$. Therefore, we have a parametric family.
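The correspondence $\theta = (\mu, \Sigma) \mapsto F(\xi; \theta)$ can be mirrored in code: each parameter picks out exactly one member of the family. Here is a minimal Python sketch (ours, assuming NumPy and SciPy are installed):

    import numpy as np
    from scipy.stats import multivariate_normal

    def gamma(mu, sigma):
        # Map a parameter theta = (mu, Sigma) to the unique
        # multivariate normal distribution that it indexes.
        return multivariate_normal(mean=mu, cov=sigma)

    # Two different parameters yield two different members of the family.
    F1 = gamma(np.zeros(2), np.eye(2))
    F2 = gamma(np.array([1.0, -1.0]), np.array([[2.0, 0.5], [0.5, 1.0]]))

    # Evaluate the joint distribution function F(xi; theta) at a point.
    xi = np.array([0.5, 0.5])
    print(F1.cdf(xi), F2.cdf(xi))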
When each distribution function is associated with only one parameter, the parametric family is said to be identifiable.
Definition
Let $(\Phi, \Theta, \gamma)$ be a parametric family. If $\gamma$ is one-to-one (i.e., each distribution function in $\Phi$ is associated with only one parameter $\theta \in \Theta$), then the parametric family is said to be identifiable.
The set of multivariate normal distributions in the previous example is also an identifiable parametric family, because each distribution is associated with a unique parameter.
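To see what identifiability rules out, consider a deliberately redundant parametrization (a hypothetical example of ours, not from the lecture): index a normal distribution by $\theta = (\theta_1, \theta_2)$ with mean $\theta_1 + \theta_2$ and unit variance. Distinct parameters then yield the same distribution function, so $\gamma$ is not one-to-one:

    from scipy.stats import norm

    def gamma(theta1, theta2):
        # Redundant parametrization: N(theta1 + theta2, 1).
        # Not identifiable, because gamma is not one-to-one.
        return norm(loc=theta1 + theta2, scale=1.0)

    # Two distinct parameters, one and the same distribution function:
    F_a = gamma(0.0, 1.0)
    F_b = gamma(1.0, 0.0)
    print(F_a.cdf(0.3) == F_b.cdf(0.3))  # True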
A statistical inference is a statement about the unknown distribution function $F_\Xi(\xi)$, based on the observed sample $\xi$ and the statistical model $\Phi$.
The following are common kinds of statistical inferences.
Hypothesis testing: we make a hypothesis about some feature of the distribution $F_\Xi$ and use the data to decide whether or not to reject the hypothesis;

Point estimation: we use the data to estimate the value of a parameter of the data-generating distribution $F_\Xi$;

Bayesian inference: we use the observed sample $\xi$ to update the prior probabilities assigned to the possible data-generating distributions.
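As a minimal sketch of the first two kinds of inference (our illustration, on simulated data, assuming NumPy and SciPy):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    xi = rng.exponential(scale=2.0, size=100)  # observed sample (simulated)

    # Point estimation: the sample mean estimates the expected value.
    theta_hat = xi.mean()

    # Hypothesis testing: test H0 "the expected value equals 2"
    # with a one-sample t-test (approximate, by the central limit theorem).
    t_stat, p_value = stats.ttest_1samp(xi, popmean=2.0)
    print(theta_hat, t_stat, p_value)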
Often, we make statistical inferences about model restrictions. Given a subset $\Phi_R$ of the original model $\Phi$, a model restriction can be either an inclusion restriction $$F_\Xi \in \Phi_R$$ or an exclusion restriction $$F_\Xi \notin \Phi_R.$$
The choice of the statement (the statistical inference) to make based on the observed data can often be formalized as a decision problem where:
making a statistical inference is regarded as an action;
each action can have different consequences, depending on which distribution function $F \in \Phi$ is the true one;
a preference ordering over possible consequences needs to be elicited;
an optimal course of action needs to be taken, consistently with the elicited preferences.
There are several different ways of formalizing such a decision problem. The branch of statistics that analyzes these decision problems is called statistical decision theory.
In this lecture we have touched on several important topics.
Please cite as:
Taboga, Marco (2021). "Statistical inference", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/statistical-inference.