
Statistical inference

by Marco Taboga, PhD

Statistical inference is the act of using observed data to infer unknown properties and characteristics of the probability distribution from which the data have been extracted.



In the simplest possible case of statistical inference, we observe the realizations $x_{1}$, ..., $x_{n}$ of some independent random variables $X_{1}$, ..., $X_{n}$, all having the same distribution.

We then use the observed realizations to infer some characteristics of the distribution.

Example The lifetime of a certain type of electronic device is a random variable X, whose probability distribution is unknown. Suppose that we independently observe the lifetimes of $10$ components. Denote these realizations by $x_{1}$, $x_{2}$, ..., $x_{10}$. We are interested in the expected value of X, which is an unknown characteristic of its distribution. We use the sample mean $$\overline{x}=\frac{1}{10}\sum_{i=1}^{10}x_{i}$$ as our estimate (best guess) of the expected value. In this simple example, the sample $x_{1}$, $x_{2}$, ..., $x_{10}$ is used to make a statistical inference about a characteristic (the expected value) of the distribution that generated the sample (the probability distribution of X).
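The example can be sketched in a few lines of Python. Everything here is hypothetical: the exponential lifetime distribution, its true expected value of 1000 hours, and the seed are chosen only so that the snippet runs; in a real problem the data-generating distribution would be unknown.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data-generating process: exponential lifetimes (in hours)
# with true expected value 1000. In practice this is exactly what we do
# NOT know and are trying to infer.
true_mean = 1000.0
sample = [random.expovariate(1.0 / true_mean) for _ in range(10)]

# The sample mean is our estimate (best guess) of the expected value of X.
sample_mean = sum(sample) / len(sample)
print(round(sample_mean, 1))
```

With only 10 observations the estimate can be far from the true expected value; this sampling variability is what the rest of statistical inference is about.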

Fundamental elements

The previous example shows that three fundamental elements are required to make a statistical inference:

  1. a sample of observed data;

  2. a model of the probability distribution that generated the data;

  3. a statement (the inference itself) about that distribution.

In the next sections we define these three fundamental elements in a mathematically meaningful way.

The sample

In the above example of statistical inference, X_1, ..., X_n are independent random variables. However, more complicated cases are possible. For instance,

  1. X_1, ..., X_n are not independent;

  2. X_1, ..., X_n are random vectors having a common probability distribution;

  3. X_1, ..., X_n do not have a common probability distribution.

Is there a definition of sample that generalizes all of the above special cases?


The definition is extremely simple.

Definition A sample $\xi$ is the realization of a random vector $\Xi$.

As we will see in the following examples, $\xi$ is a vector that collects the observed data.

The vector $\xi$ is regarded as a realization of a random vector $\Xi$.

The object of statistical inference is the probability distribution of $\Xi$.


The following examples show how this general definition accommodates the special cases mentioned above.

Note that, from now on, in order to be more precise, we will use the term distribution function rather than speaking generically of probability distributions.

Example We observe the realizations $x_{1}$, ..., $x_{n}$ of some independent random variables $X_{1}$, ..., $X_{n}$ having a common distribution function $F_{X}$. The sample is the $n$-dimensional vector $$\xi =\left( x_{1},\ldots ,x_{n}\right)$$ which is a realization of the random vector $$\Xi =\left( X_{1},\ldots ,X_{n}\right)$$ Since the observations are independent, the joint distribution function of the vector $\Xi$ is equal to the product of the marginal distribution functions of its entries: $$F_{\Xi }\left( \xi \right) =\prod_{i=1}^{n}F_{X}\left( x_{i}\right)$$
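The factorization of the joint distribution function under independence can be checked numerically. A minimal sketch, using a unit exponential as a hypothetical common marginal $F_{X}$:

```python
import math

def F_X(x):
    """Hypothetical common marginal: unit exponential distribution function."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_Xi(xi):
    """Joint distribution function of Xi under independence:
    the product of the marginal F_X evaluated at each observation."""
    prod = 1.0
    for x in xi:
        prod *= F_X(x)
    return prod

xi = [0.5, 1.2, 2.0]  # a toy sample of size n = 3
print(F_Xi(xi))  # equals F_X(0.5) * F_X(1.2) * F_X(2.0)
```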

The distribution function of $\Xi$, denoted by $F_{\Xi }$, is the unknown distribution function that constitutes the object of inference.

Example If we take the previous example and drop the assumption of independence, the sample $\xi$ and the vector $\Xi$ are still defined in the same way, but the joint distribution function $F_{\Xi }\left( \xi \right)$ can no longer be written as the product of the distribution functions of $X_{1}$, ..., $X_{n}$.

In the next example the single observations are no longer scalars, but vectors.

Example If $X_{1}$, ..., $X_{n}$ are independent K-dimensional random vectors having a common joint distribution function $F_{X}$, then the sample $\xi$ and the vector $\Xi$ are $nK$-dimensional. We can still write the joint distribution function of $\Xi$ as $$F_{\Xi }\left( \xi \right) =\prod_{i=1}^{n}F_{X}\left( x_{i}\right)$$ where each $x_{i}$ is K-dimensional.

In the following example we relax the assumption that all the observations come from a unique distribution.

Example If the K-dimensional random vectors $X_{1}$, ..., $X_{n}$ are independent but have different joint distribution functions $F_{X_{1}}$, ..., $F_{X_{n}}$, then $\xi$ and $\Xi$ are defined as before, but the joint distribution function of $\Xi$ is $$F_{\Xi }\left( \xi \right) =\prod_{i=1}^{n}F_{X_{i}}\left( x_{i}\right)$$
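This heterogeneous case can also be sketched numerically. The exponential marginals and their rates below are hypothetical choices made for illustration; the point is only that each factor in the product now uses its own distribution function.

```python
import math

# Hypothetical heterogeneous marginals: exponential distribution functions
# with different rates lambda_1, ..., lambda_n.
rates = [0.5, 1.0, 2.0]

def F_i(i, x):
    """Marginal distribution function of the i-th observation."""
    return 1.0 - math.exp(-rates[i] * x) if x >= 0 else 0.0

def F_Xi(xi):
    """Joint distribution function of Xi: independence is retained,
    but each factor uses its own marginal F_i."""
    prod = 1.0
    for i, x in enumerate(xi):
        prod *= F_i(i, x)
    return prod

print(F_Xi([1.0, 1.0, 1.0]))  # product of three different marginals at 1
```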

Sample size

When the sample is made of the realizations $x_{1}$, ..., $x_{n}$ of n random variables (or vectors), then we say that the sample size is n.

An individual realization $x_{i}$ is referred to as an observation from the sample.

Statistical model

We now shift our attention to the probability distribution that generates the sample, which is another one of the fundamental elements of a statistical inference problem.

In the previous section we have defined a sample $\xi$ as a realization of a random vector $\Xi$ having joint distribution function $F_{\Xi }$.

The sample $\xi$ is used to infer some characteristics of $F_{\Xi }$ that are not fully known by the statistician.

The properties and the characteristics of $F_{\Xi }$ that are already known (or are assumed to be known) before observing the sample are called a model for $\Xi$.


In mathematical terms, a model for $\Xi$ is a set of joint distribution functions to which $F_{\Xi }$ is assumed to belong.

Definition Let the sample $\xi$ be a realization of an $l$-dimensional random vector $\Xi$ having joint distribution function $F_{\Xi }$. Let $\Psi$ be the set of all $l$-dimensional joint distribution functions: $$\Psi =\left\{ F:F\text{ is the joint distribution function of some }l\text{-dimensional random vector}\right\}$$ A subset $\Phi \subseteq \Psi$ is called a statistical model (or a model specification or, simply, a model) for $\Xi$.

In this definition, $\Psi$ is the largest possible set of joint distribution functions to which $F_{\Xi }$ could belong, while $\Phi$ collects only those distribution functions that are compatible with the assumptions made before observing the sample.

The smaller set $\Phi$ is called a statistical model.


The following examples are a continuation of the examples made in the previous section.

Example Suppose that our sample is made of the realizations $x_{1}$, ..., $x_{n}$ of the random variables $X_{1}$, ..., $X_{n}$. Assume that the n random variables are mutually independent and that they have a common distribution function $F_{X}$. The sample is the $n$-dimensional vector $$\xi =\left( x_{1},\ldots ,x_{n}\right)$$ and $\Psi$ is the set of all possible distribution functions of the random vector $\Xi =\left( X_{1},\ldots ,X_{n}\right)$. Recalling the definition of marginal distribution function and the characterization of mutual independence, we can define the statistical model $\Phi$ as follows: $$\Phi =\left\{ F\in \Psi :F\left( \xi \right) =\prod_{i=1}^{n}F_{X}\left( x_{i}\right) \text{ for some distribution function }F_{X}\right\}$$

Example Take the example above and drop the assumption that the random variables are mutually independent. The statistical model $\Phi$ is now: $$\Phi =\left\{ F\in \Psi :\text{the }n\text{ marginal distribution functions of }F\text{ are all equal}\right\}$$

Mis-specified models

If $F_{\Xi }\in \Phi$, the model is said to be correctly specified (or well-specified).

Otherwise, if $F_{\Xi }\notin \Phi$, the model is said to be mis-specified.

Parametric model

A model $\Phi$ for $\Xi$ is called a parametric model if the joint distribution functions belonging to $\Phi$ are put into correspondence with a set $\Theta$ of real vectors.

Definition Let $\Phi$ be a model for $\Xi$. Let $\Theta \subseteq \mathbb{R}^{p}$ be a set of $p$-dimensional real vectors. Let $\gamma$ be a correspondence that associates a subset of $\Phi$ to each $\theta \in \Theta$. The triple $\left( \Phi ,\Theta ,\gamma \right)$ is a parametric model if and only if $$\Phi =\bigcup_{\theta \in \Theta }\gamma \left( \theta \right)$$ The set $\Theta$ is called the parameter space. A vector $\theta \in \Theta$ is called a parameter.

Therefore, in a parametric model every element of $\Phi$ is put into correspondence with at least one parameter $\theta$.

Parametric families

When $\gamma$ associates to each parameter a unique joint distribution function (i.e., when $\gamma$ is a function) the parametric model is called a parametric family.

Definition Let $\left( \Phi ,\Theta ,\gamma \right)$ be a parametric model. If $\gamma$ is a function from $\Theta$ to $\Phi$, then the parametric model is called a parametric family. In this case, the joint distribution function associated with a parameter $\theta$ is denoted by $F_{\theta }$.

Here is a classical example of a parametric family.

Example Suppose that $\Xi$ is assumed to have a multivariate normal distribution. Then, the model $\Phi$ is the set of all multivariate normal distributions, which are completely described by two parameters (the mean vector $\mu$ and the covariance matrix $\Sigma$). Each parameter $\theta =\left( \mu ,\Sigma \right)$ is associated with a unique distribution function in the set $\Phi$. Therefore, we have a parametric family.
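A parametric family can be sketched as a plain function $\gamma$ mapping each parameter to a distribution function. For simplicity the sketch below uses univariate normals, so the parameter is $(\mu ,\sigma )$ rather than the mean vector and covariance matrix of the multivariate case.

```python
import math

def gamma(mu, sigma):
    """Maps a parameter theta = (mu, sigma) to the corresponding
    univariate normal distribution function. Since gamma returns
    exactly one distribution per parameter, it is a function and
    the model is a parametric family."""
    def F(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return F

F_std = gamma(0.0, 1.0)  # the standard normal distribution function
print(round(F_std(0.0), 3))  # -> 0.5, by symmetry of the normal
```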

When each distribution function is associated with only one parameter, the parametric family is said to be identifiable.

Definition Let $\left( \Phi ,\Theta ,\gamma \right)$ be a parametric family. If $\gamma$ is one-to-one (i.e., each distribution function F is associated with only one parameter), then the parametric family is said to be identifiable.

The set of multivariate normal distributions in the previous example is also an identifiable parametric family because each distribution is associated with a unique parameter.
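A counterexample helps to see what identifiability rules out. In the hypothetical parametrization below, $\theta =(a,b)$ is mapped to a normal distribution with mean $a+b$ and unit variance; distinct parameters can produce the same distribution, so $\gamma$ is not one-to-one and the family is not identifiable.

```python
import math

def gamma(a, b):
    """Non-identifiable parametrization: only the sum a + b matters,
    so distinct parameters can yield the same distribution function."""
    mu = a + b
    def F(x):
        return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0)))
    return F

F1, F2 = gamma(1.0, 2.0), gamma(0.0, 3.0)  # distinct parameters, same mean
same = all(abs(F1(x) - F2(x)) < 1e-12 for x in [-1.0, 0.0, 2.5])
print(same)  # -> True: two parameters, one distribution function
```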

Inferences about the data-generating distribution

A statistical inference is a statement about the unknown distribution function $F_{\Xi }$, based on the observed sample $\xi$ and the statistical model $\Phi$.

Types of statistical inference

The following are common kinds of statistical inferences.

  1. Hypothesis testing: we make a hypothesis about some feature of the distribution $F_{\Xi }$ and we use the data to decide whether or not to reject the hypothesis;

  2. Point estimation: we use the data to estimate the value of a parameter of the data-generating distribution $F_{\Xi }$;

  3. Bayesian inference: we use the observed sample $\xi$ to update prior probabilities assigned to the possible data-generating distributions.
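The Bayesian case (item 3) can be sketched with a toy discrete example. All the numbers are hypothetical: the data are assumed to come from one of two Bernoulli distributions, with success probability 0.5 or 0.8, and equal prior probabilities are updated after observing the sample.

```python
# Two candidate data-generating distributions with equal priors.
candidates = {0.5: 0.5, 0.8: 0.5}   # {parameter p: prior probability}
xi = [1, 1, 0, 1, 1]                # observed sample (1 = success)

def likelihood(p, data):
    """Probability of the observed sample under Bernoulli(p)."""
    out = 1.0
    for x in data:
        out *= p if x == 1 else 1.0 - p
    return out

# Bayes' rule: posterior is proportional to prior times likelihood.
evidence = sum(prior * likelihood(p, xi) for p, prior in candidates.items())
posterior = {p: prior * likelihood(p, xi) / evidence
             for p, prior in candidates.items()}
print(round(posterior[0.8], 3))  # -> 0.724: four successes in five favor p = 0.8
```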

Model restrictions

Often, we make statistical inferences about model restrictions.

Given a subset of the original model $\Phi _{R}\subseteq \Phi$, a model restriction can be either an inclusion restriction $$F_{\Xi }\in \Phi _{R}$$ or an exclusion restriction $$F_{\Xi }\notin \Phi _{R}$$

Decision theory

The choice of the statement (the statistical inference) to make based on the observed data can often be formalized as a decision problem where:

  1. making a statistical inference is regarded as an action;

  2. each action can have different consequences, depending on which distribution function $F_{\Xi }$ is the true one;

  3. a preference ordering over possible consequences needs to be elicited;

  4. an optimal course of action needs to be taken, coherently with elicited preferences.

There are several different ways of formalizing such a decision problem. The branch of statistics that analyzes these decision problems is called statistical decision theory.
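The four ingredients above can be sketched as a tiny decision problem. The candidate distributions, the actions, the losses, and the probabilities below are all hypothetical; choosing the action with minimum expected loss is one classical way (not the only one) of acting coherently with elicited preferences.

```python
# loss[action][distribution]: consequence of each inference (action)
# under each candidate distribution function for Xi.
loss = {
    "declare A": {"F_A": 0.0, "F_B": 5.0},
    "declare B": {"F_A": 1.0, "F_B": 0.0},
}
prob = {"F_A": 0.7, "F_B": 0.3}  # assumed probabilities of the candidates

# Expected loss of each action, then the optimal (minimum-loss) action.
expected_loss = {a: sum(prob[d] * l for d, l in row.items())
                 for a, row in loss.items()}
best = min(expected_loss, key=expected_loss.get)
print(best, round(expected_loss[best], 2))  # -> declare B 0.7
```

Note the asymmetry built into the losses: declaring A when F_B is true costs 5, so even though F_A is more probable, the safer action is to declare B.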


How to cite

Please cite as:

Taboga, Marco (2021). "Statistical inference", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
