Predictive model

by Marco Taboga, PhD

In this lecture we introduce the concept of a predictive model, which lies at the heart of machine learning (ML).

Observed input-output mapping

To begin with, we observe some outputs $y_{t}$ and the corresponding input vectors $x_{t}$ that may help to predict the outputs before they are observed.


Note: the subscript $t$ used to index the observations is not necessarily time.


We use the observed inputs and outputs to build a predictive model, that is, a function $f$ that takes new inputs $x_{t}$ as arguments and returns the predictions $\widetilde{y}_{t}=f(x_{t})$ of previously unseen outputs $y_{t}$.
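As a rough illustration of what "building a function $f$ from observed inputs and outputs" can look like, here is a minimal Python sketch. It uses a nearest-neighbour rule, which is only one of many possible predictive models and is not singled out by this lecture; the data and the helper name `build_nearest_neighbour_model` are made up for the example.

```python
import math

# Observed input vectors x_t and outputs y_t (made-up numbers for illustration).
observed_inputs = [(1.0, 2.0), (2.0, 0.5), (4.0, 3.0), (5.0, 1.0)]
observed_outputs = [3.1, 2.4, 7.2, 5.9]


def build_nearest_neighbour_model(inputs, outputs):
    """Return a function f that predicts the output of the closest observed input."""
    def f(new_input):
        # Euclidean distance between the new input and every observed input.
        distances = [math.dist(new_input, x) for x in inputs]
        closest = distances.index(min(distances))
        return outputs[closest]
    return f


# Build the predictive model from the observed data.
f = build_nearest_neighbour_model(observed_inputs, observed_outputs)

# Apply it to new inputs whose outputs have not been observed.
new_inputs = [(1.1, 1.9), (4.8, 1.2)]
predictions = [f(x) for x in new_inputs]
print(predictions)
```

The point of the sketch is only the shape of the workflow: observed pairs go in, a function $f$ comes out, and $f$ is then evaluated on new inputs.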

Some machine learning jargon

Before diving into predictive modelling, let us learn some machine learning jargon.

Supervised vs unsupervised learning

The problem of learning an input-output mapping is called a supervised learning problem.

The data used for learning is called labelled data and the outputs $y_{t}$ are called labels or targets.

Basically, in a supervised learning problem, the task is to learn the conditional distribution of the outputs given the inputs.

By contrast, in an unsupervised learning problem, there are no labels and the task is to learn something about the unconditional distribution of $x_{t}$.

A typical example involves photos of cats and dogs, where $x_{t}$ is a vector of pixel values. In supervised learning, you have labels $y_{t}$ (1 if dog, 0 if cat). In unsupervised learning, you have no labels, but you typically do something like clustering the vectors $x_{t}$ in the hope that the algorithm autonomously separates cats from dogs.
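The contrast can be sketched in a few lines of Python. The feature vectors below are made-up two-dimensional points standing in for pixel vectors, and `two_means` is a hypothetical, bare-bones variant of k-means clustering with a crude initialisation; it is meant as an illustration under these assumptions, not as a recommended algorithm.

```python
import math

# Hypothetical feature vectors x_t (stand-ins for vectors of pixel values).
features = [(0.9, 1.1), (1.0, 0.8), (1.2, 1.0), (4.9, 5.1), (5.2, 4.8), (5.0, 5.0)]

# Labels y_t, available only in the supervised setting (1 if dog, 0 if cat).
labels = [0, 0, 0, 1, 1, 1]


def mean(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def two_means(points, iterations=10):
    """Cluster points into two groups without using any labels."""
    # Crude initialisation (an assumption): first and last points as centres.
    centres = [points[0], points[-1]]
    for _ in range(iterations):
        # Assign each point to its nearest centre.
        assignment = [
            min((0, 1), key=lambda k: math.dist(p, centres[k])) for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centres[k] = mean(members)
    return assignment


clusters = two_means(features)
print("clusters found without labels:", clusters)
print("labels (supervised setting):  ", labels)
```

Cluster indices are arbitrary: the algorithm can at best recover the two groups, not which group is "dog" and which is "cat". That naming information is exactly what the labels add in the supervised setting.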

Regression vs classification

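A supervised learning problem is called a classification problem if the output variable is discrete (categorical) and a regression problem if the output variable is continuous.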


The inputs are often called features and the vector $x_{t}$ is called a feature vector.


The act of using data to find the best predictive model (e.g., by optimizing the parameters of a parametric model) is called model training.

Loss function

How do we assess the quality of a predictive model?

How do we compare predicted outputs $\widetilde{y}_{t}$ with observed outputs $y_{t}$?

We do these things by specifying a loss function, which is always required in a machine learning problem.

A loss function quantifies the losses that we incur when we make inaccurate predictions.
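Two commonly used loss functions, the squared error and the absolute error, can be written as short Python functions. They are given here as illustrative examples and are not the only possible choices; the function and argument names are made up for this sketch.

```python
def squared_error(y, y_pred):
    """Squared error: penalizes large prediction errors more than proportionally."""
    return (y - y_pred) ** 2


def absolute_error(y, y_pred):
    """Absolute error: penalizes prediction errors in proportion to their size."""
    return abs(y - y_pred)


# Observed output and predicted output (made-up numbers).
y_observed = 3.0
y_predicted = 1.0

print(squared_error(y_observed, y_predicted))
print(absolute_error(y_observed, y_predicted))
```

Both functions return zero when the prediction is exact and grow as the prediction moves away from the observed output, which is the minimal behaviour one expects from a loss.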



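Ideally, the best predictive model is the one having the smallest statistical risk (or expected loss) $\mathrm{E}[L(y_{t},f(x_{t}))]$, where $L(y_{t},f(x_{t}))$ is the loss incurred when the output $y_{t}$ is predicted with $f(x_{t})$ and the expected value is with respect to the joint distribution of $x_{t}$ and $y_{t}$.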

Empirical risk

Since the true joint distribution of $x_{t}$ and $y_{t}$ is usually unknown, the risk is approximated by the empirical risk $\frac{1}{|U|}\sum_{(x_{t},y_{t})\in U}L(y_{t},f(x_{t}))$, where $L$ is the loss function, $U$ is a set of input-output pairs used for calculating the empirical risk and $|U|$ is its cardinality (the number of input-output pairs contained in $U$).

Thus, the empirical risk is the sample average of the losses over a set of observed data $U$. This is the reason why we sometimes call it average loss.
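The computation of an average loss over a set $U$ can be sketched as follows. The set `U`, the model `f` and the helper `empirical_risk` are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def empirical_risk(loss, model, pairs):
    """Sample average of the losses over a set of input-output pairs."""
    return sum(loss(y, model(x)) for x, y in pairs) / len(pairs)


def squared_error(y, y_pred):
    return (y - y_pred) ** 2


def absolute_error(y, y_pred):
    return abs(y - y_pred)


def f(x):
    """A made-up predictive model with a scalar input."""
    return 2.0 * x


# U: a set of (input, output) pairs used to compute the empirical risk.
U = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5)]

risk_squared = empirical_risk(squared_error, f, U)
risk_absolute = empirical_risk(absolute_error, f, U)
print(risk_squared, risk_absolute)
```

Passing the loss as an argument makes it explicit that the same set $U$ and the same model can yield different empirical risks depending on which loss function is specified.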

How to choose $U$ is one of the most important decisions in machine learning and we will discuss it at length.

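For specific choices of the loss function, empirical risk has names that are well-known to statisticians: with the squared error loss $(y_{t}-\widetilde{y}_{t})^{2}$, it is the mean squared error (MSE); with the absolute error loss $|y_{t}-\widetilde{y}_{t}|$, it is the mean absolute error (MAE).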

Empirical risk minimization

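The criterion generally followed in machine learning is that of empirical risk minimization: among the predictive models under consideration, we choose the one that attains the smallest empirical risk.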

Statistically speaking, it is a sound criterion because empirical risk minimizers are extremum estimators.
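A minimal sketch of empirical risk minimization, assuming a one-parameter family of models $f(x)=bx$, the squared error loss and a coarse grid of candidate slopes $b$. All of these choices, as well as the data, are assumptions made for illustration; in practice the minimization is usually carried out with a numerical optimizer or a closed-form solution rather than a grid search.

```python
# Set of (input, output) pairs used to compute the empirical risk (made-up data).
U = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]


def empirical_risk(slope):
    """Mean squared error over U of the model f(x) = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in U) / len(U)


# Candidate models: slopes 0.0, 0.1, ..., 4.0.
candidate_slopes = [k / 10 for k in range(41)]

# Empirical risk minimization: keep the candidate with the smallest empirical risk.
best_slope = min(candidate_slopes, key=empirical_risk)


def best_model(x):
    """The selected predictive model."""
    return best_slope * x


print(best_slope, empirical_risk(best_slope))
```

The grid search makes the criterion explicit: every candidate model is scored by its empirical risk, and the one with the lowest score is retained.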

How to cite

Please cite as:

Taboga, Marco (2021). "Predictive model", Lectures on machine learning.
