Statlect: The Digital Textbook

Conditional models

This lecture introduces conditional probability models, a class of statistical models in which sample data are divided into input and output data, and the relation between the two kinds of data is studied by modelling the conditional probability distribution of the outputs given the inputs. This is in contrast to unconditional models (sometimes also called generative models), where the data are studied by modelling the joint distribution of inputs and outputs.


Before introducing conditional models, let us review the main elements of a statistical model (see the lecture entitled Statistical inference):

  1. there is a sample $\xi$, which can be regarded as a realization of a random vector $\Xi$ (for example, $\xi$ could be a vector collecting the realizations of some independent random variables);

  2. the joint distribution function of the sample, denoted by $F_{\Xi}(\xi)$, is not known exactly;

  3. the sample $\xi$ is used to infer some characteristics of $F_{\Xi}(\xi)$;

  4. a model for $\Xi$ is used to make inferences, where a model is simply a set of joint distribution functions to which $F_{\Xi}(\xi)$ is assumed to belong.

In a conditional model, the sample $\xi$ is partitioned into inputs and outputs: $\xi = [y \;\; x]$, where $y$ denotes the vector of outputs and $x$ the vector of inputs. The object of interest is the conditional distribution function of the outputs given the inputs, $F_{Y \mid X=x}(y)$, and specifying a conditional model means specifying a set of conditional distribution functions to which $F_{Y \mid X=x}(y)$ is assumed to belong.

In other words, in a conditional model, the problem of model specification is simplified by narrowing the statistician's attention to the conditional distribution of the outputs and by ignoring the distribution of the inputs. This can be seen, for example, when both inputs and outputs are continuous random variables. In such a case, specifying an unconditional model is equivalent to specifying a joint probability density function $f_{XY}(x,y)$ for the inputs and the outputs. But a joint density can be written as the product of a marginal and a conditional density: $f_{XY}(x,y) = f_{Y \mid X=x}(y)\, f_{X}(x)$. So, in an unconditional model we explicitly or implicitly specify both the marginal probability density function $f_{X}(x)$ and the conditional probability density function $f_{Y \mid X=x}(y)$. On the other hand, in a conditional model, we specify only the conditional $f_{Y \mid X=x}(y)$ and we leave the marginal $f_{X}(x)$ unspecified.
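The factorization of a joint density into a marginal and a conditional density can be checked numerically. The sketch below is only an illustration under an assumption that is not part of the lecture: the input $x$ and the output $y$ are taken to be jointly normal with zero means, unit variances and correlation `rho`, so that the marginal of $x$ is standard normal and the conditional of $y$ given $x$ is normal with mean `rho * x` and variance `1 - rho**2`. All function names are hypothetical.

```python
import math


def normal_pdf(z, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_pdf(x, y, rho):
    """Joint density of a standard bivariate normal with correlation rho (assumed example)."""
    det = 1.0 - rho ** 2
    quad = (x ** 2 - 2.0 * rho * x * y + y ** 2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def marginal_pdf_x(x):
    """Marginal density of the input: standard normal."""
    return normal_pdf(x, 0.0, 1.0)


def conditional_pdf_y_given_x(y, x, rho):
    """Conditional density of the output given the input: N(rho * x, 1 - rho^2)."""
    return normal_pdf(y, rho * x, 1.0 - rho ** 2)


# Compare the joint density with the product conditional * marginal at one point.
x, y, rho = 0.3, -1.2, 0.6
lhs = joint_pdf(x, y, rho)
rhs = conditional_pdf_y_given_x(y, x, rho) * marginal_pdf_x(x)
print(lhs, rhs, math.isclose(lhs, rhs, rel_tol=1e-12))
```

In this example an unconditional model would have to specify both `marginal_pdf_x` and `conditional_pdf_y_given_x`, while a conditional model would specify only the latter.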


This section presents some of the terminology that is often used when dealing with conditional models.

Regression and classification

The following distinction is often made, especially in the field of machine learning:

  1. if the output is a continuous random variable, then a conditional model is called a regression model;

  2. if the output is a discrete random variable, taking finitely many values (typically few), then a conditional model is called a classification model.


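The input variables are often called predictors, independent variables, explanatory variables, features, or regressors.

The output variables are often called dependent variables, explained variables, response variables, targets, or regressands.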


The following subsections introduce some examples of conditional models.

Linear regression model

The linear regression model is probably the oldest, best understood and most widely used conditional model. In the linear regression model, the response variables $y$ are assumed to be a linear function of the inputs $x$: $y_{i} = x_{i}\beta + \varepsilon_{i}$, where $(y_{i}, x_{i})$ is any observation from the sample, $y_{i}$ is a scalar output, $x_{i}$ is a $1\times K$ vector of inputs, $\beta$ is a $K\times 1$ vector of constants (called regression coefficients) and $\varepsilon_{i}$ is an unobservable random variable that adds noise to the linear relationship between inputs and outputs. A linear regression model is specified by making assumptions about the error term $\varepsilon_{i}$. For example, $\varepsilon_{i}$ is often assumed to have a normal distribution with zero mean and to be independent of $x_{i}$. In such a case, conditional on the inputs $x_{i}$, the output $y_{i}$ has a normal distribution with mean $x_{i}\beta$. As a consequence, the conditional density of $y_{i}$ is $f_{Y_{i} \mid X_{i}=x_{i}}(y_{i}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(y_{i} - x_{i}\beta)^{2}}{2\sigma^{2}}\right)$, where $\sigma^{2}$ is the variance of $\varepsilon_{i}$. The parameters $\beta$ and $\sigma^{2}$ are usually unknown and need to be estimated. So, we have a different conditional distribution for each of the values of $\beta$ and $\sigma^{2}$ that are deemed plausible by the statistician before observing the sample. The set of all these conditional distributions (associated with the different parameters) constitutes the conditional model for $(y_{i}, x_{i})$.
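To make the idea of "one conditional distribution per parameter value" concrete, here is a minimal sketch of the conditional density of a single output under the normal linear regression model described above. The function names and the numerical values of the inputs and parameters are hypothetical and chosen only for illustration; vectors are represented as plain Python lists.

```python
import math


def linear_predictor(x_i, beta):
    """Inner product x_i * beta, i.e. the conditional mean of y_i given x_i."""
    return sum(x_ik * beta_k for x_ik, beta_k in zip(x_i, beta))


def conditional_density(y_i, x_i, beta, sigma2):
    """Normal density of y_i given x_i, with mean x_i * beta and variance sigma2."""
    mean = linear_predictor(x_i, beta)
    return math.exp(-(y_i - mean) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# One observation with K = 2 inputs (made-up numbers).
x_i = [1.0, 2.0]
y_i = -1.5

# Two candidate parameter values: each one picks a different member of the model.
candidates = [
    {"beta": [0.5, -1.0], "sigma2": 4.0},
    {"beta": [0.0, 1.0], "sigma2": 1.0},
]
for params in candidates:
    density = conditional_density(y_i, x_i, params["beta"], params["sigma2"])
    print(params, density)
```

Each dictionary in `candidates` indexes one conditional distribution; the conditional model is the collection of all such distributions as the parameters range over the values deemed plausible.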

Logistic classification model

In the logistic classification model, the response variable $y_{i}$ is a Bernoulli random variable: it can take only two values, either 1 or 0. It is assumed that the conditional probability mass function of $y_{i}$ is a non-linear function of the inputs $x_{i}$: $p_{Y_{i} \mid X_{i}=x_{i}}(y_{i}) = \begin{cases} S(x_{i}\beta) & \text{if } y_{i}=1 \\ 1 - S(x_{i}\beta) & \text{if } y_{i}=0 \\ 0 & \text{otherwise} \end{cases}$, where $x_{i}$ is a $1\times K$ vector of inputs, $\beta$ is a $K\times 1$ vector of constants and $S(t)$ is the logistic function defined by $S(t) = \frac{1}{1+\exp(-t)}$.
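The conditional probability mass function of the logistic classification model can be sketched in a few lines. The code assumes the usual convention that the logistic function evaluated at the inner product of inputs and coefficients gives the probability that the output equals 1; the function names and the numerical values are hypothetical. The logistic function is computed in two branches only to avoid numerical overflow for large negative arguments.

```python
import math


def logistic(t):
    """Logistic function 1 / (1 + exp(-t)), evaluated in a numerically stable way."""
    if t >= 0.0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def conditional_pmf(y_i, x_i, beta):
    """Probability that the output equals y_i, given the inputs x_i (assumed convention)."""
    p_one = logistic(sum(x_ik * beta_k for x_ik, beta_k in zip(x_i, beta)))
    if y_i == 1:
        return p_one
    if y_i == 0:
        return 1.0 - p_one
    return 0.0  # the output can only take the values 0 and 1


# One observation with K = 2 inputs (made-up numbers).
x_i = [1.0, 2.0]
beta = [0.5, -1.0]
p1 = conditional_pmf(1, x_i, beta)
p0 = conditional_pmf(0, x_i, beta)
print(p1, p0, p1 + p0)
```

As in the linear regression example, each value of the coefficient vector selects a different conditional distribution of the output, and the model is the set of all of them.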
