This lecture introduces conditional probability models, a class of statistical models in which sample data are divided into input and output data and the relation between the two kind of data is studied by modelling the conditional probability distribution of the outputs given the inputs. This is in contrast to unconditional models (sometimes also called generative models) where the data is studied by modelling the joint distribution of inputs and outputs.
Before introducing conditional models, let us review the main elements of a statistical model (see the lecture entitled Statistical inference):
there is a sample , which can be regarded as a realization of a random vector (for example, could be a vector collecting the realizations of some independent random variables);
the joint distribution function of the sample, denoted by , is not known exactly;
the sample is used to infer some characteristics of ;
a model for is used to make inferences, where a model is simply a set of joint distribution functions to which is assumed to belong.
In a conditional model, the sample is partitioned into inputs and outputs:where denotes the vector of outputs and the vector of inputs. The object of interest is the conditional distribution function of the outputs given the inputsand specifying a conditional model means specifying a set of conditional distribution functions to which is assumed to belong.
In other words, in a conditional model, the problem of model specification is simplified by narrowing the focus of the statistician's attention on the conditional distribution of the outputs and by ignoring the distribution of the inputs. This can be seen, for example, in the case in which both inputs and outputs are continuous random variables. In such a case, specifying an unconditional model is equivalent to specifying a joint probability density function for the inputs and the outputs. But a joint density can be seen as the product of a marginal and a conditional density:So, in an unconditional model we explicitly or implicitly specify both the marginal probability density function and the conditional probability density function . On the other hand, in a conditional model, we specify only the conditional and we leave the marginal unspecified.
This section presents some of the terminology that is often used when dealing with conditional models.
The following distinction is often made, especially in the field of machine learning:
if the output is a continuous random variable, then a conditional model is called a regression model;
if the output is a discrete random variable, taking finitely many values (typically few), then a conditional model is called a classification model.
The input variables are often called:
regressors (in the context of regression models)
The output variables are often called:
regressands (in the context of regression models)
The following subsections introduce some examples of conditional models.
The linear regression model is probably the oldest, best understood and most widely used conditional model. In the linear regression model, the response variables are assumed to be a linear function of the inputs :where is any observation from the sample, is a scalar output, is a vector of inputs, is a vector of constants (called regression coefficients) and is an unobservable random variable that adds noise to the linear relationship between inputs and outputs. A linear regression model is speficied by making assumptions about the error term . For example, is often assumed to have a normal distribution with zero mean and to be independent of . In such a case, we have that, conditional on the inputs , the output has a normal distribution with mean . As a consequence, the conditional density of is where is the variance of . The parameters and are usually unknown and need to be estimated. So, we have a different conditional distribution for each of the values of and that are deemed plausible by the statistician before observing the sample. The set of all these conditional distributions (associated to the different parameters) constitutes the conditional model for .
In the logistic classification model, the response variable is a Bernoulli random variable: it can take only two values, either or . It is assumed that the conditional probability mass function of is a non-linear function of the inputs :where is a vector of inputs, is a vector of constants and is the logistic function defined by
Most learning materials found on this website are now available in a traditional textbook format.