
Logistic classification model (logit or logistic regression)

The logistic classification model (or logit model) is a binary classification model in which the conditional probability of one of the two possible realizations of the output variable is assumed to be equal to a linear combination of the input variables, transformed by the logistic function.


Classification vs regression

A logit model is often called a logistic regression model. However, in these lecture notes we prefer to stick to the convention (widespread in the machine learning community) of using the term regression only for conditional models in which the output variable is continuous. So we use the term classification here because in a logit model the output is discrete.

Model specification

Suppose that we observe a sample of data $(y_{i},x_{i})$ for $i=1,\ldots ,N$. Each observation in the sample is made up of an output $y_{i}$ and a $1\times K$ vector of inputs $x_{i}$.

It is assumed that the output $y_{i}$ can take only two values, either 1 or 0 (it is a Bernoulli random variable).

The probability that the output $y_{i}$ is equal to 1, conditional on the inputs $x_{i}$, is assumed to be
$$P(y_{i}=1\mid x_{i})=S(x_{i}\beta )$$
where
$$S(t)=\frac{1}{1+\exp (-t)}$$
is the logistic function and $\beta $ is a $K\times 1$ vector of coefficients.

It is immediate to see that the logistic function $S(t)$ is always positive. Furthermore, it is increasing and
$$\lim_{t\rightarrow -\infty }S(t)=0,\qquad \lim_{t\rightarrow \infty }S(t)=1$$
so that it satisfies
$$0<S(t)<1$$

Thus, $S(x_{i}\beta )$ is a well-defined probability because it lies between 0 and 1.

Since probabilities need to sum up to 1, the probability that the output $y_{i}$ is equal to 0 (the only other possible realization of $y_{i}$) is
$$P(y_{i}=0\mid x_{i})=1-S(x_{i}\beta )$$
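The two conditional probabilities can be computed directly from the definition of the logistic function. The following is a minimal sketch using only the Python standard library; the function names, the inputs `x` and the coefficients `beta` are hypothetical and chosen purely for illustration.

```python
import math

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t)), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # for negative t, exp(t) cannot overflow
    return e / (1.0 + e)

def prob_one(x, beta):
    """P(y = 1 | x) = S(x beta) for a row of inputs x and coefficients beta."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return logistic(index)

# Hypothetical inputs and coefficients (the first input plays the role of an intercept).
x = [1.0, 2.0, -1.0]
beta = [0.5, -0.25, 1.0]

p1 = prob_one(x, beta)  # probability that the output equals 1
p0 = 1.0 - p1           # probability that the output equals 0
print(p1, p0)
```

Splitting the computation on the sign of `t` is a design choice for numerical safety: it keeps the argument of `math.exp` non-positive, so large negative values of the linear combination do not raise an overflow error.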


Why is the logistic classification model specified in this manner? Why is the logistic function used to transform the linear combination of inputs $x_{i}\beta $?

The simple answer is that we would like to do something similar to what we do in a linear regression model: use a linear combination of the inputs as our prediction of the output. However, our prediction needs to be a probability and there is no guarantee that the linear combination $x_{i}\beta $ is between 0 and 1. Thus, we use the logistic function because it provides a convenient way of transforming $x_{i}\beta $ and forcing it to lie in the interval between 0 and 1.

We could have used other functions that enjoy properties similar to the logistic function. As a matter of fact, other popular classification models can be obtained by simply substituting the logistic function with another function and leaving everything else in the model unchanged. For example, by substituting the logistic function with the cumulative distribution function of a standard normal distribution, we obtain the so-called probit model.
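To illustrate how the two models differ only in the transforming function, here is a small sketch that evaluates the same linear combination under both choices. The names `prob_one`, `logistic_cdf` and `standard_normal_cdf`, as well as the numerical values, are assumptions made for this example.

```python
import math

def logistic_cdf(t):
    """Logistic function, used by the logit model."""
    return 1.0 / (1.0 + math.exp(-t))

def standard_normal_cdf(t):
    """Standard normal CDF, used by the probit model (via the error function)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_one(x, beta, transform):
    """P(y = 1 | x) obtained by transforming the linear combination x beta."""
    index = sum(xj * bj for xj, bj in zip(x, beta))
    return transform(index)

# Hypothetical inputs and coefficients; x beta is approximately 1.
x = [1.0, 0.8]
beta = [-0.2, 1.5]

p_logit = prob_one(x, beta, logistic_cdf)
p_probit = prob_one(x, beta, standard_normal_cdf)
print(p_logit, p_probit)
```

Note that the same coefficient vector gives different probabilities under the two transformations, so coefficients estimated with one model are not directly comparable with those estimated with the other.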

The logit model as a latent variable model

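Another way of thinking about the logit model is to define a latent variable (i.e., an unobserved variable)
$$z_{i}=x_{i}\beta +\varepsilon _{i}\tag{1}$$
where $\varepsilon _{i}$ is a random error term that adds noise to the relationship between the inputs $x_{i}$ and the variable $z_{i}$. The latent variable $z_{i}$ is then assumed to determine the output $y_{i}$ as follows:
$$y_{i}=\begin{cases}1 & \text{if } z_{i}\geq 0\\ 0 & \text{if } z_{i}<0\end{cases}\tag{2}$$
From these assumptions and the additional assumption that $\varepsilon _{i}$ has a continuous distribution that is symmetric around 0, it follows that
$$P(y_{i}=1\mid x_{i})=P(\varepsilon _{i}\geq -x_{i}\beta )=1-F(-x_{i}\beta )=F(x_{i}\beta )$$
where $F(\cdot )$ is the cumulative distribution function of the error $\varepsilon _{i}$ and the last equality follows from symmetry.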

It turns out that the logistic function used to define the logit model is the cumulative distribution function of a symmetric probability distribution called the standard logistic distribution. Therefore, the logit model can be written as a latent variable model, specified by equations (1) and (2) above, in which the error $\varepsilon _{i}$ has a standard logistic distribution.

By choosing different distributions for the error $\varepsilon _{i}$, we obtain other binary classification models. For example, if we assume that $\varepsilon _{i}$ has a standard normal distribution, then we obtain the so-called probit model.
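The latent variable representation can be checked by simulation. The sketch below assumes a fixed, hypothetical value of $x_{i}\beta $, draws standard logistic errors by inverting the logistic function, and compares the observed frequency of $y_{i}=1$ with the value of the logistic function at $x_{i}\beta $. All names and the number of draws are illustrative choices.

```python
import math
import random

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def draw_standard_logistic(rng):
    """Inverse-CDF draw: if U is uniform on (0, 1), log(U / (1 - U)) is standard logistic."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0; avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulated_frequency(index, n_draws, rng):
    """Share of draws with y = 1, where y = 1 whenever z = index + error >= 0."""
    ones = 0
    for _ in range(n_draws):
        z = index + draw_standard_logistic(rng)
        if z >= 0:
            ones += 1
    return ones / n_draws

rng = random.Random(0)  # fixed seed so the run is reproducible
index = 0.7             # hypothetical value of the linear combination x beta
frequency = simulated_frequency(index, 200_000, rng)
probability = logistic(index)
print(frequency, probability)
```

With a large number of draws the two printed numbers should be close, which is what the equality between the conditional probability and the cumulative distribution function of the error predicts.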

Estimation by maximum likelihood

The vector of coefficients $\beta $ is often estimated by maximum likelihood methods.

Assume that the observations $(y_{i},x_{i})$ in the sample are IID and denote the $N\times 1$ vector of all outputs by $y$ and the $N\times K$ matrix of all inputs by $X$. The latter is assumed to have full rank.

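It is possible to prove (see the lecture on Maximum likelihood estimation of the logit model) that the maximum likelihood estimator $\widehat{\beta }$ (when it exists) can be obtained by performing simple Newton-Raphson iterations as follows. Start from an initial guess $\widehat{\beta }_{0}$ (e.g., $\widehat{\beta }_{0}=0$). Then, for $t=1,2,\ldots $, compute the $N\times 1$ vector of fitted probabilities $\widehat{y}_{t-1}$, whose $i$-th entry is $S(x_{i}\widehat{\beta }_{t-1})$, and the $N\times N$ diagonal matrix $W_{t}$, whose $i$-th diagonal entry is $S(x_{i}\widehat{\beta }_{t-1})[1-S(x_{i}\widehat{\beta }_{t-1})]$, and update
$$\widehat{\beta }_{t}=\widehat{\beta }_{t-1}+(X^{\top }W_{t}X)^{-1}X^{\top }(y-\widehat{y}_{t-1})$$
Stop when the change from $\widehat{\beta }_{t-1}$ to $\widehat{\beta }_{t}$ falls below a chosen tolerance.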
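A Newton-Raphson scheme for the logit log-likelihood can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation from the referenced lecture: it starts from a zero vector, uses the gradient $X^{\top }(y-p)$ and the matrix $X^{\top }WX$ with $W$ diagonal and entries $p_{i}(1-p_{i})$, and stops when the step is tiny. The data set, the tolerance and the helper names `solve` and `fit_logit` are hypothetical.

```python
import math

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations for the logit model, starting from beta = 0."""
    K = len(X[0])
    beta = [0.0] * K
    for _ in range(max_iter):
        p = [logistic(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
        # Gradient of the log-likelihood: X'(y - p).
        score = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                 for j in range(K)]
        # Negative Hessian: X'WX with W = diag(p_i (1 - p_i)).
        info = [[sum(row[j] * row[k] * pi * (1.0 - pi) for row, pi in zip(X, p))
                 for k in range(K)] for j in range(K)]
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(sj) for sj in step) < tol:
            break
    return beta

# Hypothetical sample: an intercept column and one input; the classes overlap,
# so the maximum likelihood estimator exists.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]

beta_hat = fit_logit(X, y)
print(beta_hat)
```

The overlap between the two classes in the sample matters: if the inputs separated the outputs perfectly, the likelihood would have no maximizer and the iterations would diverge, which is why the text says "when it exists".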

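The asymptotic covariance matrix of the maximum likelihood estimator $\widehat{\beta }$ can be consistently estimated by
$$\widehat{V}=N(X^{\top }WX)^{-1}$$
where $W$ is the $N\times N$ diagonal matrix whose $i$-th diagonal entry is $S(x_{i}\widehat{\beta })[1-S(x_{i}\widehat{\beta })]$, so that the distribution of the estimator $\widehat{\beta }$ is approximately normal with mean equal to $\beta $ and covariance matrix
$$\frac{1}{N}\widehat{V}=(X^{\top }WX)^{-1}$$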
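As a rough illustration of how the approximate covariance matrix of the estimator might be computed, the sketch below builds $X^{\top }WX$ with $W$ diagonal and entries $p_{i}(1-p_{i})$, and inverts it. It assumes this standard form of the estimator; the value of `beta_hat` is a hypothetical placeholder standing in for the output of the estimation step, and the helper names are invented for the example.

```python
import math

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def information(X, beta_hat):
    """X'WX with W = diag(p_i (1 - p_i)) and p_i = S(x_i beta_hat)."""
    K = len(beta_hat)
    p = [logistic(sum(xj * bj for xj, bj in zip(row, beta_hat))) for row in X]
    return [[sum(row[j] * row[k] * pi * (1.0 - pi) for row, pi in zip(X, p))
             for k in range(K)] for j in range(K)]

# Hypothetical inputs (intercept plus one input) and a placeholder estimate.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
beta_hat = [-2.0, 0.9]  # NOT estimated here; assumed for illustration only

info = information(X, beta_hat)
cov = invert(info)  # approximate covariance matrix of beta_hat
std_errors = [math.sqrt(cov[k][k]) for k in range(len(beta_hat))]
print(std_errors)
```

The square roots of the diagonal entries are the usual standard errors of the individual coefficients.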

Hypothesis testing

If the logit model is estimated with the maximum likelihood procedure illustrated above, any one of the classical tests based on maximum likelihood procedures (e.g., Wald, Likelihood Ratio, Lagrange Multiplier) can be used to test a hypothesis about the vector of coefficients $\beta $.

Other tests can be constructed by exploiting the asymptotic normality of the maximum likelihood estimator. For example, we can perform a z test to test the null hypothesis
$$H_{0}:\beta _{k}=q$$
where $\beta _{k}$ is the $k$-th entry of the vector of coefficients $\beta $ and $q\in \mathbb{R}$.

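The test statistic is
$$z=\frac{\widehat{\beta }_{k}-q}{\sqrt{[(X^{\top }WX)^{-1}]_{kk}}}$$
where $\widehat{\beta }_{k}$ is the $k$-th entry of $\widehat{\beta }$ and $[(X^{\top }WX)^{-1}]_{kk}$ is the $k$-th entry on the diagonal of the matrix $(X^{\top }WX)^{-1}$, the estimated covariance matrix of $\widehat{\beta }$.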

As the sample size $N$ increases, $z$ converges in distribution to a standard normal distribution. The latter distribution can be used to derive critical values and perform the test.


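Under the null hypothesis $\beta _{k}=q$, we have
$$z=\frac{\widehat{\beta }_{k}-\beta _{k}}{\sqrt{[(X^{\top }WX)^{-1}]_{kk}}}=\frac{\sqrt{N}(\widehat{\beta }_{k}-\beta _{k})}{\sqrt{\widehat{V}_{kk}}}$$
where $\widehat{V}=N(X^{\top }WX)^{-1}$ is the consistent estimator of the asymptotic covariance matrix $V$ of $\sqrt{N}(\widehat{\beta }-\beta )$. By the asymptotic normality of the maximum likelihood estimator, the numerator $\sqrt{N}(\widehat{\beta }_{k}-\beta _{k})$ converges in distribution to a normal random variable with mean 0 and variance $V_{kk}$. Furthermore, the consistency of our estimator of the asymptotic covariance matrix implies that
$$\widehat{V}_{kk}\overset{p}{\longrightarrow }V_{kk}$$
where $\overset{p}{\longrightarrow }$ denotes convergence in probability. By the Continuous Mapping theorem,
$$\sqrt{\widehat{V}_{kk}}\overset{p}{\longrightarrow }\sqrt{V_{kk}}$$
and, by Slutsky's theorem, $z$ converges in distribution to a standard normal random variable.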
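The z test described above reduces to a short computation once an estimate and its standard error are available. The following sketch assumes hypothetical values for the estimated coefficient and its standard error; the function name `z_test` is likewise invented for the example. The two-sided p-value uses the identity $P(|Z|>a)=\operatorname{erfc}(a/\sqrt{2})$ for a standard normal $Z$.

```python
import math

def z_test(beta_hat_k, q, std_error_k):
    """Return the z statistic and the two-sided p-value for H0: beta_k = q."""
    z = (beta_hat_k - q) / std_error_k
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical estimate and standard error (not taken from any real data set).
z, p_value = z_test(beta_hat_k=0.9, q=0.0, std_error_k=0.45)

# Two-sided test at the 5% level: reject when |z| exceeds the 97.5% normal quantile.
reject_at_5_percent = abs(z) > 1.959963984540054
print(z, p_value, reject_at_5_percent)
```

Because the standard normal approximation is asymptotic, the p-value should be read with caution in small samples.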
