
Logistic classification model (logit or logistic regression)

by Marco Taboga, PhD

The logistic model (or logit) is a classification model used to predict variables that can take only two values.



The logistic classification model has the following characteristics:

- the output $y_{i}$ can take only two values, 0 and 1;

- a linear combination of the inputs, $x_{i}\beta$, is transformed by the logistic function into the probability that the output equals 1.

Interpretation of the predicted output

In a logit model, the predicted output $\widehat{y}_{i}$ has two interpretations:

- it is an estimate of the probability that $y_{i}=1$, conditional on the inputs $x_{i}$;

- since $y_{i}$ is a Bernoulli variable, it is also an estimate of the conditional expectation of $y_{i}$ given $x_{i}$.

Classification vs regression

A logit model is often called a logistic regression model.

However, we prefer to stick to the convention (widespread in the machine learning community) of using the term regression only for models in which the output variable is continuous.

Therefore, we use the term classification here because in a logit model the output is discrete.


Suppose that we observe a sample of data $\left(y_{i},x_{i}\right)$ for $i=1,\ldots,N$.

Each observation has:

- an output $y_{i}$, which can take only two values (0 and 1);

- a $1\times K$ vector of inputs $x_{i}$.

Conditional probabilities

The output $y_{i}$ can take only two values, either 0 or 1 (it is a Bernoulli random variable).

The probability that the output $y_{i}$ is equal to 1, conditional on the inputs $x_{i}$, is assumed to be $$P(y_{i}=1 \mid x_{i}) = S(x_{i}\beta)$$ where $$S(t) = \frac{\exp(t)}{1+\exp(t)}$$ is the logistic function and $\beta$ is a $K\times 1$ vector of coefficients.

The probability that $y_{i}$ is equal to 0 is $$P(y_{i}=0 \mid x_{i}) = 1 - S(x_{i}\beta)$$
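As a sketch, these two conditional probabilities can be computed in Python (the input and coefficient values below are hypothetical):

```python
import math

def logistic(t):
    # logistic function S(t) = exp(t) / (1 + exp(t))
    return math.exp(t) / (1.0 + math.exp(t))

def conditional_probabilities(x, beta):
    """P(y=1 | x) = S(x*beta) and P(y=0 | x) = 1 - S(x*beta),
    where x*beta is the scalar product of inputs and coefficients."""
    x_beta = sum(xj * bj for xj, bj in zip(x, beta))
    p1 = logistic(x_beta)
    return p1, 1.0 - p1

# hypothetical inputs and coefficients
p1, p0 = conditional_probabilities(x=[1.0, 2.0], beta=[0.5, -0.25])
# the two probabilities always sum to 1
assert abs(p1 + p0 - 1.0) < 1e-12
```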

The logistic function

It is immediate to see that the logistic function $S\left(t\right)$ is always positive.

Furthermore, it is increasing and $$\lim_{t\rightarrow -\infty}S(t)=0, \qquad \lim_{t\rightarrow +\infty}S(t)=1$$ so that it satisfies $$0 < S(t) < 1$$ for every $t$.

Thus, $S(x_{i}\beta)$ is a well-defined probability because it lies between 0 and 1.
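These properties of the logistic function are easy to check numerically; the following sketch uses arbitrary sample points:

```python
import math

def logistic(t):
    # numerically stable logistic function S(t) = exp(t) / (1 + exp(t))
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

points = [-30.0, -5.0, -1.0, 0.0, 1.0, 5.0, 30.0]
values = [logistic(t) for t in points]

assert all(0.0 < v < 1.0 for v in values)              # always in (0, 1)
assert all(a < b for a, b in zip(values, values[1:]))  # strictly increasing
assert logistic(-30.0) < 1e-12                         # limit 0 as t -> -inf
assert logistic(30.0) > 1.0 - 1e-12                    # limit 1 as t -> +inf
```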


Why is the logistic classification model specified in this manner?

Why is the logistic function used to transform the linear combination of inputs $x_{i}\beta$?

The simple answer is that we would like to do something similar to what we do in a linear regression model: use a linear combination of the inputs as our prediction of the output.

However, our prediction needs to be a probability, and there is no guarantee that the linear combination $x_{i}\beta$ is between 0 and 1.

Thus, we use the logistic function because it provides a convenient way of transforming $x_{i}\beta$ and forcing it to lie in the interval between 0 and 1.


We could have used other functions that enjoy properties similar to the logistic function.

As a matter of fact, other popular classification models can be obtained by simply substituting the logistic function with another function and leaving everything else in the model unchanged.

For example, by substituting the logistic function with the cumulative distribution function of a standard normal distribution, we obtain the so-called probit model.
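A minimal sketch of this substitution, assuming the standard normal CDF is computed via the error function (the index value 0.7 is hypothetical):

```python
import math

def logistic_cdf(t):
    # CDF of the standard logistic distribution (the logistic function)
    return 1.0 / (1.0 + math.exp(-t))

def normal_cdf(t):
    # CDF of the standard normal distribution, via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_y_equals_1(x_beta, link=logistic_cdf):
    # logit model with the default link; probit when link=normal_cdf
    return link(x_beta)

# the same index x*beta yields different probabilities under the two models
logit_p = prob_y_equals_1(0.7)
probit_p = prob_y_equals_1(0.7, link=normal_cdf)
assert logit_p != probit_p  # the link functions differ away from 0
```

Everything else in the model is left unchanged; only the link function differs.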

The logit model as a latent variable model

Another way of thinking about the logit model is to define a latent variable (i.e., an unobserved variable) $$z_{i} = x_{i}\beta + \varepsilon_{i} \quad (1)$$ where $\varepsilon_{i}$ is a random error term that adds noise to the relationship between the inputs $x_{i}$ and the variable $z_{i}$.

The latent variable $z_{i}$ is then assumed to determine the output $y_{i}$ as follows: $$y_{i} = \begin{cases} 1 & \text{if } z_{i} > 0\\ 0 & \text{if } z_{i} \leq 0 \end{cases} \quad (2)$$

From these assumptions and the additional assumption that $\varepsilon_{i}$ has a symmetric distribution around 0, it follows that $$P(y_{i}=1 \mid x_{i}) = P(z_{i} > 0 \mid x_{i}) = P(\varepsilon_{i} > -x_{i}\beta) = 1 - F(-x_{i}\beta) = F(x_{i}\beta)$$ where $F\left(\cdot\right)$ is the cumulative distribution function of the error $\varepsilon_{i}$ (the last equality follows from symmetry).

It turns out that the logistic function used to define the logit model is the cumulative distribution function of a symmetric probability distribution called the standard logistic distribution.

Therefore, the logit model can be written as a latent variable model, specified by equations (1) and (2) above, in which the error $\varepsilon_{i}$ has a logistic distribution.

By choosing different distributions for the error $\varepsilon_{i}$, we obtain other binary classification models.

For example, if we assume that $\varepsilon_{i}$ has a standard normal distribution, then we obtain the probit model.
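The latent-variable mechanism can be simulated to confirm that it reproduces the logistic probabilities; a sketch, in which the index value `x_beta` is hypothetical and the logistic errors are drawn by inverse-CDF sampling:

```python
import math
import random

random.seed(0)

def logistic_cdf(t):
    # CDF of the standard logistic distribution
    return 1.0 / (1.0 + math.exp(-t))

def draw_logistic_error():
    # inverse-CDF sampling: if U ~ Uniform(0,1), then
    # log(U / (1 - U)) has a standard logistic distribution
    u = random.random()
    return math.log(u / (1.0 - u))

x_beta = 0.8   # hypothetical value of the index x_i * beta
n = 200_000
# y_i = 1 exactly when the latent variable z_i = x_i*beta + eps_i is positive
share_ones = sum((x_beta + draw_logistic_error()) > 0 for _ in range(n)) / n

# the empirical frequency of y_i = 1 should be close to S(x_i * beta)
assert abs(share_ones - logistic_cdf(x_beta)) < 0.01
```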

Estimation by maximum likelihood

The vector of coefficients $\beta$ is often estimated by maximum likelihood methods.

Assume that the observations $\left(y_{i},x_{i}\right)$ in the sample are IID and denote the $N\times 1$ vector of all outputs by $y$ and the $N\times K$ matrix of all inputs by $X$. The latter is assumed to have full rank.

It is possible to prove (see the lecture on Maximum likelihood estimation of the logit model) that the maximum likelihood estimator $\widehat{\beta}$ (when it exists) can be obtained by performing simple Newton-Raphson iterations as follows: $$\beta^{(t+1)} = \beta^{(t)} + \left(X^{\top}W_{t}X\right)^{-1}X^{\top}\left(y - p_{t}\right)$$ where $p_{t}$ is the $N\times 1$ vector whose $i$-th entry is $S(x_{i}\beta^{(t)})$ and $W_{t}$ is the $N\times N$ diagonal matrix whose $i$-th diagonal entry is $S(x_{i}\beta^{(t)})\left(1-S(x_{i}\beta^{(t)})\right)$.
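A minimal sketch of these Newton-Raphson iterations, run on simulated data with hypothetical true coefficients:

```python
import numpy as np

def logit_mle(X, y, tol=1e-10, max_iter=100):
    """Newton-Raphson iterations for the logit model (a sketch):
    beta_new = beta + (X' W X)^{-1} X' (y - p),
    where p_i = S(x_i beta) and W = diag(p_i * (1 - p_i))."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # fitted probabilities
        W = p * (1.0 - p)                    # diagonal of the weight matrix
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# simulated sample with hypothetical true coefficients
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -1.0])
X = np.column_stack([np.ones(5000), rng.normal(size=5000)])
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = logit_mle(X, y)   # should be close to true_beta
```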

The asymptotic covariance matrix of the maximum likelihood estimator $\widehat{\beta}$ can be consistently estimated by $$\widehat{V} = \left(X^{\top}\widehat{W}X\right)^{-1}$$ where $\widehat{W}$ is the diagonal matrix whose $i$-th diagonal entry is $S(x_{i}\widehat{\beta})\left(1-S(x_{i}\widehat{\beta})\right)$, so that the distribution of the estimator $\widehat{\beta}$ is approximately normal with mean equal to $\beta$ and covariance matrix $\widehat{V}$.

Hypothesis testing

If the logit model is estimated with the maximum likelihood procedure illustrated above, any one of the classical tests based on maximum likelihood procedures (e.g., Wald, Likelihood Ratio, Lagrange Multiplier) can be used to test a hypothesis about the vector of coefficients $\beta$.

Other tests can be constructed by exploiting the asymptotic normality of the maximum likelihood estimator.

For example, we can perform a z test to test the null hypothesis $$H_{0}: \beta_{k} = q$$ where $\beta_{k}$ is the $k$-th entry of the vector of coefficients $\beta$ and $q\in\mathbb{R}$.

The test statistic is $$z = \frac{\widehat{\beta}_{k} - q}{\sqrt{\widehat{V}_{kk}}}$$ where $\widehat{\beta}_{k}$ is the $k$-th entry of $\widehat{\beta}$ and $\widehat{V}_{kk}$ is the $k$-th entry on the diagonal of the matrix $\widehat{V}$.

As the sample size $N$ increases, $z$ converges in distribution to a standard normal distribution. The latter distribution can be used to derive critical values and perform the test.


We have $$z = \frac{\widehat{\beta}_{k} - q}{\sqrt{\widehat{V}_{kk}}} = \frac{\sqrt{N}\left(\widehat{\beta}_{k} - q\right)}{\sqrt{N\widehat{V}_{kk}}}$$

By the asymptotic normality of the maximum likelihood estimator, the numerator $\sqrt{N}\left(\widehat{\beta}_{k} - q\right)$ converges in distribution to a normal random variable with mean 0 and variance $V_{kk}$, where $V$ denotes the asymptotic covariance matrix.

Furthermore, the consistency of our estimator of the asymptotic covariance matrix implies that $$N\widehat{V}_{kk} \overset{p}{\longrightarrow} V_{kk}$$ where $\overset{p}{\longrightarrow}$ denotes convergence in probability.

By the Continuous Mapping theorem, $$\sqrt{N\widehat{V}_{kk}} \overset{p}{\longrightarrow} \sqrt{V_{kk}}$$ and, by Slutsky's theorem, $z$ converges in distribution to a standard normal random variable.
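The z test described above can be sketched end to end on simulated data (all coefficient values below are hypothetical):

```python
import math
import numpy as np

def fit_logit(X, y, iters=25):
    # Newton-Raphson estimation of the logit coefficients (a sketch)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

def z_test(X, y, beta_hat, k, q):
    """z statistic for the null hypothesis beta_k = q, using the
    estimated covariance matrix Vhat = (X' W X)^{-1}."""
    p = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    W = p * (1.0 - p)
    V_hat = np.linalg.inv(X.T @ (W[:, None] * X))
    z = (beta_hat[k] - q) / math.sqrt(V_hat[k, k])
    # two-sided p-value from the standard normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# simulated sample with hypothetical true coefficients
rng = np.random.default_rng(42)
true_beta = np.array([0.5, -1.0])
X = np.column_stack([np.ones(5000), rng.normal(size=5000)])
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat = fit_logit(X, y)

# H0: beta_2 = 0 should be strongly rejected (the true value is -1)
z, p_value = z_test(X, y, beta_hat, k=1, q=0.0)
```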

How to cite

Please cite as:

Taboga, Marco (2021). "Logistic classification model (logit or logistic regression)", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix.
