The logistic model (or logit) is a classification model used to predict variables that can take only two values.
The logistic classification model has the following characteristics:
the output variable $y_i$ can be equal to either 0 or 1;
the predicted output $\hat{y}_i$ is a number between 0 and 1;
as in linear regression, we use a vector of estimated coefficients $\hat{\beta}$ to compute $x_i \hat{\beta}$, a linear combination of the input variables $x_i$;
unlike in linear regression, we transform $x_i \hat{\beta}$ using a nonlinear function $S$, to make sure that the predictions $\hat{y}_i = S(x_i \hat{\beta})$ are between 0 and 1.
In a logit model, the predicted output $\hat{y}_i$ has two interpretations:
the estimated probability that $y_i$ will be equal to 1;
our best guess of the value of the output variable $y_i$.
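The characteristics listed above can be illustrated with a minimal sketch. The input values, the coefficients, and the 0.5 cut-off used to turn the predicted probability into a 0/1 guess are assumptions made only for this illustration; they are not taken from the lecture.

```python
import math


def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def predict_probability(x, beta_hat):
    """Apply the logistic function to the linear combination of the inputs."""
    linear_combination = sum(x_k * b_k for x_k, b_k in zip(x, beta_hat))
    return logistic(linear_combination)


# Hypothetical numbers, chosen only for illustration.
x_i = [1.0, 2.0, -1.0]       # inputs of one observation
beta_hat = [0.5, 0.25, 1.0]  # estimated coefficients

y_hat = predict_probability(x_i, beta_hat)  # estimated probability that y_i = 1
best_guess = 1 if y_hat >= 0.5 else 0       # assumed 0.5 cut-off, not from the text
print(y_hat, best_guess)
```

With these numbers the linear combination is exactly 0, so the predicted output is 0.5. Note that `math.exp(-t)` overflows for very large negative `t`; a production implementation would likely use a numerically safer formulation.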
A logit model is often called a logistic regression model.
However, we prefer to stick to the convention (widespread in the machine learning community) of using the term regression only for models in which the output variable is continuous.
Therefore, we use the term classification here because in a logit model the output is discrete.
Suppose that we observe a sample of data $(y_i, x_i)$ for $i = 1, \ldots, N$.
Each observation has:
an output variable denoted by $y_i$;
a $1 \times K$ vector of inputs, denoted by $x_i$.
The output $y_i$ can take only two values, either 0 or 1 (it is a Bernoulli random variable).
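The probability that the output $y_i$ is equal to 1, conditional on the inputs $x_i$, is assumed to be
$$P(y_i = 1 \mid x_i) = S(x_i \beta)$$
where
$$S(t) = \frac{1}{1 + \exp(-t)}$$
is the logistic function and $\beta$ is a $K \times 1$ vector of coefficients.
The probability that $y_i$ is equal to 0 is
$$P(y_i = 0 \mid x_i) = 1 - S(x_i \beta)$$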
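It is immediate to see that the logistic function $S(t)$ is always positive.
Furthermore, it is increasing and
$$\lim_{t \to -\infty} S(t) = 0, \qquad \lim_{t \to +\infty} S(t) = 1$$
so that it satisfies
$$0 < S(t) < 1$$
Thus, $S(x_i \beta)$ is a well-defined probability because it lies between 0 and 1.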
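As a brief check of the claim that the logistic function is increasing, one can differentiate $S(t) = 1/(1 + \exp(-t))$. This is a standard calculus step that the lecture does not spell out:

$$S'(t) = \frac{\exp(-t)}{(1 + \exp(-t))^2} = S(t)\,[1 - S(t)] > 0 \quad \text{for all } t$$

The same algebra gives $1 - S(t) = S(-t)$, so the probability that the output equals 0 can also be written as the logistic function of the negated linear combination of the inputs.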
Why is the logistic classification model specified in this manner?
Why is the logistic function used to transform the linear combination of inputs $x_i \beta$?
The simple answer is that we would like to do something similar to what we do in a linear regression model: use a linear combination of the inputs as our prediction of the output.
However, our prediction needs to be a probability and there is no guarantee that the linear combination $x_i \beta$ is between 0 and 1.
Thus, we use the logistic function because it provides a convenient way of transforming $x_i \beta$ and forcing it to lie in the interval between 0 and 1.
We could have used other functions that enjoy properties similar to the logistic function.
As a matter of fact, other popular classification models can be obtained by simply substituting the logistic function with another function and leaving everything else in the model unchanged.
For example, by replacing the logistic function with the cumulative distribution function of a standard normal distribution, we obtain the so-called probit model.
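The idea of swapping the transformation while leaving everything else unchanged can be sketched as follows. The function names and the grid of values for the linear combination are hypothetical; the standard normal cumulative distribution function is computed from `math.erf`.

```python
import math


def logistic_cdf(t):
    """Transformation used by the logit model."""
    return 1.0 / (1.0 + math.exp(-t))


def standard_normal_cdf(t):
    """Transformation used by the probit model."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def predicted_probability(linear_combination, transform):
    """Same model structure; only the transformation changes."""
    return transform(linear_combination)


# Hypothetical values of the linear combination of the inputs.
grid = [-3.0, -1.0, 0.0, 1.0, 3.0]
logit_probs = [predicted_probability(t, logistic_cdf) for t in grid]
probit_probs = [predicted_probability(t, standard_normal_cdf) for t in grid]

for t, p_logit, p_probit in zip(grid, logit_probs, probit_probs):
    print(f"{t:5.1f}  logit={p_logit:.4f}  probit={p_probit:.4f}")
```

Both transformations are increasing, map 0 to 0.5, and keep the prediction strictly between 0 and 1; they differ mainly in how quickly they approach 0 and 1 in the tails.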
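Another way of thinking about the logit model is to define a latent variable (i.e., an unobserved variable)
$$y_i^* = x_i \beta + \varepsilon_i \qquad (1)$$
where $\varepsilon_i$ is a random error term that adds noise to the relationship between the inputs $x_i$ and the variable $y_i^*$.
The latent variable $y_i^*$ is then assumed to determine the output $y_i$ as follows:
$$y_i = \begin{cases} 1 & \text{if } y_i^* \geq 0 \\ 0 & \text{if } y_i^* < 0 \end{cases} \qquad (2)$$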
From these assumptions and the additional assumption that $\varepsilon_i$ has a symmetric distribution around $0$, it follows that
$$P(y_i = 1 \mid x_i) = F(x_i \beta)$$
where $F$ is the cumulative distribution function of the error $\varepsilon_i$.
It turns out that the logistic function used to define the logit model is the cumulative distribution function of a symmetric probability distribution called the standard logistic distribution.
Therefore, the logit model can be written as a latent variable model, specified by equations (1) and (2) above, in which the error $\varepsilon_i$ has a logistic distribution.
By choosing different distributions for the error $\varepsilon_i$, we obtain other binary classification models.
For example, if we assume that $\varepsilon_i$ has a standard normal distribution, then we obtain the probit model.
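The latent variable representation can be checked with a small simulation. The sketch below assumes a threshold at zero for the latent variable, a hypothetical value of 0.8 for the linear combination of the inputs, and standard logistic errors drawn by inverting the logistic cumulative distribution function.

```python
import math
import random


def logistic(t):
    """Logistic function, i.e. the CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-t))


def draw_standard_logistic(rng):
    """Inverse-CDF draw: if U is Uniform(0, 1), log(U / (1 - U)) is standard logistic."""
    u = rng.random()
    while u == 0.0:  # rng.random() can return exactly 0.0; avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulate_outputs(linear_combination, n_draws, rng):
    """Draw outputs from the latent variable model."""
    outputs = []
    for _ in range(n_draws):
        latent = linear_combination + draw_standard_logistic(rng)  # equation (1)
        outputs.append(1 if latent >= 0.0 else 0)                   # equation (2)
    return outputs


rng = random.Random(0)        # fixed seed for reproducibility
linear_combination = 0.8      # hypothetical value of the linear combination
outputs = simulate_outputs(linear_combination, 100_000, rng)

simulated_frequency = sum(outputs) / len(outputs)
model_probability = logistic(linear_combination)
print(simulated_frequency, model_probability)
```

The share of simulated outputs equal to 1 should be close to the logistic function evaluated at the linear combination, up to simulation noise. Replacing the error draw with `rng.gauss(0.0, 1.0)` would give a simulation of the probit model instead.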
The vector of coefficients $\beta$ is often estimated by maximum likelihood methods.
Assume that the observations $(y_i, x_i)$ in the sample are IID and denote the $N \times 1$ vector of all outputs by $y$ and the $N \times K$ matrix of all inputs by $X$.
The latter is assumed to have full rank.
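It is possible to prove (see the lecture on Maximum likelihood estimation of the logit model) that the maximum likelihood estimator $\hat{\beta}$ (when it exists) can be obtained by performing simple Newton-Raphson iterations as follows:
start from a guess $\hat{\beta}_0$ (e.g., $\hat{\beta}_0 = 0$);
recursively update the guess:
$$\hat{\beta}_t = \hat{\beta}_{t-1} + (X^\top W_{t-1} X)^{-1} X^\top (y - \hat{y}_{t-1})$$
where $\hat{y}_{t-1}$ is the $N \times 1$ vector of predictions with entries $\hat{y}_{i,t-1} = S(x_i \hat{\beta}_{t-1})$ and $W_{t-1}$ is an $N \times N$ diagonal matrix (i.e., having all off-diagonal entries equal to $0$) such that the elements on its diagonal are $\hat{y}_{i,t-1} (1 - \hat{y}_{i,t-1})$ for $i = 1, \ldots, N$;
stop when numerical convergence is achieved, that is, when the difference between $\hat{\beta}_t$ and $\hat{\beta}_{t-1}$ is so small as to be negligible;
set the maximum likelihood estimator $\hat{\beta}$ equal to the last update $\hat{\beta}_T$ (denote the last iteration by $T$).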
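The Newton-Raphson steps listed above can be sketched with NumPy as follows. The function name `fit_logit_newton`, the tolerance, the iteration cap, and the simulated data set are all assumptions made for this illustration; the sketch does not handle cases in which the maximum likelihood estimator does not exist (for example, perfectly separated data).

```python
import numpy as np


def logistic(t):
    """Logistic function applied entry by entry."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_logit_newton(X, y, tol=1e-10, max_iter=100):
    """Newton-Raphson iterations for the logit maximum likelihood estimator."""
    beta = np.zeros(X.shape[1])                 # initial guess: all zeros
    for _ in range(max_iter):
        y_hat = logistic(X @ beta)              # predictions at the current guess
        w = y_hat * (1.0 - y_hat)               # diagonal of the weight matrix
        xwx = X.T @ (w[:, None] * X)            # X' W X without forming W
        score = X.T @ (y - y_hat)               # X' (y - y_hat)
        beta_new = beta + np.linalg.solve(xwx, score)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new                     # numerical convergence reached
        beta = beta_new
    raise RuntimeError("Newton-Raphson did not converge")


# Simulated data with hypothetical true coefficients (for illustration only).
rng = np.random.default_rng(0)
n_obs = 5000
X = np.column_stack([np.ones(n_obs), rng.standard_normal(n_obs)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n_obs) < logistic(X @ beta_true)).astype(float)

beta_hat = fit_logit_newton(X, y)
print(beta_hat)
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting the matrix, and multiplying the rows of `X` by the weights avoids building the full diagonal matrix, which would be large when the sample is large.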
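The asymptotic covariance matrix of the maximum likelihood estimator $\hat{\beta}$ can be consistently estimated by
$$\hat{V} = N (X^\top W X)^{-1}$$
where $W$ is the diagonal weight matrix of the Newton-Raphson iterations evaluated at $\hat{\beta}$ (its diagonal entries are $\hat{y}_i (1 - \hat{y}_i)$ with $\hat{y}_i = S(x_i \hat{\beta})$), so that the distribution of the estimator $\hat{\beta}$ is approximately normal with mean equal to $\beta$ and covariance matrix $\frac{1}{N} \hat{V} = (X^\top W X)^{-1}$.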
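A minimal sketch of the covariance computation is given below. It returns the approximate covariance matrix of the estimator, that is, the inverse of $X^\top W X$ with the weights evaluated at the estimated coefficients. The function name, the small input matrix, and the coefficient vector are hypothetical; the zero coefficient vector is plugged in only because it makes the arithmetic easy to check by hand.

```python
import numpy as np


def logistic(t):
    """Logistic function applied entry by entry."""
    return 1.0 / (1.0 + np.exp(-t))


def coefficient_covariance(X, beta_hat):
    """Approximate covariance matrix of the estimator: inverse of X' W X."""
    y_hat = logistic(X @ beta_hat)
    w = y_hat * (1.0 - y_hat)                   # diagonal entries of W
    return np.linalg.inv(X.T @ (w[:, None] * X))


# Hypothetical inputs and coefficients, used only to illustrate the formula.
X = np.array([[1.0, -1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
beta_hat = np.zeros(2)  # all predictions equal 0.5, so every weight is 0.25

covariance = coefficient_covariance(X, beta_hat)
standard_errors = np.sqrt(np.diag(covariance))
print(covariance)
print(standard_errors)
```

The square roots of the diagonal entries are the approximate standard errors of the individual coefficients. In practice `beta_hat` would be the output of the estimation procedure rather than a fixed vector.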
If the logit model is estimated with the maximum likelihood procedure illustrated above, any one of the classical tests based on maximum likelihood procedures (e.g., Wald, Likelihood Ratio, Lagrange Multiplier) can be used to test a hypothesis about the vector of coefficients $\beta$.
Other tests can be constructed by exploiting the asymptotic normality of the maximum likelihood estimator.
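For example, we can perform a z test to test the null hypothesis
$$H_0: \beta_k = q$$
where $\beta_k$ is the $k$-th entry of the vector of coefficients $\beta$ and $q \in \mathbb{R}$.
The test statistic is
$$z = \frac{\hat{\beta}_k - q}{\sqrt{\hat{V}_{kk} / N}}$$
where $\hat{\beta}_k$ is the $k$-th entry of $\hat{\beta}$ and $\hat{V}_{kk}$ is the $k$-th entry on the diagonal of the matrix $\hat{V}$ (the estimator of the asymptotic covariance matrix).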
As the sample size $N$ increases, $z$ converges in distribution to a standard normal distribution. The latter distribution can be used to derive critical values and perform the test.
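The z test can be sketched in a few lines. The function name and the numerical inputs are hypothetical, and the two-sided p-value is an assumption about how the test is carried out (the text does not specify one-sided or two-sided alternatives). The argument `variance_k` stands for the approximate variance of the $k$-th estimated coefficient, that is, its squared standard error.

```python
import math


def z_test(beta_hat_k, q, variance_k):
    """z statistic and two-sided p-value based on the standard normal distribution."""
    z = (beta_hat_k - q) / math.sqrt(variance_k)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical estimate and variance, chosen only for illustration.
z, p_value = z_test(beta_hat_k=0.9, q=0.0, variance_k=0.09)
print(z, p_value)
```

With these numbers the statistic is about 3 standard errors away from the hypothesized value, so the null hypothesis would be rejected at conventional significance levels.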
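We have
$$z = \frac{\sqrt{N} (\hat{\beta}_k - q)}{\sqrt{\hat{V}_{kk}}}$$
By the asymptotic normality of the maximum likelihood estimator, the numerator $\sqrt{N} (\hat{\beta}_k - q)$ converges in distribution to a normal random variable with mean $0$ and variance $V_{kk}$, the $k$-th entry on the diagonal of the asymptotic covariance matrix $V$.
Furthermore, the consistency of our estimator of the asymptotic covariance matrix implies that
$$\hat{V}_{kk} \stackrel{P}{\longrightarrow} V_{kk}$$
where $\stackrel{P}{\longrightarrow}$ denotes convergence in probability. By the Continuous Mapping theorem,
$$\sqrt{\hat{V}_{kk}} \stackrel{P}{\longrightarrow} \sqrt{V_{kk}}$$
and, by Slutsky's theorem, $z$ converges in distribution to a standard normal random variable.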
Please cite as:
Taboga, Marco (2021). "Logistic classification model (logit or logistic regression)", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/logistic-classification-model.