
Logistic classification model - Maximum likelihood estimation

This lecture deals with maximum likelihood estimation of the logistic classification model (also called logit model or logistic regression).

Before reading this lecture, you might want to revise the lectures on maximum likelihood estimation and on the logit model.

Table of Contents

Model and notation
The likelihood
The log-likelihood
The score
The Hessian
The first-order condition
Newton-Raphson method
Iteratively reweighted least squares
Covariance matrix of the estimator

Model and notation

Remember that in the logit model the output variable $y_{i}$ is a Bernoulli random variable (it can take only two values, either 1 or 0) and
$$P(y_{i}=1\mid x_{i})=S(x_{i}\beta )$$
where
$$S(t)=\frac{1}{1+\exp (-t)}$$
is the logistic function, $x_{i}$ is a $1\times K$ vector of inputs and $\beta $ is a $K\times 1$ vector of coefficients.

Furthermore,
$$P(y_{i}=0\mid x_{i})=1-S(x_{i}\beta )$$

The vector of coefficients $\beta $ is the parameter to be estimated by maximum likelihood.

We assume that the estimation is carried out with an IID sample comprising $N$ data points
$$(y_{1},x_{1}),\ (y_{2},x_{2}),\ \ldots ,\ (y_{N},x_{N})$$
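As a minimal numerical sketch of the model, assuming the NumPy library (the function name logistic and the input values are purely illustrative), the two conditional probabilities for a single observation can be evaluated as follows:

```python
import numpy as np

def logistic(t):
    """Logistic function S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative data: x_i is a 1 x K input vector, beta a K x 1 coefficient vector (K = 3).
x_i = np.array([1.0, 0.5, -2.0])
beta = np.array([0.3, -1.0, 0.2])

p1 = logistic(x_i @ beta)   # P(y_i = 1 | x_i) = S(x_i beta)
p0 = 1.0 - p1               # P(y_i = 0 | x_i) = 1 - S(x_i beta)
```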

The likelihood

The likelihood of an observation $(y_{i},x_{i})$ can be written as
$$L(\beta ;y_{i},x_{i})=[S(x_{i}\beta )]^{y_{i}}[1-S(x_{i}\beta )]^{1-y_{i}}$$

If you are wondering about the exponents $y_{i}$ and $1-y_{i}$ or, more generally, about this formula for the likelihood, you are advised to revise the lecture on Classification models and their maximum likelihood estimation.

Denote the $N\times 1$ vector of all outputs by $y$ and the $N\times K$ matrix of all inputs by $X$. Since the observations are IID, the likelihood of the entire sample is equal to the product of the likelihoods of the single observations:
$$L(\beta ;y,X)=\prod_{i=1}^{N}[S(x_{i}\beta )]^{y_{i}}[1-S(x_{i}\beta )]^{1-y_{i}}$$
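A minimal sketch of these two likelihoods, assuming NumPy (the names observation_likelihood and sample_likelihood are illustrative):

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def observation_likelihood(beta, y_i, x_i):
    """L(beta; y_i, x_i) = S(x_i beta)^y_i * (1 - S(x_i beta))^(1 - y_i)."""
    p = logistic(x_i @ beta)
    return p**y_i * (1.0 - p)**(1 - y_i)

def sample_likelihood(beta, y, X):
    """Product over the IID sample of the observation likelihoods."""
    p = logistic(X @ beta)
    return np.prod(p**y * (1.0 - p)**(1 - y))
```

Note that the product underflows quickly as $N$ grows, which is one practical reason for working with the log-likelihood introduced next.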

The log-likelihood

The log-likelihood of the logistic model is
$$l(\beta ;y,X)=\sum_{i=1}^{N}\left[ y_{i}\ln S(x_{i}\beta )+(1-y_{i})\ln (1-S(x_{i}\beta ))\right]$$

Proof

It is computed as follows:

$$\begin{aligned}
l(\beta ;y,X) &=\ln L(\beta ;y,X)\\
&=\ln \prod_{i=1}^{N}[S(x_{i}\beta )]^{y_{i}}[1-S(x_{i}\beta )]^{1-y_{i}}\\
&=\sum_{i=1}^{N}\ln \left( [S(x_{i}\beta )]^{y_{i}}[1-S(x_{i}\beta )]^{1-y_{i}}\right) \\
&=\sum_{i=1}^{N}\left[ y_{i}\ln S(x_{i}\beta )+(1-y_{i})\ln (1-S(x_{i}\beta ))\right]
\end{aligned}$$
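In code, assuming NumPy, the log-likelihood can be evaluated directly from the formula above (the name log_likelihood is illustrative):

```python
import numpy as np

def log_likelihood(beta, y, X):
    """Sum over i of y_i * ln S(x_i beta) + (1 - y_i) * ln(1 - S(x_i beta))."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # conditional probabilities S(x_i beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```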

The score

The score vector, that is, the vector of first derivatives of the log-likelihood with respect to the parameter $\beta $, is
$$\nabla _{\beta }l(\beta ;y,X)=\sum_{i=1}^{N}\left[ y_{i}-S(x_{i}\beta )\right] x_{i}^{\top }$$

Proof

This is obtained as follows:
$$\begin{aligned}
\nabla _{\beta }l(\beta ;y,X) &=\frac{\partial }{\partial \beta }\sum_{i=1}^{N}\left[ y_{i}\ln S(x_{i}\beta )+(1-y_{i})\ln (1-S(x_{i}\beta ))\right] \\
&=\sum_{i=1}^{N}\left[ \frac{y_{i}}{S(x_{i}\beta )}-\frac{1-y_{i}}{1-S(x_{i}\beta )}\right] S(x_{i}\beta )\left[ 1-S(x_{i}\beta )\right] x_{i}^{\top }\\
&=\sum_{i=1}^{N}\left[ y_{i}\left( 1-S(x_{i}\beta )\right) -(1-y_{i})S(x_{i}\beta )\right] x_{i}^{\top }\\
&=\sum_{i=1}^{N}\left[ y_{i}-S(x_{i}\beta )\right] x_{i}^{\top }
\end{aligned}$$
where we have used the chain rule and the derivative of the logistic function, reported below in the proof concerning the Hessian.
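A sketch of the score, assuming NumPy; in matrix notation it coincides with $X^{\top }(y-\widehat{y})$, as shown later in the lecture:

```python
import numpy as np

def score(beta, y, X):
    """Score: sum over i of (y_i - S(x_i beta)) * x_i', i.e. X' (y - p), a K x 1 array."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y - p)
```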

The Hessian

The Hessian, that is, the matrix of second derivatives of the log-likelihood, is
$$\nabla _{\beta \beta }l(\beta ;y,X)=-\sum_{i=1}^{N}x_{i}^{\top }x_{i}\,S(x_{i}\beta )\left[ 1-S(x_{i}\beta )\right]$$

Proof

It can be proved as follows:
$$\begin{aligned}
\nabla _{\beta \beta }l(\beta ;y,X) &=\frac{\partial }{\partial \beta ^{\top }}\sum_{i=1}^{N}\left[ y_{i}-S(x_{i}\beta )\right] x_{i}^{\top }\\
&=-\sum_{i=1}^{N}x_{i}^{\top }\frac{\partial S(x_{i}\beta )}{\partial \beta ^{\top }}\\
&=-\sum_{i=1}^{N}x_{i}^{\top }x_{i}\,S(x_{i}\beta )\left[ 1-S(x_{i}\beta )\right]
\end{aligned}$$
where we have used the fact that the derivative of the logistic function $S(t)$ is
$$\frac{dS(t)}{dt}=S(t)\left[ 1-S(t)\right]$$
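A corresponding sketch of the Hessian, assuming NumPy; note that it depends on $\beta $ and on the inputs, but not on the observed outputs:

```python
import numpy as np

def hessian(beta, X):
    """Hessian: -sum over i of S(x_i beta)(1 - S(x_i beta)) x_i' x_i, i.e. -X' W X."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = np.diag(p * (1.0 - p))      # N x N diagonal weighting matrix
    return -(X.T @ W @ X)
```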

The first-order condition

The maximum likelihood estimator $\widehat{\beta }$ of the parameter $\beta $ solves
$$\widehat{\beta }=\arg \max_{\beta }\,l(\beta ;y,X)$$

In general, there is no analytical solution of this maximization problem and a solution must be found numerically (see the lecture entitled Maximum likelihood algorithm for an introduction to the numerical maximization of the likelihood).

Moreover, this maximization problem is not guaranteed to have a solution, because pathological situations can arise in which the log-likelihood is an unbounded function of the parameters. In these situations the log-likelihood can be made as large as desired by appropriately choosing $\beta $. This happens when the residuals can be made as small as desired, that is, when the model can perfectly fit the observed classes (so-called perfect separation of classes); it is not a common situation. In all other situations the maximization problem has a solution, and at the maximum the score vector satisfies the first-order condition
$$\nabla _{\beta }l(\widehat{\beta };y,X)=0$$
that is,
$$\sum_{i=1}^{N}\left[ y_{i}-S(x_{i}\widehat{\beta })\right] x_{i}^{\top }=0$$

Note that $y_{i}-S(x_{i}\widehat{\beta })$ is the error committed by using $S(x_{i}\widehat{\beta })$ as a predictor of $y_{i}$. It is similar to a regression residual (see Linear regression). Furthermore, the first-order condition above is similar to the first-order condition found when estimating a linear regression model by ordinary least squares: it says that the residuals must be orthogonal to the predictors $x_{i}$.

Newton-Raphson method

The first-order condition above has no explicit solution. In most statistical software packages it is solved by using the Newton-Raphson method. The method is pretty simple: we start from a guess $\beta _{0}$ of the solution (e.g., $\beta _{0}=0$), and then we recursively update the guess with the equation
$$\beta _{t}=\beta _{t-1}-\left[ \nabla _{\beta \beta }l(\beta _{t-1};y,X)\right] ^{-1}\nabla _{\beta }l(\beta _{t-1};y,X)$$
until numerical convergence (of $\beta _{t}$ to the solution $\widehat{\beta }$).

Denote by $\widehat{y}_{t}$ the $N\times 1$ vector of conditional probabilities of the outputs computed by using $\beta _{t}$ as parameter:
$$\widehat{y}_{t}=\begin{bmatrix} S(x_{1}\beta _{t})\\ \vdots \\ S(x_{N}\beta _{t}) \end{bmatrix}$$

Denote by $W_{t}$ the $N\times N$ diagonal matrix (i.e., having all off-diagonal elements equal to 0) such that the elements on its diagonal are $S(x_{1}\beta _{t})[1-S(x_{1}\beta _{t})]$, ..., $S(x_{N}\beta _{t})[1-S(x_{N}\beta _{t})]$:
$$W_{t}=\begin{bmatrix} S(x_{1}\beta _{t})[1-S(x_{1}\beta _{t})] & \cdots & 0\\ \vdots & \ddots & \vdots \\ 0 & \cdots & S(x_{N}\beta _{t})[1-S(x_{N}\beta _{t})] \end{bmatrix}$$

The $N\times K$ matrix of inputs
$$X=\begin{bmatrix} x_{1}\\ \vdots \\ x_{N} \end{bmatrix}$$
which is called the design matrix (as in linear regression), is assumed to be a full-rank matrix.

By using this notation, the score in the Newton-Raphson recursive formula can be written as
$$\nabla _{\beta }l(\beta _{t};y,X)=X^{\top }(y-\widehat{y}_{t})$$
and the Hessian as
$$\nabla _{\beta \beta }l(\beta _{t};y,X)=-X^{\top }W_{t}X$$

Therefore, the Newton-Raphson formula becomes
$$\beta _{t}=\beta _{t-1}+\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }(y-\widehat{y}_{t-1})$$
where the existence of the inverse $\left( X^{\top }W_{t-1}X\right) ^{-1}$ is guaranteed by the assumption that $X$ has full rank (the assumption also guarantees that the log-likelihood is concave and the maximum likelihood problem has a unique solution).
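The following sketch, assuming NumPy, implements the Newton-Raphson recursion in the matrix form just derived; the function name newton_raphson_logit, the tolerance and the iteration cap are illustrative choices:

```python
import numpy as np

def newton_raphson_logit(y, X, tol=1e-10, max_iter=100):
    """Maximize the logit log-likelihood with the update
    beta_t = beta_{t-1} + (X' W_{t-1} X)^{-1} X' (y - y_hat_{t-1})."""
    beta = np.zeros(X.shape[1])                      # starting guess beta_0 = 0
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))    # conditional probabilities
        W = np.diag(y_hat * (1.0 - y_hat))           # diagonal weighting matrix
        step = np.linalg.solve(X.T @ W @ X, X.T @ (y - y_hat))
        beta = beta + step
        if np.max(np.abs(step)) < tol:               # numerical convergence
            break
    return beta
```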

Iteratively reweighted least squares

If you deal with logit models, you will often read that they can be estimated by Iteratively Reweighted Least Squares (IRLS). The Newton-Raphson formula above is equivalent to the IRLS formula
$$\beta _{t}=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}z_{t-1}$$
which is obtained by performing a Weighted Least Squares (WLS) estimation, with weights $W_{t-1}$, of a linear regression of the dependent variables
$$z_{t-1}=X\beta _{t-1}+W_{t-1}^{-1}(y-\widehat{y}_{t-1})$$
on the regressors $X$.

Proof

Write $\beta _{t-1}$ as
$$\beta _{t-1}=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}X\beta _{t-1}$$
Then, we can re-write the Newton-Raphson formula as follows:
$$\begin{aligned}
\beta _{t} &=\beta _{t-1}+\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }(y-\widehat{y}_{t-1})\\
&=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}X\beta _{t-1}+\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}W_{t-1}^{-1}(y-\widehat{y}_{t-1})\\
&=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}\left[ X\beta _{t-1}+W_{t-1}^{-1}(y-\widehat{y}_{t-1})\right] \\
&=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}z_{t-1}
\end{aligned}$$

The IRLS formula can alternatively be written with the dependent variable spelled out:
$$\beta _{t}=\left( X^{\top }W_{t-1}X\right) ^{-1}X^{\top }W_{t-1}\left[ X\beta _{t-1}+W_{t-1}^{-1}(y-\widehat{y}_{t-1})\right]$$
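A sketch of the IRLS recursion under the same assumptions (NumPy; the name irls_logit is illustrative); each step is a weighted least-squares regression of the adjusted dependent variable on $X$:

```python
import numpy as np

def irls_logit(y, X, tol=1e-10, max_iter=100):
    """At each step, run a WLS regression, with weights W_{t-1}, of
    z = X beta + W^{-1} (y - y_hat) on the regressors X."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = y_hat * (1.0 - y_hat)                    # diagonal of W_{t-1}
        z = X @ beta + (y - y_hat) / w               # adjusted dependent variable
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta
```

Up to numerical round-off, this produces the same iterates as the Newton-Raphson sketch above, in line with the equivalence just proved.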

Covariance matrix of the estimator

The asymptotic covariance matrix of the maximum likelihood estimator $\widehat{\beta }$ is usually estimated with the Hessian (see the lecture on the covariance matrix of MLE estimators), as follows:
$$\widehat{\mathrm{Var}}\left[ \widehat{\beta }\right] =-\left[ \nabla _{\beta \beta }l(\widehat{\beta };y,X)\right] ^{-1}=\left( X^{\top }WX\right) ^{-1}$$
where $\widehat{\beta }=\beta _{T}$ and $W=W_{T}$ ($T$ is the last step of the iterative procedure used to maximize the likelihood). As a consequence, the distribution of $\widehat{\beta }$ can be approximated by a normal distribution with mean equal to the true parameter value and variance equal to
$$\left( X^{\top }WX\right) ^{-1}$$
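As a sketch, assuming NumPy, the estimated covariance matrix and the coefficient standard errors can be computed from the weights evaluated at $\widehat{\beta }$ (the name covariance_matrix is illustrative):

```python
import numpy as np

def covariance_matrix(beta_hat, X):
    """Estimated asymptotic covariance of the MLE: (X' W X)^{-1}, with W evaluated at beta_hat."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    W = np.diag(p * (1.0 - p))
    return np.linalg.inv(X.T @ W @ X)

# Standard errors are the square roots of the diagonal entries:
# np.sqrt(np.diag(covariance_matrix(beta_hat, X)))
```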
