StatLect
Index > Fundamentals of statistics > Maximum likelihood > Algorithm

Maximum likelihood - MATLAB Example

In the lecture entitled Maximum likelihood - Algorithm we have explained how to compute the maximum likelihood estimator of a parameter by numerical methods. In this lecture we provide a fully worked out example that illustrates how to do so with MATLAB.

In what follows, I will assume that you have access to a MATLAB installation comprising both the Statistics and the Optimization toolboxes (an installation of Octave - a free software that is very similar to MATLAB - should provide the same functionality).

Table of Contents

Data

We have a sample of 100 independent draws from a standard Student's t distribution with n degrees of freedom. The parameter n is unknown and we want to estimate it by maximum likelihood.

The data (the 100 observations) are stored in the MATLAB file data.mat, which you need to download.

Parametrization

Note that the parameter n must be strictly positive, that is, it must belong to the interval [eq1]. Therefore, the optimization problem we need to solve in order to estimate n is a constrained optimization problem. As explained in the lecture Maximum likelihood - Algorithm, it is preferable to avoid constrained problems when possible. In this case, it is possible because n can be easily reparametrized as[eq2]where $	heta $ is our new parameter and there are no constraints on it, because it can take any value in the interval [eq3].

Coding the log-likelihood function

The likelihood function is coded as a routine that takes as inputs a value for the parameter and the data, and returns as output the value of the log-likelihood with its sign changed. The code is as follows.

function val=log_lik(theta,data)
n=exp(theta);
val=-sum(log(tpdf(data,n)));

The name of the function is log_lik. It takes as arguments the parameter theta and the vector of observations data.

The function tpdf (which is part of the Statistics toolbox) computes the probability density function of a Standard Student's t distribution. In particular, tpdf(data,n) returns a vector of densities (one density for each observation in the vector data), under the hypothesis that the number of degrees of freedom is equal to n. By taking the natural logarithm with the log function and summing over all entries of the vector, we obtain the log-likelihood of the sample. In other words, with the command sum(log(tpdf(data,df))) we compute the log-likelihood[eq4]where $x_{j}$ is an observation (a component of the vector data), $T $ is the sample size (the dimension of the vector data) and [eq5] is the probability density function of $x_{j}$, given that the parameter is equal to $	heta $.

Finally, we change the sign of the log-likelihood, by putting a minus in front of it, because the optimization routine we are going to use performs minimization by default, and we instead want to maximize the log-likelihood function. The value thus obtained is stored in the variable val, which is returned by the function.

The optimization algorithm

The code presented in this subsection runs the optimization algorithm, in order to find numerically the maximum likelihood estimator of the parameter.

load data
options=optimset('Display','off','MaxIter',10000,'TolX',10^-30,'TolFun',10^-30);
theta0=1;
[theta,fval,exitflag,output,grad,hessian]=fminunc('log_lik',theta0,options,data);
exp(theta)

We first load the data file with the command load data.

We then set some options of the optimization algorithm. The option Display is set to off, which means that the optimization algorithm will run silently, without showing the output of each iteration. The option MaxIter is set to 10000, which means that the algorithm will perform a maximum of 10,000 iterations. The option TolX is set to 10^-30, which means that the termination tolerance on the parameter will be $10^{-30}$. Also the option TolFun is set to 10^-30, which means that the termination tolerance on the value of the function to be optimized will be $10^{-30}$.

The starting value for the parameter (our initial guess) is set equal to 1 with the command theta0=1.

Finally, the MATLAB derivative-based optimization algorithm fminunc is called. It takes as arguments:

The function returns as outputs:

The last line of code (exp(theta)) displays the estimate of the parameter $n,$, which is equal to[eq6]where $widehat{	heta }$ is the maximum likelihood estimate of $	heta $ (stored in the variable theta). If the code is correctly executed, the displayed value should be 3.9389, which is our point estimate of the parameter of interest. This corresponds to an estimate of 1.3709 for $	heta $.

You can also display the value of the gradient (by typing grad). A number that is numerically close to zero should be displayed, which means that the first order condition for an optimum (i.e., first derivative equals zero) is approximately satisfied.

Multiple starts

We do not know whether the properties of the fminunc algorithm, together with the properties of our likelihood function, guarantee numerical convergence to the true solution of the problem. As we have previously explained, when there is no theoretical guarantee that numerical convergence can be achieved, a multiple starts approach is usually followed: the numerical optimization algorithm is run several times, with random starting values for the parameter, and if all runs of the algorithm (or a majority of them) lead to the same proposed solution, then this is taken as evidence that the proposed solution is a good approximation of the true solution.

The following code is a multiple starts variant of the code explained above.

load data
stream=RandStream('mt19937ar','Seed',1);
RandStream.setDefaultStream(stream);
options=optimset('Display','off','MaxIter',10000,'TolX',10^-30,'TolFun',10^-30);
for j=1:10
theta0=1+10*randn(1,1);
[theta(j),fval(j),exitflag(j),output(j),grad(j),hessian(j)]=...
fminunc('log_lik',theta0,options,data);
end
theta

The two commands involving RandStream initialize the random number generator. You should not worry about these two commands. They are there just to ensure that, if you run this code on your computer, you will get exactly the same results I get.

The optimization is repeated ten times, and each time the starting value theta0 is generated randomly. In particular, we add to our initial guess (theta0=1) a random number extracted from a standard normal distribution (randn(1,1)), multiplied by 10.

The outputs are stored in vectors, whose entries are indexed by the variable j. In particular, the ten proposed solutions are stored in the vector theta, and the ten values of the log-likelihood function corresponding to these solutions are stored in the vector val.

If, after running the script, you print the proposed solutions by typing theta, you will see that they are all equal to the one we had found previously (1.3709), except for one - the second - which is equal to 12.8117. Thus, we have encountered some trouble. This confirms our suspicion that it was not a wise idea to be satisfied with only one run of the algorithm.

By printing the vectors val and grad, we can easily see that 12.8117 cannot be a solution, because it corresponds to a value of the objective function that is higher than the value corresponding to 1.3709, and because the gradient is not equal to zero.

If we run the algorithm many more times (e.g., increase the number of repetitions to 1000 in the for cycle), we will see that also other solutions are proposed, but they are not really solutions, because of the same reasons (i.e., non-zero derivative, higher value of the objective function).

Thus, we can take the result of the multiple starts algorithm as evidence that 1.3709 is a sound solution. At the same time, this reminds us of the fact that it is really a bad idea to run an optimization algorithm only once if we are not sure of its convergence properties.

Estimating the variance

To produce an estimate of the variance of $widehat{	heta }$, we can use one of the methods introduced in the lecture Maximum likelihood - Covariance matrix estimation. In particular, since fminunc provides a numerical estimate of the Hessian matrix, we can use a method based on this estimate.

Remember that the distribution of the maximum likelihood estimator $widehat{	heta }$ can be approximated by a multivariate normal distribution with mean equal to the true parameter $	heta _{0}$ and covariance matrix equal to [eq7]where[eq8]is an estimate of the asymptotic covariance matrix and [eq9] denotes the matrix of second derivatives. In other words, the estimate of the variance of $widehat{	heta }$ is[eq10]

In our case there is only one parameter, so the matrix is actually a scalar and we can write[eq11]The value stored by fminunc in the variable hessian is the second derivative of log_lik, which is (see above)[eq12]By linearity, this is the same as[eq13]Therefore, all we need to do in order to obtain an estimate of the variance of $widehat{	heta }$ is to compute the reciprocal of hessian, with the command hessian^-1 (if you do this after running the multiple starts algorithm, take only one entry of the vector hessian, corresponding to the estimate 1.3709). If everything was done correctly, the estimate should be equal to 0.0970. This can be translated into an estimate of the variance of $widehat{n}$ with the Delta method, by multiplying the estimated variance of $widehat{	heta }$ by [eq14]. As a result, we obtain that the estimated variance of $widehat{n}$ is 1.5046.

MATLAB files

All the MATLAB codes presented in this lecture are stored in a zipped file, which you can download.

The book

Most of the learning materials found on this website are now available in a traditional textbook format.