Point estimation is a type of statistical inference which consists in producing a guess or approximation of an unknown parameter.
In this lecture we introduce the theoretical framework that underlies all point estimation problems.
At the end of the lecture, we provide links to detailed examples of point estimation, in which we show how to apply the theory.
The main elements of a point estimation problem are those found in any statistical inference problem:
we have a sample that has been drawn from a probability distribution whose characteristics are at least partly unknown;
the sample $\xi$ is regarded as the realization of a random vector $\Xi$;
the joint distribution function of $\Xi$, denoted by $F_\Xi(\xi)$, is assumed to belong to a set of distribution functions $\Phi$, called the statistical model.
When the model $\Phi$ is put into correspondence with a set $\Theta \subseteq \mathbb{R}^p$ of real vectors, then we have a parametric model.
The set $\Theta$ is called the parameter space and its elements are called parameters.
Denote by $\theta_0$ the parameter that is associated with the data-generating distribution $F_\Xi$ and assume that $\theta_0$ is unique. The vector $\theta_0$ is called the true parameter.
Point estimation is the act of choosing a vector $\hat{\theta} \in \Theta$ that approximates $\theta_0$. The approximation $\hat{\theta}$ is called an estimate (or point estimate) of $\theta_0$.
When the estimate $\hat{\theta}$ is produced using a predefined rule (a function) that associates a parameter estimate $\hat{\theta}$ to each $\xi$ in the support of $\Xi$, we can write
$$\hat{\theta} = \hat{\theta}(\xi)$$
The function $\hat{\theta}(\Xi)$ is called an estimator.
Often, the symbol $\hat{\theta}$ is used to denote both the estimate and the estimator. The meaning is usually clear from the context.
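To make the distinction between estimator and estimate concrete, here is a minimal Python sketch. The rule `theta_hat` (the sample mean) and the two samples are hypothetical choices made only for illustration: the estimator is the function itself, and each estimate is the value it returns for a particular sample.

```python
import statistics
from typing import Sequence


def theta_hat(xi: Sequence[float]) -> float:
    """Hypothetical estimator: a fixed rule mapping any sample xi to an estimate.

    The sample mean is used purely for illustration.
    """
    return statistics.fmean(xi)


# The rule (the estimator) is the same; the samples differ, so the estimates differ.
estimate_a = theta_hat([2.0, 4.0, 6.0])        # 4.0
estimate_b = theta_hat([1.0, 1.5, 2.0, 3.5])   # 2.0
```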
According to the decision-theoretic terminology introduced previously, making an estimate is an act, which produces consequences.
Among these consequences, the most relevant one is the estimation error
$$e = \hat{\theta} - \theta_0$$
The statistician's goal is to commit the smallest possible estimation error.
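When the parameter is a vector, the estimation error is a vector too. The sketch below assumes, hypothetically, a two-dimensional parameter (mean, variance), an estimator returning the sample mean and the sample variance (divisor $n-1$), and a made-up true parameter that is known only because this is an illustration.

```python
import statistics
from typing import Sequence, Tuple


def estimator(xi: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical vector estimator: (sample mean, sample variance with divisor n - 1)."""
    return (statistics.fmean(xi), statistics.variance(xi))


def estimation_error(theta_hat: Sequence[float], theta0: Sequence[float]) -> Tuple[float, ...]:
    """Componentwise estimation error e = theta_hat - theta0."""
    return tuple(a - b for a, b in zip(theta_hat, theta0))


xi = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy sample
theta0 = (3.5, 2.0)              # hypothetical true parameter
e = estimation_error(estimator(xi), theta0)
print(e)  # (-0.5, 0.5)
```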
The preference for small errors can be formalized with a loss function $L(\hat{\theta}, \theta_0)$ that quantifies the loss incurred by estimating $\theta_0$ with $\hat{\theta}$.
Examples of loss functions are:
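the absolute error:
$$L(\hat{\theta}, \theta_0) = \lVert \hat{\theta} - \theta_0 \rVert$$
where $\lVert \cdot \rVert$ is the Euclidean norm (it coincides with the absolute value when $\Theta \subseteq \mathbb{R}$);
the squared error:
$$L(\hat{\theta}, \theta_0) = \lVert \hat{\theta} - \theta_0 \rVert^2$$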
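The two loss functions above can be sketched in Python as follows. The function names are hypothetical; `math.dist` computes the Euclidean distance between two points of the same dimension, which reduces to the absolute value in one dimension.

```python
import math
from typing import Sequence


def absolute_error(theta_hat: Sequence[float], theta0: Sequence[float]) -> float:
    """Euclidean norm of theta_hat - theta0 (the absolute value in one dimension)."""
    return math.dist(theta_hat, theta0)


def squared_error(theta_hat: Sequence[float], theta0: Sequence[float]) -> float:
    """Squared Euclidean norm of theta_hat - theta0 (inputs assumed to have equal length)."""
    return sum((a - b) ** 2 for a, b in zip(theta_hat, theta0))


print(absolute_error([3.0, 4.0], [0.0, 0.0]))  # 5.0
print(squared_error([3.0, 4.0], [0.0, 0.0]))   # 25.0
print(absolute_error([2.5], [1.0]))            # 1.5
```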
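When the estimate $\hat{\theta}$ is obtained from an estimator, it is a function $\hat{\theta}(\Xi)$ of the random vector $\Xi$, and the loss $L(\hat{\theta}(\Xi), \theta_0)$ is a random variable.
The expected value of the loss
$$\mathrm{E}\left[L\left(\hat{\theta}(\Xi), \theta_0\right)\right]$$
is called the statistical risk (or, simply, the risk) of the estimator $\hat{\theta}(\Xi)$.
The expected value in the definition of risk is computed with respect to the true distribution function $F_\Xi(\xi)$.
Therefore, we can compute the risk only if we know the true parameter $\theta_0$ and the distribution function $F_\Xi$.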
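In a simulation study the true parameter is known by construction, so the expectation that defines the risk can be approximated by averaging the loss over many simulated samples (a Monte Carlo approximation). A minimal sketch, assuming i.i.d. normal data with known standard deviation `sigma`, a hypothetical true mean `theta0`, the sample mean as estimator and the squared-error loss. For this setup the exact risk is $\sigma^2/n$, which gives a check on the simulation.

```python
import random
import statistics


def monte_carlo_mse(theta0: float, sigma: float, n: int,
                    reps: int = 20_000, seed: int = 1) -> float:
    """Approximate E[(theta_hat(Xi) - theta0)^2] for the sample mean.

    Assumes i.i.d. normal observations with mean theta0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xi = [rng.gauss(theta0, sigma) for _ in range(n)]  # one simulated sample
        total += (statistics.fmean(xi) - theta0) ** 2       # squared-error loss
    return total / reps  # average loss approximates the risk


mse = monte_carlo_mse(theta0=5.0, sigma=1.0, n=10)
print(mse, 1.0 ** 2 / 10)  # simulated risk vs. exact value sigma**2 / n
```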
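When $\theta_0$ and $F_\Xi$ are unknown, the risk needs to be estimated.
For example, we can approximate the risk with the quantity
$$\widehat{\mathrm{E}}\left[L\left(\hat{\Theta}, \hat{\theta}\right)\right]$$
where:
we pretend that the estimate $\hat{\theta}$ is the true parameter;
we denote the estimator of $\theta_0$ by $\hat{\Theta} = \hat{\theta}(\Xi)$, to distinguish it from the estimate $\hat{\theta}$, which now plays the role of the true parameter;
we compute the expected value $\widehat{\mathrm{E}}$ with respect to the estimated distribution function $F_\Xi(\xi; \hat{\theta})$, that is, the distribution in the model that corresponds to the parameter $\hat{\theta}$.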
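This plug-in approximation is commonly implemented by simulation, an approach often called a parametric bootstrap. The sketch below is one way to do it under stated assumptions: the model is taken to be i.i.d. exponential with unknown mean, the estimator is the sample mean, the loss is the squared error, and the observed sample is made-up data. We treat the estimate as the true parameter, draw new samples from the fitted distribution and average the loss. Under this model the exact MSE of the sample mean is $\theta^2/n$, so the result should be close to $\hat{\theta}^2/n$.

```python
import random
import statistics
from typing import Sequence


def plug_in_mse(sample: Sequence[float], reps: int = 20_000, seed: int = 0) -> float:
    """Approximate the MSE of the sample mean by pretending theta_hat is the truth.

    Assumes an i.i.d. exponential model parametrized by its mean.
    """
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistics.fmean(sample)  # estimate, used as if it were theta0
    total = 0.0
    for _ in range(reps):
        # Simulate a sample from the estimated distribution F(xi; theta_hat).
        # random.expovariate takes the rate, i.e. 1 / mean.
        sim = [rng.expovariate(1.0 / theta_hat) for _ in range(n)]
        total += (statistics.fmean(sim) - theta_hat) ** 2
    return total / reps


observed = [0.8, 2.3, 1.1, 0.4, 3.0, 1.9, 0.6, 1.5, 2.7, 0.7]  # hypothetical data
risk_hat = plug_in_mse(observed)
print(risk_hat, statistics.fmean(observed) ** 2 / len(observed))
```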
Even if the risk is unknown, the notion of risk is often used to derive theoretical properties of estimators.
Point estimation is always guided, at least ideally, by the principle of risk minimization, that is, by the search for estimators that minimize the risk.
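As an illustration of how risk can be used to rank competing estimators, the sketch below compares the simulated mean squared error of the sample mean and of the sample median. It assumes i.i.d. normal data with unit variance; in that setting both estimators are unbiased for the mean, and the sample mean is known to have the smaller variance, hence the smaller risk. Both estimators are evaluated on the same simulated samples so that the comparison is not distorted by different random draws.

```python
import random
import statistics
from typing import Tuple


def compare_risks(theta0: float = 0.0, n: int = 11,
                  reps: int = 5_000, seed: int = 2) -> Tuple[float, float]:
    """Simulated MSE of the sample mean and of the sample median.

    Assumes i.i.d. normal observations with mean theta0 and unit variance.
    """
    rng = random.Random(seed)
    loss_mean = loss_median = 0.0
    for _ in range(reps):
        xi = [rng.gauss(theta0, 1.0) for _ in range(n)]
        loss_mean += (statistics.fmean(xi) - theta0) ** 2
        loss_median += (statistics.median(xi) - theta0) ** 2
    return loss_mean / reps, loss_median / reps


mse_mean, mse_median = compare_risks()
print(mse_mean, mse_median)  # the estimator with the smaller risk is preferred
```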
Depending on the specific loss function we use, the statistical risk of an estimator can take different names:
when the absolute error is used as a loss function, then the risk
$$\mathrm{E}\left[\lVert \hat{\theta}(\Xi) - \theta_0 \rVert\right]$$
is called the Mean Absolute Error (MAE) of the estimator;
when the squared error is used as a loss function, then the risk
$$\mathrm{E}\left[\lVert \hat{\theta}(\Xi) - \theta_0 \rVert^2\right]$$
is called the Mean Squared Error (MSE). The square root of the mean squared error is called the root mean squared error (RMSE).
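In a simulation study where $\theta_0$ is known, the MAE, MSE and RMSE are typically approximated by averaging over estimates computed on many independently simulated samples. A minimal sketch for a scalar parameter, with a hypothetical helper `error_metrics` and a toy list of estimates standing in for simulation output. Note that the RMSE, being the square root of the MSE, is expressed in the same units as the parameter.

```python
import math
from typing import Sequence, Tuple


def error_metrics(estimates: Sequence[float], theta0: float) -> Tuple[float, float, float]:
    """Sample analogues of MAE, MSE and RMSE for a scalar parameter.

    `estimates` are assumed to come from independently simulated samples
    generated with the known true parameter theta0.
    """
    errors = [est - theta0 for est in estimates]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mae, mse, math.sqrt(mse)


mae, mse, rmse = error_metrics([1.0, 2.0, 5.0], theta0=2.0)  # toy input
print(mae, mse, rmse)
```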
In this section we discuss other criteria that are commonly used to evaluate estimators.
If an estimator produces parameter estimates that are on average correct, then it is said to be unbiased.
The following is a formal definition.
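Definition Let $\theta_0$ be the true parameter. An estimator $\hat{\theta}(\Xi)$ is an unbiased estimator of $\theta_0$ if and only if
$$\mathrm{E}\left[\hat{\theta}(\Xi)\right] = \theta_0$$
If an estimator is not unbiased, then it is called a biased estimator.
If an estimator is unbiased, then the estimation error is on average zero:
$$\mathrm{E}\left[\hat{\theta}(\Xi) - \theta_0\right] = \mathrm{E}\left[\hat{\theta}(\Xi)\right] - \theta_0 = 0$$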
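A standard illustration of bias, chosen here as an example rather than taken from the lecture, is the estimation of a variance. The sketch assumes i.i.d. normal data with mean 0 and standard deviation `sigma`, and compares the average of the estimator with divisor $n-1$ (unbiased) with the average of the estimator with divisor $n$, whose expected value is $\frac{n-1}{n}\sigma^2$ (biased downward).

```python
import random
import statistics
from typing import Tuple


def average_variance_estimates(sigma: float = 2.0, n: int = 5,
                               reps: int = 40_000, seed: int = 3) -> Tuple[float, float]:
    """Simulated means of two variance estimators.

    Assumes i.i.d. normal data with mean 0 and standard deviation sigma.
    Returns (mean of estimator with divisor n - 1, mean of estimator with divisor n).
    """
    rng = random.Random(seed)
    sum_unbiased = sum_biased = 0.0
    for _ in range(reps):
        xi = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = statistics.fmean(xi)
        ss = sum((x - m) ** 2 for x in xi)  # sum of squared deviations
        sum_unbiased += ss / (n - 1)
        sum_biased += ss / n
    return sum_unbiased / reps, sum_biased / reps


avg_unbiased, avg_biased = average_variance_estimates()
print(avg_unbiased, avg_biased)  # close to 4.0 and 3.2 = (n - 1) / n * sigma**2
```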
If an estimator produces parameter estimates that converge to the true value when the sample size increases, then it is said to be consistent.
The following is a formal definition.
Definition Let $\{\xi_n\}$ be a sequence of samples such that all the distribution functions $F_{\Xi_n}(\xi_n)$ are put into correspondence with the same parameter $\theta_0$. A sequence of estimators $\{\hat{\theta}_n(\Xi_n)\}$ is said to be consistent (or weakly consistent) if and only if
$$\hat{\theta}_n(\Xi_n) \xrightarrow{P} \theta_0$$
where $\xrightarrow{P}$ indicates convergence in probability. The sequence of estimators is said to be strongly consistent if and only if
$$\hat{\theta}_n(\Xi_n) \xrightarrow{a.s.} \theta_0$$
where $\xrightarrow{a.s.}$ indicates almost sure convergence. A sequence of estimators which is not consistent is called inconsistent.
When the sequence of estimators is obtained using the same predefined rule for every sample $\xi_n$, we often say, with a slight abuse of language, "consistent estimator" instead of saying "consistent sequence of estimators". In such cases, what we mean is that the predefined rule produces a consistent sequence of estimators.
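A simulation cannot prove consistency, but it can illustrate what convergence in probability means. The sketch below assumes i.i.d. normal data with unit variance, applies the same rule (the sample mean) to samples of increasing size $n$, and estimates the probability that the estimate lies farther than a hypothetical tolerance `eps` from the true parameter. For a consistent sequence of estimators this probability should shrink toward zero as $n$ grows.

```python
import random
import statistics


def prob_far_from_truth(n: int, theta0: float = 0.0, eps: float = 0.2,
                        reps: int = 1_000, seed: int = 4) -> float:
    """Simulated P(|theta_hat_n - theta0| > eps) for the sample mean.

    Assumes i.i.d. normal data with mean theta0 and unit variance.
    """
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        xi = [rng.gauss(theta0, 1.0) for _ in range(n)]
        if abs(statistics.fmean(xi) - theta0) > eps:
            far += 1
    return far / reps


probs = {n: prob_far_from_truth(n) for n in (10, 100, 1000)}
print(probs)  # the simulated probabilities shrink as n grows
```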
You can find detailed examples of point estimation in the lectures on:
The methods to find point estimators are called estimation methods.
You can read about these methods here:
There is another kind of estimation, called set estimation or interval estimation.
While in point estimation we produce a single estimate meant to approximate the true parameter, in set estimation we produce a whole set of estimates meant to include the true parameter with high probability.
Please cite as:
Taboga, Marco (2021). "Point estimation", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-statistics/point-estimation.