The geometric distribution is the probability distribution of the number of failures we get by repeating a Bernoulli experiment until we obtain the first success.
Consider a Bernoulli experiment, that is, a random experiment having two possible outcomes: either success or failure.
We repeat the experiment until we get the first success, and then we count the number of failures that we faced prior to recording the success.
Since the experiments are random, is a random variable. If the repetitions of the experiment are independent of each other, then the distribution of , which we are going to study below, is called geometric distribution.
Example If we toss a coin until we obtain head, the number of tails before the first head has a geometric distribution.
At the end of this lecture we will also study a slight variant of the geometric distribution, called shifted geometric distribution. The latter is the distribution of the total number of trials (all the failures + the first success). In other words, if has a geometric distribution, then has a shifted geometric distribution.
The geometric distribution is characterized as follows.
Definition Let be a discrete random variable. Let . Let the support of be the set of non-negative integersWe say that has a geometric distribution with parameter if its probability mass function is
The following is a proof that is a legitimate probability mass function.
The probabilities are well-defined and non-negative for any because . We just need to prove that the sum of over its support equals : where in step we have used the formula for geometric series.
As we have said in the introduction, the geometric distribution is related to the Bernoulli distribution.
Remember that a Bernoulli random variable is equal to (success) with probability and to (failure) with probability .
Proposition Let be a sequence of independent Bernoulli random variables with parameter . Then, for any integer , the probability that for and iswhere is the probability mass function of a geometric distribution with parameter .
Since the Bernoulli random variables are independent, we have that
The expected value of a geometric random variable is
It can be derived as follows:
The variance of a geometric random variable is
Let us first derive the second moment of :Now, we can use the variance formula:
The moment generating function of a geometric random variable is defined for any :
This is proved as follows:where the series in step converges only if that is, only ifBy taking the natural log of both sides, the condition becomes
The characteristic function of a geometric random variable is
The proof is similar to the proof for the mgf:
The distribution function of a geometric random variable is
For , , because cannot be smaller than . For , we have
As we have said in the introduction, the geometric distribution is the distribution of the number of failed trials before the first success, while the shifted geometric distribution is the distribution of the total number of trials (all the failures + the first success). In other words, if has a geometric distribution, then has a shifted geometric distribution.
It is then simple to derive the properties of the shifted geometric distribution.
It expected value is
Its variance is
Its moment generating function is, for any :
Its characteristic function is
Its distribution function is
The geometric distribution is considered a discrete version of the exponential distribution.
Suppose the Bernoulli experiments are performed at equal time intervals. Then, the geometric random variable is the time, measured in discrete units, that elapses before we obtain the first success. But if we want to model the time elapsed before a given event occurs in continuous time, then the appropriate distribution to use is the exponential distribution (see the introduction to this lecture).
From a mathematical viewpoint, the geometric distribution enjoys the same memoryless property possessed by the exponential distribution:
in the exponential case, the probability that the event happens during a given time interval is independent of how much time has already passed without the event happening;
in the geometric case, the probability that the event happens at a given point in (discrete) time is not dependent on what happened before because the Bernoulli experiment performed at each point in time is independent of previous trials.
Below you can find some exercises with explained solutions.
On each day we play a lottery in which the probability of winning is . What is the expected value of the number of days that will elapse before we win for the first time?
Each time we play the lottery, the outcome is a Bernoulli random variable (equal to 1 if we win), with parameter . Therefore, the number of days before winning is a geometric random variable with parameter . Its expected value is
Most of the learning materials found on this website are now available in a traditional textbook format.