A covariance formula is an equation used to define or calculate the covariance between two variables.
There are several formulae that can be used, depending on the situation.
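We begin with a general formula, used to define the covariance between two random variables $X$ and $Y$:

$$\operatorname{Cov}[X,Y] = \operatorname{E}\big[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])\big]$$

where:
$\operatorname{Cov}[X,Y]$ denotes the covariance;
$\operatorname{E}$ denotes the expected value operator.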
This definition is useful because of its generality. To compute a covariance in practice, however, we use the more specific equations below.
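When the two random variables are discrete, the above formula can be written as

$$\operatorname{Cov}[X,Y] = \sum_{(x,y)\in R_{XY}} (x-\operatorname{E}[X])(y-\operatorname{E}[Y])\,p_{XY}(x,y)$$

where:
$R_{XY}$ is the set of all couples of values of $X$ and $Y$ that can possibly be observed;
$p_{XY}(x,y)$ is the joint probability mass function, which gives the probability of observing a specific couple $(x,y)$;
the summation symbol $\sum_{(x,y)\in R_{XY}}$ indicates that we need to perform a sum over all the values that $X$ and $Y$ can take jointly.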
In other words, we sum the products of the deviations of the two random variables from their respective means. Each product is weighted by a probability.
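The weighted sum described above can be sketched in a few lines of Python. The joint probability mass function used here is hypothetical (its values are made up purely for illustration), and the names `joint_pmf` and `discrete_covariance` are assumptions introduced for this sketch.

```python
# Hypothetical joint pmf: maps each couple (x, y) in the support
# to its probability. These numbers are illustrative only.
joint_pmf = {
    (0, 0): 0.25,
    (0, 1): 0.25,
    (1, 0): 0.125,
    (1, 1): 0.375,
}


def discrete_covariance(pmf):
    """Covariance of two discrete random variables from their joint pmf."""
    # Expected values of X and Y, computed from the joint pmf.
    mean_x = sum(x * p for (x, _), p in pmf.items())
    mean_y = sum(y * p for (_, y), p in pmf.items())
    # Sum of products of deviations, each weighted by its probability.
    return sum((x - mean_x) * (y - mean_y) * p for (x, y), p in pmf.items())


covariance = discrete_covariance(joint_pmf)
print(covariance)
```

With this particular pmf, the expected values are 0.5 and 0.625, and the covariance works out to 0.0625.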
Suppose that the probability mass function is
The support contains three possible couples:
The calculations are performed as follows:
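When the two random variables are continuous, the covariance formula involves a double integral:

$$\operatorname{Cov}[X,Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x-\operatorname{E}[X])(y-\operatorname{E}[Y])\,f_{XY}(x,y)\,dy\,dx$$

where:
$f_{XY}(x,y)$ is the joint probability density function of $X$ and $Y$;
both the integrals are between $-\infty$ and $\infty$.
The double integral is computed in two steps:
we calculate the inner integral $g(x) = \int_{-\infty}^{\infty} (x-\operatorname{E}[X])(y-\operatorname{E}[Y])\,f_{XY}(x,y)\,dy$, which will be found to be a function of $x$ only because $y$ is "integrated out";
we compute the outer integral $\operatorname{Cov}[X,Y] = \int_{-\infty}^{\infty} g(x)\,dx$.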
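The two-step procedure can be mimicked numerically. The following is a minimal sketch, assuming a hypothetical joint density $f_{XY}(x,y) = x + y$ on the unit square (zero elsewhere), so that the infinite integration limits reduce to 0 and 1. The midpoint rule and the helper names are choices made for this illustration, not part of the formula itself.

```python
def joint_pdf(x, y):
    """Hypothetical joint density: x + y on the unit square, 0 elsewhere."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def midpoint_integral(func, lower, upper, steps=200):
    """Approximate a one-dimensional integral with the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def expectation(h):
    """Approximate E[h(X, Y)] as a double integral computed in two steps."""
    def inner(x):
        # Step 1: integrate over y, leaving a function of x only.
        return midpoint_integral(lambda y: h(x, y) * joint_pdf(x, y), 0.0, 1.0)
    # Step 2: integrate the result over x.
    return midpoint_integral(inner, 0.0, 1.0)


mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
covariance = expectation(lambda x, y: (x - mean_x) * (y - mean_y))
print(mean_x, mean_y, covariance)
```

For this density the exact values are $\operatorname{E}[X] = \operatorname{E}[Y] = 7/12$ and $\operatorname{Cov}[X,Y] = -1/144$, so the printed numbers should be close to 0.5833 and -0.0069.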
Let the joint probability density function be
In order to compute the expected values, we first need to find the marginal density functions:
We can now work out the covariance:
Instead of using the formulae above to find the covariance, it is often easier to use the following equivalent equation based on moments and cross moments:

$$\operatorname{Cov}[X,Y] = \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]$$
In the previous example, after finding the expected values of $X$ and $Y$, we could have done:
When we know the joint moment generating function of $X$ and $Y$, we can use it to compute the moments $\operatorname{E}[X]$, $\operatorname{E}[Y]$ and $\operatorname{E}[XY]$ and then plug their values into the formula above.
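The equivalence between the definition of covariance and the formula based on moments and cross moments follows from expanding the product of deviations and using the linearity of the expected value. A short derivation, in which $\operatorname{E}[X]$ and $\operatorname{E}[Y]$ are treated as constants:

$$
\begin{aligned}
\operatorname{Cov}[X,Y] &= \operatorname{E}\big[(X-\operatorname{E}[X])(Y-\operatorname{E}[Y])\big] \\
&= \operatorname{E}\big[XY - X\operatorname{E}[Y] - \operatorname{E}[X]\,Y + \operatorname{E}[X]\operatorname{E}[Y]\big] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y] - \operatorname{E}[X]\operatorname{E}[Y] + \operatorname{E}[X]\operatorname{E}[Y] \\
&= \operatorname{E}[XY] - \operatorname{E}[X]\operatorname{E}[Y]
\end{aligned}
$$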
Until now, we have discussed how to calculate the covariance between two random variables.
However, there is another concept, that of sample covariance, which is used to measure the degree of association between two observed variables in a sample of data.
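Given $n$ observed couples $(x_1,y_1),\ldots,(x_n,y_n)$, their sample covariance is calculated as

$$s_{xy} = \frac{1}{n}\sum_{j=1}^{n} (x_j-\bar{x})(y_j-\bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables:

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad \bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$$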
An alternative to the formula above is the so-called unbiased sample covariance

$$\tilde{s}_{xy} = \frac{1}{n-1}\sum_{j=1}^{n} (x_j-\bar{x})(y_j-\bar{y})$$

The only difference is that we divide by $n-1$ instead of dividing by $n$.
If the observed couples are independent draws from the joint distribution of two random variables $X$ and $Y$, then $\tilde{s}_{xy}$ is an unbiased estimator of $\operatorname{Cov}[X,Y]$.
In this example, there are four observed couples, whose values are reported in the columns of the table below.
The last two rows of the table are used to calculate the means and the sample covariance (biased and unbiased).
Observation number | x_{j} | Deviation of x_{j} from mean | y_{j} | Deviation of y_{j} from mean | Product of deviations |
---|---|---|---|---|---|
1 | 1 | -1 | 5 | 2 | -2 |
2 | 3 | 1 | 0 | -3 | -3 |
3 | 0 | -2 | -1 | -4 | 8 |
4 | 4 | 2 | 8 | 5 | 10 |
Sum | 8 | 0 | 12 | 0 | 13 |
Divide sum by n | 2 | | 3 | | 13/4 |
Divide sum by n-1 | | | | | 13/3 |
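The calculations in the table can be reproduced with a short Python sketch. The data are the four observed couples from the table; the function name `sample_covariance` and its `unbiased` flag are assumptions introduced for this illustration.

```python
# Observed couples taken from the table above.
x_values = [1, 3, 0, 4]
y_values = [5, 0, -1, 8]


def sample_covariance(xs, ys, unbiased=False):
    """Sample covariance: divide by n (default) or by n - 1 (unbiased)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the sample means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1) if unbiased else total / n


biased = sample_covariance(x_values, y_values)
unbiased = sample_covariance(x_values, y_values, unbiased=True)
print(biased)    # 13/4 = 3.25
print(unbiased)  # 13/3, approximately 4.3333
```

Keeping the divisor as a flag makes it explicit that the two estimates share the same sum of products (13 in this example) and differ only in the final division.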
More details about these formulae - including proofs and solved exercises - can be found in the lecture on Covariance.
Please cite as:
Taboga, Marco (2021). "Covariance formula", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/covariance-formula.