A design matrix is a matrix containing data about multiple characteristics of several individuals or objects. Each row corresponds to an individual and each column to a characteristic.
The design matrix is a fundamental mathematical object in regression analysis, for example, in linear regression models and in logit models. It is often denoted by the capital letter $X$.
We provide here some examples of design matrices.
Example
If we measure the height and weight of five individuals, we can collect the
measurements in a design matrix having five rows and two columns. Each row
corresponds to one of the five individuals, the first column contains the
height measurements and the second one reports the weights:
$$
X = \begin{bmatrix}
h_1 & w_1 \\
h_2 & w_2 \\
h_3 & w_3 \\
h_4 & w_4 \\
h_5 & w_5
\end{bmatrix}
$$
where $h_i$ denotes the height of the $i$-th individual and $w_i$ her weight.
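As a minimal sketch of this first example, the matrix can be assembled in Python with NumPy by placing the two measurement vectors side by side. The numerical values and the variable names `heights_cm` and `weights_kg` are made up for illustration and do not come from the text.

```python
import numpy as np

# Hypothetical measurements for five individuals (illustrative values only).
heights_cm = np.array([170.0, 165.0, 180.0, 175.0, 160.0])
weights_kg = np.array([65.0, 59.0, 82.0, 77.0, 54.0])

# Each row is an individual; column 0 holds heights, column 1 holds weights.
X = np.column_stack([heights_cm, weights_kg])

print(X.shape)  # number of rows (individuals) and columns (characteristics)
print(X[2])     # both measurements of the third individual
```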
Example
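If we collect the data about the gross domestic product (GDP) of four
countries in three consecutive years, then the design matrix is the
$4\times 3$ matrix
$$
X = \begin{bmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
x_{31} & x_{32} & x_{33} \\
x_{41} & x_{42} & x_{43}
\end{bmatrix}
$$
where, for example, $x_{32}$ is the GDP of the third country in the second year.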
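The row and column convention of this second example can be illustrated with a short NumPy sketch. The GDP figures below are invented placeholders in arbitrary units. Note that NumPy indices start at 0, whereas the mathematical subscripts start at 1.

```python
import numpy as np

# Hypothetical GDP figures (arbitrary units): rows are the four countries,
# columns are the three consecutive years.
X = np.array([
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.3],
    [3.0, 3.2, 3.3],
    [4.0, 4.1, 4.4],
])

# The entry for the third country in the second year has
# zero-based indices (2, 1).
x_32 = X[2, 1]
print(X.shape, x_32)
```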
Consider the linear regression
$$
y_i = x_i \beta + \varepsilon_i
$$
where $y_i$ is the dependent variable, $x_i$ is a $1\times K$ vector containing the $K$
explanatory variables (regressors), $\beta$ is a $K\times 1$ vector of regression
coefficients, $\varepsilon_i$ is the error term and there are $N$ observations
($i=1,\ldots,N$).
Thus, we observe $K$ characteristics, contained in the vector of regressors $x_i$,
for each of the $N$ observations.
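All the observations can be collected in the design matrix
$$
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}
= \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1K} \\
x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NK}
\end{bmatrix}
$$
where $x_{ik}$ denotes the $k$-th entry of the vector $x_i$, that is, the $k$-th
regressor.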
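We can similarly stack the observations of the dependent variable and the
error terms into two $N\times 1$ vectors:
$$
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}
$$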
Having defined the design matrix $X$ and the two vectors $y$ and $\varepsilon$,
we can write the regression equations in matrix form:
$$
y = X\beta + \varepsilon
$$
This allows us to use matrix algebra to find
an estimator of the regression coefficients
(see the lecture on
linear regression to see how).
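To make the matrix form concrete, here is a minimal NumPy sketch that simulates a regression and estimates the coefficients. It assumes the ordinary least squares estimator, which the text above does not name explicitly. The sample size, the coefficient values, the noise level and the inclusion of an intercept column are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 50, 3
# Design matrix: a column of ones (intercept) plus two simulated regressors.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

beta = np.array([1.0, 2.0, -0.5])   # hypothetical "true" coefficients
eps = 0.1 * rng.normal(size=N)      # simulated error terms
y = X @ beta + eps                  # the regression written in matrix form

# Least squares estimate of beta; lstsq avoids explicitly inverting X'X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

With small noise and fifty observations, the estimate should land close to the coefficients used in the simulation, and the residuals are orthogonal to the columns of the design matrix.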
In most statistical models the design matrix is required to have full rank, that is, its columns must be linearly independent (see, e.g., the normal linear regression model). When this requirement is not met, we say that the design matrix suffers from multicollinearity (see the lecture on multicollinearity for details).
However, there are also regression models where the design matrix can be rank-deficient (i.e., not full rank), for example the ridge regression model.
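The rank condition can be checked numerically. The sketch below uses a made-up design matrix whose third column is the sum of the first two, verifies that it is rank-deficient, and then computes a ridge estimate from the standard closed form $(X^\top X + \lambda I)^{-1} X^\top y$. The data and the penalty value `lam` are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical design matrix: the third column equals the sum of the
# first two, so the columns are linearly dependent.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 0.0, 1.0],
    [1.0, 5.0, 6.0],
    [1.0, 3.0, 4.0],
])
y = np.array([1.0, 0.0, 2.0, 1.5])

K = X.shape[1]
rank = np.linalg.matrix_rank(X)
has_full_rank = rank == K  # False here, since the rank is below K

# Ridge estimate with penalty lam > 0: solve (X'X + lam I) b = X'y.
# Adding lam I makes the system solvable even though X'X is singular.
lam = 0.1
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
print(rank, has_full_rank, beta_ridge)
```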
See the lecture on linear regression models for more details.
Please cite as:
Taboga, Marco (2021). "Design matrix", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/glossary/design-matrix.