Multivariate statistics employs vectors of statistics (mean, variance, etc.), which can be considered an extension of the descriptive statistics described in univariate Descriptive Statistics.
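Definition 1: Given k random variables x1, …, xk and a sample of size n for each variable xj of the form x1j, …, xnj, we can define the k × 1 column vector X as
\[ X = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix} \]
(also written more simply as X = [xj]) and then define the sample mean (vector) of X to be
\[ \bar{X} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_k \end{bmatrix} \]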
and similarly for the sample variance, standard deviation and other statistics. Also, if the μj are the population means of the xj, then the population mean (vector) of X is defined to be
\[ \mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_k \end{bmatrix} \]
and similarly for population variance, standard deviation, etc. We can also define row vector versions of these.
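The row and column vectors of statistics defined above can be sketched in Python with NumPy. The data below is a small hypothetical sample (4 observations of 3 variables), not the data from any figure, and the variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical data: n = 4 observations (rows) of k = 3 variables (columns).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 16.0, 6.0],
])

# Row-vector versions: one statistic per variable (column of the data).
mean_row = data.mean(axis=0)          # sample means
var_row = data.var(axis=0, ddof=1)    # sample variances (divide by n - 1)
std_row = data.std(axis=0, ddof=1)    # sample standard deviations

# Column-vector (k x 1) versions, matching X defined as a column vector.
mean_col = mean_row.reshape(-1, 1)
var_col = var_row.reshape(-1, 1)

print(mean_row)
print(var_col)
```

Passing `ddof=1` gives the sample (rather than population) variance and standard deviation; with `ddof=0` the same calls would give the population versions.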
Example 1: Figure 1 shows the following statistics for each of the EU countries: gross domestic product (GDP) per capita (measured at purchasing power parity in thousands of US dollars), accumulated public debt (as a percentage of GDP), current annual public deficit (as a percentage of GDP), current annual inflation rate and percentage of the population that is unemployed. Find the sample mean vector.
Figure 1 – Data for Example 1
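The sample mean row vector (range B32:F32) is [29.8, 61.2, -6.3, 2.1, 9.6], and similarly for variance and standard deviation. We can also look at column vector versions of these statistics. E.g. the sample variance column vector is
\[ \begin{bmatrix} s_1^2 \\ \vdots \\ s_5^2 \end{bmatrix} \]
where sj² is the sample variance of the j-th data column.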
Definition 2: Given a k × 1 column vector of random variables X = [xj] and samples of size n for each variable xj of the form x1j, …, xnj, we can define the k × k sample variance-covariance matrix (or simply the sample covariance matrix) S as [sij] where sij = cov(xi, xj). Since cov(xj, xj) = var(xj) = sj² and cov(xi, xj) = cov(xj, xi), the covariance matrix is symmetric with the main diagonal consisting of the sample variances.
Similarly, we can define the population variance-covariance matrix (or simply the population covariance matrix) Σ as above where the covariances are population covariances.
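The sample and population correlation matrices can be defined as [rij] where
\[ r_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{s_i \, s_j} \]
with si and sj the sample standard deviations of xi and xj; for the population version, the covariances and standard deviations are the population ones.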
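Observation: By Property 0 of Least Squares in Multiple Regression, with X here denoting the n × k data matrix [xij], the sample covariance matrix can be expressed by the matrix equation
\[ S = \frac{1}{n-1}\left( X^T X - n \, \bar{X}^T \bar{X} \right) \]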
where X̄ is the 1 × k row vector of sample means. Also the correlation matrix can be expressed as
\[ R = [r_{ij}], \qquad r_{ij} = \frac{s_{ij}}{(D^T D)_{ij}} \]
where D = the 1 × k row vector of sample standard deviations.
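As a rough check of the matrix expressions above, the following NumPy sketch computes S from the data matrix and the row vector of means, and the correlation matrix by dividing S element by element by DᵀD, then compares both with NumPy's built-in functions. The data is hypothetical and the variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical n x k data matrix (n = 4 observations, k = 3 variables).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 16.0, 6.0],
])
n, k = data.shape

# 1 x k row vector of sample means.
xbar = data.mean(axis=0, keepdims=True)

# Sample covariance matrix: (X^T X - n * xbar^T xbar) / (n - 1).
S = (data.T @ data - n * (xbar.T @ xbar)) / (n - 1)

# 1 x k row vector of sample standard deviations.
D = data.std(axis=0, ddof=1, keepdims=True)

# Correlation matrix: element-wise division of S by the k x k matrix D^T D,
# whose (i, j) entry is s_i * s_j.
R = S / (D.T @ D)

# Compare with NumPy's own implementations (columns are the variables).
print(np.allclose(S, np.cov(data, rowvar=False)))
print(np.allclose(R, np.corrcoef(data, rowvar=False)))
```

The argument `rowvar=False` tells `np.cov` and `np.corrcoef` that each column, not each row, holds one variable, which matches the layout of the data matrix used here.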
Example 2: Calculate the sample covariance and correlation matrices for the data in Example 1.
Figure 2 – Sample covariance and correlation matrices for Example 2
Referring to Figures 1 and 2, the sample covariance matrix is constructed by highlighting range H5:L9 (or any other 5 × 5 range) and entering the supplemental array formula =COV(B4:F30) or optionally the standard Excel formula
The correlation matrix is constructed by highlighting the range N5:R9 and entering the formula
Property 1: If λ1, …, λk are the eigenvalues of S then
\[ \sum_{j=1}^{k} \lambda_j = \sum_{j=1}^{k} s_j^2 \]
Proof: By Property 1 of Eigenvalues and Eigenvectors, the trace of S equals the sum of the eigenvalues of S, but as we observed earlier, the elements on the diagonal of S are the variances, and so the sum of these variances is also equal to the trace of S.
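Property 1 can be illustrated numerically. The sketch below, again on a small hypothetical data set, compares the sum of the eigenvalues of S with the trace of S and with the sum of the sample variances; it is a sanity check under these assumptions, not a proof.

```python
import numpy as np

# Hypothetical n x k data matrix (n = 4 observations, k = 3 variables).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 16.0, 6.0],
])

# Sample covariance matrix (columns are the variables).
S = np.cov(data, rowvar=False)

# S is symmetric, so eigvalsh (for symmetric/Hermitian matrices) applies.
eigenvalues = np.linalg.eigvalsh(S)

# Sum of the sample variances of the individual variables.
total_variance = data.var(axis=0, ddof=1).sum()

print(eigenvalues.sum(), np.trace(S), total_variance)
```

The three printed values should agree up to floating-point rounding.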