# Descriptive Multivariate Statistics

Multivariate statistics employs vectors of statistics (mean, variance, etc.), which can be considered an extension of the descriptive statistics described in univariate Descriptive Statistics.

Definition 1: Given k random variables x1, …, xk and a sample of size n for each variable xj of the form x1j, …, xnj, we can define the k × 1 column vector X (also known as a random vector) as

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}$$

(also written more simply as X = [xj]) and then define the sample mean (vector) of X to be

$$\bar{X} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_k \end{bmatrix}$$

and similarly for the sample variance, standard deviation and other statistics. Also, if the μj are the population means of the xj, then the population mean (vector) of X is defined to be

$$\mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_k \end{bmatrix}$$

and similarly for population variance, standard deviation, etc. We can also define row vector versions of these.
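Definition 1 can be illustrated with a short NumPy sketch. The data array below is a small hypothetical sample (it is not the data of any example on this page), with one row per observation and one column per variable; the variable names are chosen only for illustration.

```python
import numpy as np

# Hypothetical sample: n = 4 observations (rows) of k = 3 variables (columns).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 20.0, 6.0],
])

# Row-vector versions: one statistic per variable (i.e. per column).
mean_row = data.mean(axis=0)
var_row = data.var(axis=0, ddof=1)   # ddof=1 gives the sample variance
std_row = data.std(axis=0, ddof=1)   # ddof=1 gives the sample standard deviation

# Column-vector (k x 1) versions, matching the layout used in Definition 1.
mean_col = mean_row.reshape(-1, 1)
var_col = var_row.reshape(-1, 1)
```

The row and column versions hold the same numbers; only the shape differs.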

Example 1: Figure 1 shows the following statistics for each of the EU countries: gross domestic product (GDP) per capita (at purchasing power parity, in thousands of US dollars), accumulated public debt (as a percentage of GDP), current annual public deficit (as a percentage of GDP), current annual inflation rate and percentage of the population that is unemployed. Find the sample mean vector.

Figure 1 – Data for Example 1

The sample mean row vector (range B32:F32) is [29.8, 61.2, -6.3, 2.1, 9.6], and similarly for variance and standard deviation. We can also look at column vector versions of these statistics; e.g. the sample variance column vector is the transpose of the sample variance row vector.

Definition 2: Given a k × 1 column vector of random variables X = [xj] and samples of size n for each variable xj of the form x1j, …, xnj, we can define the k × k sample variance-covariance matrix (or simply the sample covariance matrix) S as [sij] where sij = cov(xi, xj). Since cov(xj, xj) = var(xj) = $s_j^2$ and cov(xi, xj) = cov(xj, xi), the covariance matrix is symmetric with the main diagonal consisting of the sample variances.

Similarly, we can define the population variance-covariance matrix (or simply the population covariance matrix) Σ as above where the covariances are population covariances.

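The sample and population correlation matrices can be defined as [rij] where

$$r_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{s_i s_j}$$

and $s_i$ is the standard deviation of xi (with population covariances and standard deviations used for the population correlation matrix).

Since

$$r_{jj} = \frac{\operatorname{cov}(x_j, x_j)}{s_j s_j} = \frac{s_j^2}{s_j^2} = 1$$

it follows that the main diagonal of this matrix consists only of 1’s.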
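As a rough sketch of Definition 2 and the correlation matrix, the following NumPy code builds S with `np.cov` and then fills in R entry by entry as the covariance divided by the product of the two standard deviations. The data are hypothetical and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample: n = 4 observations (rows) of k = 3 variables (columns).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 20.0, 6.0],
])

# k x k sample covariance matrix; rowvar=False treats columns as variables,
# and the default normalization divides by n - 1.
S = np.cov(data, rowvar=False)

# Sample standard deviations are the square roots of the diagonal of S.
s = np.sqrt(np.diag(S))
k = S.shape[0]

# Correlation matrix built entry by entry from the definition.
R = np.empty((k, k))
for i in range(k):
    for j in range(k):
        R[i, j] = S[i, j] / (s[i] * s[j])

symmetric = np.allclose(S, S.T)                                   # S is symmetric
diag_is_var = np.allclose(np.diag(S), data.var(axis=0, ddof=1))   # diagonal = variances
unit_diag = np.allclose(np.diag(R), 1.0)                          # diagonal of R is all 1's
```

In practice `np.corrcoef(data, rowvar=False)` should give the same matrix R directly; the explicit loop is used here only to mirror the definition.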

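Observation: By Property 0 of Least Squares in Multiple Regression, the sample covariance matrix can be expressed by the matrix equation

$$S = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X})$$

where X here denotes the n × k matrix of sample data [xij] and $\bar{X}$ is the 1 × k row vector of sample means, which is subtracted from each row of X. Also the correlation matrix can be expressed as

$$R = \left[ \frac{s_{ij}}{d_{ij}} \right] \quad \text{where } [d_{ij}] = D^T D$$

where D = the 1 × k row vector of sample standard deviations, so that $D^T D$ is the k × k matrix of products $s_i s_j$ and the division is carried out element by element.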
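The two matrix expressions in this Observation (the transposed centered data times the centered data, divided by n − 1, and S divided element by element by the product of the standard deviation vector with itself) can be sketched in NumPy as follows. The data are hypothetical, and the names `xbar`, `centered` and `D` are chosen for illustration.

```python
import numpy as np

# Hypothetical sample: n = 4 observations (rows) of k = 3 variables (columns).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 20.0, 6.0],
])
n = data.shape[0]

# 1 x k row vector of sample means; broadcasting subtracts it from every row.
xbar = data.mean(axis=0, keepdims=True)
centered = data - xbar

# Sample covariance matrix as a matrix product.
S = centered.T @ centered / (n - 1)

# 1 x k row vector of sample standard deviations.
D = data.std(axis=0, ddof=1, keepdims=True)

# D.T @ D is the k x k matrix of products s_i * s_j; "/" divides element-wise.
R = S / (D.T @ D)
```

Note that `/` between two NumPy arrays is element-wise division, not multiplication by a matrix inverse, which is what the correlation formula requires.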

Example 2: Calculate the sample covariance and correlation matrices for the data in Example 1.

Figure 2 – Sample covariance and correlation matrices for Example 2

Referring to Figures 1 and 2, the sample covariance matrix is constructed by highlighting range H5:L9 (or any other 5 × 5 range) and entering the supplemental array formula =COV(B4:F30) or, alternatively, the standard Excel array formula

=MMULT(TRANSPOSE(B4:F30-B32:F32),B4:F30-B32:F32)/(COUNTA(A4:A30)-1)

The correlation matrix is constructed by highlighting the range N5:R9 and entering the formula

=COV(B4:F30)/MMULT(TRANSPOSE(B34:F34),B34:F34)

Property 1: If λ1, …, λk are the eigenvalues of S then

$$\sum_{j=1}^k \lambda_j = \sum_{j=1}^k s_j^2$$

Proof: By Property 1 of Eigenvalues and Eigenvectors, the trace of S equals the sum of the eigenvalues of S, but as we observed earlier, the elements on the diagonal of S are the variances, and so the sum of these variances is also equal to the trace of S.
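Property 1 (the sum of the eigenvalues of S equals the sum of the sample variances, both being the trace of S) can be checked numerically. The following is a minimal sketch on hypothetical data; `np.linalg.eigvalsh` is used because S is symmetric.

```python
import numpy as np

# Hypothetical sample: n = 4 observations (rows) of k = 3 variables (columns).
data = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 2.0],
    [8.0, 20.0, 6.0],
])

# Sample covariance matrix (columns are variables).
S = np.cov(data, rowvar=False)

# Eigenvalues of a symmetric matrix, returned as real numbers.
eigenvalues = np.linalg.eigvalsh(S)

sum_of_eigenvalues = eigenvalues.sum()
trace_of_S = np.trace(S)                           # sum of the diagonal of S
sum_of_variances = data.var(axis=0, ddof=1).sum()  # sum of the sample variances

# All three quantities agree up to floating-point rounding.
agree = np.isclose(sum_of_eigenvalues, trace_of_S) and np.isclose(trace_of_S, sum_of_variances)
```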

### 15 Responses to Descriptive Multivariate Statistics

1. Philip says:

Hi,
I'm looking to calculate the mean vector of X and X roof. I read it may have something to do with ANOVA.

• Charles says:

Philip,
Is X roof the same thing as X hat (predicted value) or X bar (mean)?
Charles

2. Dr. Good evening, could you consider implementing Correspondence Analysis in the Real Statistics pack?

• Charles says:

Gerardo,
It is on my list of future enhancements.
Charles

• Thank you very much

3. Silvana says:

Hello,
I am trying to compare villages that are inside and outside protected areas. I have 30 variables (ordinal) from fieldwork, collected in the same way for 236 villages (75 inside and 162 outside).
Is there any test to compare these 2 groups, considering 30 variables??
Thank you very much
Sil

4. Abida Awan says:

Hi ,
I am looking for a statistical method that will find significance or a combined score using multiple attributes. The attributes are numerical in nature.

• Charles says:

Sorry, but I don’t understand your questions.
Charles

5. Manfred Becker says:

Hi
I’m looking for an appropriate test statistic to compare the test results of 4 groups (years) of an intelligence test, with overall results and results by factor (verbal/math/nonverbal). Every year, participants of different sex and country took part.
Thanks for help

• Charles says:

Manfred,
It does sound like a multivariate test, but I can’t tell which is the appropriate test statistic from the information that you have provided.
Charles

6. ram says:

Hi

I need help analyzing a Likert scale (1-7) for a customer survey.

• Charles says:

What sort of help do you need?

I calculated the covariance matrix using the formula suggested by you. The values are different from yours (using COV). Also, if I use Excel’s COVAR to populate each cell of the covariance matrix, the values are different.

Can you help me understand why these differences?

• Charles says:

Excel’s COVAR function calculates the population covariance of two sets, while COV calculates a sample covariance matrix. You need to use COVP to calculate the population covariance matrix or COVARIANCE.S (in Excel 2010/2013) to calculate a sample covariance. You can also use COVAR(R1,R2)*n/(n-1) to calculate the sample covariance where n = COUNT(R1) = COUNT(R2).

The formula suggested by me only works properly if there is no missing data in any of the cells. If there is some missing data =COV(R1) will ignore any row which contains missing data. If this isn’t what you want you may prefer to use the formula =COV(R1,FALSE). See the webpage http://www.real-statistics.com/multiple-regression/least-squares-method-multiple-regression/ for more information about this.

If none of this helps, then if you like please send me an Excel worksheet with an example of where the calculations don’t come out correctly. I will look at it and try to figure out what the problem is.

Charles