# Real Statistics Multivariate Functions

The following is a summary of all the multivariate functions provided in the Real Statistics Resource Pack.

### Multivariate Normal Distribution

BNORMSDIST(x, y, r, cum) = the cdf of the standard bivariate normal distribution at x and y with correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

BNORMDIST(x1, x2, m1, m2, s1, s2, r, cum) = the cdf of the standard bivariate normal distribution at at x1 and x2 with means m1 and m2, standard deviations s1 and s2 and correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

MNORMRAND(R1, R2): generates an array containing a random k × 1 vector from a k-dimensional multivariate normal distribution where R1 is the range that contains the k × 1 means vector and R2 is the range that contains the k × k covariance matrix

BNORMRAND(m1, m2, s1, s2, r): generates a random 2 × 1 vector from a bivariate normal distribution with means m1, m2, standard deviations s1, s2 and correlation coefficient r.

MSKEWTEST(R1, lab): Mardia’s skewness test for multivariate normality; returns a column range with the values skewness, chi-square statistic, df and p-value, plus corrected statistic and p-value for small samples

MKURTTEST(R1, lab): Mardia’s kurtosis test for multivariate normality; returns a column range with the values kurtosis, z-statistic and p-value

If lab = TRUE then an extra column of labels is appended to the results (defaults to FALSE).

### Hotelling T2

T2TEST(R1, R2, type) = the p-value for Hotelling’s T2 test
HotellingT2 (R1, R2, type) = T2 statistic
HotellingF (R1, R2, type) = F statistic
Hotellingdf (R1, R2, type) = df2 parameter

R1 and R2 are the ranges of the two samples, except in the one sample case where R1 is the sample and R2 is a column vector representing the population mean column vector being tested.

The following are the acceptable values for type:

0 = one sample
1 = the samples have paired values from the same population
2 = the samples are from populations with the same covariance matrices
3 = the samples are from populations with different covariance matrices

Note that the df1 parameter is always equal to the number of columns in R1.

There are also the following array functions:

Hotelling(R1, R2, type, lab) – outputs a 5 × 1 column range with the values T-square, df1, df2, F and p-value for Hotelling T2 test for the data in ranges R1 and R2.

T2_RepeatedMeasures(R1, lab) – outputs a 5 × 1 column range with the values T-square, df1, df2, F and p-value for Hotelling T2 test for repeated measures design based on the data in ranges R1 (w/o headings).

If lab = TRUE then the output is a 5 × 2 range where the first column contains labels.

### MANOVA

Where R1 is a data range in standard form (without column headings), we have the following functions:

Wilk’s Lambda Test:

MANOVA_WilksLambda(R1) = Λ
MANOVA_Wilksdf1(R1) = df1
MANOVA_Wilksdf2(R1) = df2
MANOVA_WilksF(R1) = F
MANOVA_WilksTest(R1) = p-value

Hotelling-Lawley Trace Test:

MANOVA_HotelTrace(R1) = $T_0^2$
MANOVA_Hoteldf1(R1) = df1
MANOVA_Hoteldf2(R1) = df2
MANOVA_HotelF(R1) = F
MANOVA_HotelTest(R1) = p-value

Pillai-Bartlett Trace Test:

MANOVA_PillaiTrace(R1) = V
MANOVA_Pillaidf1(R1) = df1
MANOVA_Pillaidf2(R1) = df2
MANOVA_PillaiF(R1) = F
MANOVA_PillaiTest(R1) = p-value

Roy’s Largest Root:

MANOVA_RoyRoot(R1, b) = largest eigenvalue λp of HE-1 if b = TRUE (default) and = $\frac{\lambda_p}{1+\lambda_p}$  if b = FALSE

The following are array functions which output a 5 × 1 array consisting of the specified Manova statistic, df1, df2, F statistic and p-value.

MANOVA_Wilks(R1)
MANOVA_Hotel(R1)
MANOVA_Pillai(R1)

The following are array functions which calculate one of the SSCP matrices:

MANOVA_T(R1) = T
MANOVA_H(R1) = H
MANOVA_E(R1) = E

### Box’s Test

In the following, R1 is a range in standard formal if k = 0 (or is omitted) and R1 is in covariance matrix format if k > 0 where k = the number of dependent variables (i.e. all the covariance matrices are k × k).

BOXTEST(R1, k, b) = p-value
BOXM(R1, k) = M statistic
BOXF(R1, k) = F statistic
BOXdf1(R1, k) = df1
BOXdf2(R1, k) = df2
BOX(R1, k) – see below

There is also the following array function

BOX(R1, k): outputs a 5 × 1 range consisting of the entries BOXM(R1, k), BOXdf1(R1, k), BOXdf2(R1, k), BOXF(R1, k) and BOXTEST(R1, k) in that order.

When b is TRUE (or omitted) BOXTEST(R1, k, b) performs the F test, while when it is FALSE the chi-square test is performed instead.

Where R1 and k are as described above, the following is an array function

COVPooled(R1, k) = k × k pooled covariance matrix

### Extraction

The following are array functions where range R1 contains data in standard form (with or without column headings) and s is a string which presumably specifies a group (in the first column of R1):

ExtractRows(R1, s, b) = the array which contains all the elements in range R1 for the group labeled s. If b is set to TRUE, the first row of R1 is included in the output (presumably column headings) even if there is no match. If b is omitted it defaults to TRUE.

ExtractCov(R1, s) = the covariance matrix for the group labeled s, based on the data in range R1. If s is the empty string “” or is omitted then the pooled covariance matrix of R1 is returned.

For the following array function range R1 contains data in standard form (with column headings) and s is a string which specifies a column heading (i.e. an entry in the first row of R1) that selects one of the dependent variables.

ExtractCol(R1, s) = an array which consists of all the data in R1 for the dependent variable identified by s, now organized by columns with one column for each group. You should highlight a range with g columns where g = the number of groups.

### Factor Analysis

For a k × k range R1, the following is an array function:

VARIMAX(R1): Produces a k × k array containing the loading factor matrix after applying a Varimax rotation to the loading factor matrix contained in range R1.

For a k × k range R1 representing a correlation matrix, the following is an array function:

ExtractCommunalities(R1, iter, prec, eigen) = the 1 × k row vector with the communalities after convergence based on a precision value of prec (default = .00001), maximum number of iter iteration (default = 25) and maximum number of iterations eigen in finding eigenvalues/vectors (default = 100).

### Cluster Analysis

The following are are array functions:

CLUST(R1, k, R2) = m × 1 column range of cluster numbers 1, 2, …, k calculated by the k-means algorithm corresponding to the m n-tuple data elements in the m × nrange R1 where R2 is an m × 1 column range containing the initial cluster number assignments. If R2 is omitted then the k-means++ algorithm is used to calculate the initial cluster number assignments.

ClustAnal(R1, k, iter) = the m × 1 column range of cluster numbers produced by CLUST(R1, k) with the lowest SSE after iter iterations (here each of the iterations uses the k-means++ algorithm). If iter = 0, however, then the m × 1 column range of initial cluster numbers based on the k-means++ algorithm is returned.

CLUSTERS(R1, R2) = m × 1 column range of cluster numbers 1, 2, …, k corresponding to the centroid described in the n × k range R2 which is closest to the respective data element in the m × n range R1.

CENTROIDS(R1, R2) = n × k range defining the k centroids corresponding to the  m× 1 column range R2 of clusters for the respective data elements in the m × n range R1. Here k = the maximum value in R2.

CENTROIDS(R1, k) = n × k range defining the k initial centroids corresponding to the  data elements in the m × n range R1 based on the k-means++ algorithm

The following non-array functions:

SSE_CLUSTER(R1, R2, k) = SSE value for the data in the m × n range R1 based on the cluster assignment in the m × 1 column range which contains cluster number assignments based on k clusters 1, 2, …, k.

SSE_CENTROID(R1, R2, k) = SSE value for the data in the m × n range R1 based on the centroids described in the n × k range R2.

### Jenks Natural Breaks

JENKS(R1,, lab, iter) – performs Jenks Natural Breaks optimization on the data in range R1 for k classes and outputs a k+1 × 3 range, whose first k rows contain the left and right endpoints of each of the classes followed by the number of elements in the class. The last row contains the total squared deviation of the k classes, followed by the squared deviation for the data in R1 and GVF.

Here k+1 = the number of rows in the highlighted output range. If iter is included then the algorithm uses iter iterations, otherwise all possible partitions are tested.

JENKS(R1, k, lab, iter) – outputs a a k × 1 column range, whose first k−1 rows contain the right endpoint of the first k−1 classes (the endpoint of the kth category is the largest element in R1) and whose last element is the GVF.

GVF(R1, R2) = GVF for the right endpoints (breaks) in range R2 based on the data in column range R1.

If lab = TRUE (default = FALSE) then an extra column of labels is included.

### Discriminant Analysis

LDACoeff(Rt, head): returns an array with the LDA coefficients for the (training) data in Rt consisting of one row for each independent variable whose columns consist of the name of the independent variable, the intercept coefficient and a coefficient for each dependent variable. If head = TRUE (default), then the data in Rt contains column headings (corresponding to the names of the dependent variables); these headings are also appended to the output from this function.

LDAPredC(R0, Rc, Rp, lab): returns an array whose rows contain the probabilities for each vector in the data array R0 (which contains no row/column headings) using the LDA coefficient array in array Rc (without column headings) and prior probabilities in the column array Rp. A column of names of the independent variable that has the highest probability is also appended to the output.

LDAPred(R0, Rt, Rp, lab) = LDAPredC(R0, LDACoeff(Rt,FALSE), Rp, lab), i.e. the predictions for the vectors in R0 based on the LDA model defined by Rt and Rp.

QDAPred(R0, Rt, Rp, lab): returns an array whose rows contain the probabilities for each vector in the data array R0 (which contains no row/column headings) and the name of the independent variable with the highest probability as for the LDAPred function, except that the QDA model is used instead of the LDA model.

DAClassification(Rt, Rp, linear): returns a classification for the training data in Rt and priors in Rp. If linear = TRUE (default), then the classification table is based on an LDA model, while if linear = FALSE then a QDA model is used instead.

DASummary(R1): returns a summary of the classification table in range R1

If Rp is omitted then equally probably priors are used. If lab = TRUE (default FALSE) then column headings are added to the output.

### Correspondence Analysis

In the following R1 is an array containing an m x n contingency table, R2 is an mm × n array containing supplementary row profiles and R3 is an m × nn array containing supplementary column profiles. None of these arrays include row or column headings or totals. We also assume that k = min(m, n) – 1.

CARowFactors(R1): returns an m × k array in which the ith row contains the ith row factor vector for R1

CAColFactors(R1): returns an n × k array in which the ith row contains the ithcolumn factor vector for R1

CARowFactors(R1, R2): returns an mm × n array in which the ith row contains the row factor vector for the ith supplementary row profile in R2 based on the data in R1.

CAColFactors(R1, R3): returns an m × nn array in which the ith row contains the factor vector for the ith supplementary column profile in R3 based on the data in R1.

CAEigen(R1): returns an k × 1 column array with the CA eigenvalues for R1

### Linear Algebra

For a k × k range R1, the following are array functions:

eVALUES(R1, iter, order): Produces a 2 × k array whose first row contains the eigenvalues λ of matrix A in range R1. The second row consists of the values det(A–λI).

eVECTORS(R1, iter, order): returns an n+3 × n range, where n = the number of rows/columns in the square range R1. The first row of the output consists of the real eigenvalues of the square matrix A corresponding to the data in R1. Below each eigenvalue λ in the first row is a unit n × 1 eigenvector corresponding to λ. In the second-to-last row of the output are the values det(A−λI). In the last row of the output, below each eigenvalue λ and eigenvector X is the value max {bi: i = 1 to n} where B = AX− λX.

eVECT(R1, iter, order): returns an n+3 × n range of the same format as eVECTORS. Whereas eVECTORS should only be used with symmetric matrices, eVECT can also be used with non-symmetric matrices.

SCHUR(R1, iter, order): returns matrices Q and T such that A = QTQT is a Schur’s factorization of A

SCHURQ(R1, iter, order): returns only matrix Q of the Schur’s factorization of A

SPECTRAL(R1, iter): returns a 2n × n range whose top half is the matrix U and whose lower half is the matrix D in the spectral decomposition of A where A is the matrix of values in range R1.

SVD_U(R1, iter) = U matrix of the singular vector decomposition (SVD) for the matrix A corresponding to range R1, Here A = UDVT where and V are orthogonal matrices and D is a diagonal matrix.

SVD_D(R1, iter) = D matrix of the SVD for the matrix A corresponding to range R1

SVD_V(R1, iter) = V matrix of the SVD for the matrix A corresponding to range R1

The argument iter specifies the number of iterations used. If omitted it defaults to 100 iterations. If order is TRUE or omitted then the eigenvalues are listed in order from highest in absolute value to smallest. If order is FALSE then they are listed in order from highest to lowest.

MSQRT(R1): Produces a k × k array which is the square root of the matrix represented by range R1

For an m × n range R1

QRFactorR(R1, prec): Produces the n × n array R for which A = QR where A is the matrix in R1. Note that to obtain Q you use the fact that AR-1; i.e. Q is obtained using the formula =MMULT(R1, MINVERSE(R2)) where range R2 contains the formula QRFactorR(R1). The function treats numbers smaller than prec as if they were zero. If prec is omitted it defaults to 0.

QRFactor(R1, prec): Produces an m+n × n array. The first m rows of the output is Q and the next n rows of the output is R where A = QR and A is the matrix in range R1. The function treats numbers smaller than prec as if they were zero. If prec is omitted it defaults to 0.

MRANK(R1, prec) = the rank of the matrix specified in range R1. The function treats numbers smaller than prec as if they were zero. If prec is omitted it defaults to .00001.

QRSolve(R1, R2) – assuming R1 is an m × n range describing matrix A and R2 is an m × 1 range describing the column vector C, QRSolve outputs an n × 1 column vector X containing the solution to AX = C

For an k × k range R1

CHOL(R1) = the lower triangular k × k matrix L such that LLT is the Cholesky decomposition of the matrix in R1.

QRInverse(R1) = inverse of the matrix described by range R1 using QR Factorization

Mahalanobis Distance

MDistSq(R1, R2) = the Mahalanobis distance squared between the 1 × k row vector R2 and the mean vector of the sample contained in the n × k range R1

QuadForm(R1, R2, R3) = the Mahalanobis distance squared between the 1 × k row vector R1 and the 1 × mean vector based on the 1 × population covariance matrix in R3

### 17 Responses to Real Statistics Multivariate Functions

1. nisak says:

thanks

• Charles says:

Hi Nisak,
The Real Statistics website includes a tutorial on multivariate statistical analysis. Please let me know which topics are missing or what additional information do you need.
Charles

2. Joel Dubow says:

Hi Charles;
Please forgive my newbie questions. I want to access the Real Statistics Multivariate Functions to analyze some data I have. Can I access all the functions using the Control M plus follow the menu, or do I need to go to the example workbooks for any of them?

Also, if I copy one of the multivariate functions example workbooks and then paste it into a blank workbook will it still be active and link to the Resource Pack?

Thanks.

• Charles says:

Joel,

Once you have downloaded and installed the Real Statistics Resource pack you can use any of the multivariate functions and data analysis tools (even if you have never downloaded the examples workbook). Simply press Ctrl-m (small m) to access any of the data analysis tools. Functions are used just like any Excel functions.

The Real Statistics functions and tools are accessible on any Excel workbook, including the Multivariate Examples Workbook. For this workbook you will need to change the link to the resource pack as explained on the Download Examples Workbooks webpage.

Charles

3. Odko says:

Dear Charles,
Thanks for nice tool. I found that what I need. But while installing I did not find realstats in Add-ins. Please help me.
Thanks,
Odko

• Charles says:

Odko,
Once you have downloaded the Real Statistics Resource Pack file, you need to press Alt-TI and then press the Browse button to locate where you have stored the Real Statistics Resource Pack file. Once this is done the file should then appear on the list of add-ins.
Charles

4. Derek Robinson says:

Hi there
I have downloaded the resource pack and it appears on the toolbar bur I cannot find the multivariate tools anywhere. My spreadsheet doesn’t recognise them.
Can you tell me what I am doing wrong please?
Derek

• Charles says:

Derek,
What do you see when you enter the formula =VER()
When you press Ctrl-m, do you see Survival Analysis on the list of data analysis tools?
Charles

• Derek Robinson says:

Hi there
I get 3.5.3 but I do not have a list of data analysis tools anywhere.
I am using an iMac with excel 2011.
I have the StatsPlus installed.
I cannot apparently instal the analysis tool pack.
I cannot see Survival Analysis on Real Statistics data analysis tools.
I am lost!
Derek

• Charles says:

Derek,
At present Rel 3.5.3 is indeed the latest release for the Mac. This release doesn’t include Survival Analysis. You would need the Windows version of the Real Statistics software to use Survival Analysis. To get a list of data analysis tools just press Cmd-m to bring up the main dialog box.
Charles

• Derek Robinson says:

Hi Charles
Does that mean I can’t use the multivariate such as BNORMRAND to generate data?
Derek

• Charles says:

Derek,
No, you can use BNORMRAND to generate a random sample which conforms to a binormal distribution.
Charles

5. Charles,

Is there a MANCOVA in the analysis resource pack…… ?

I am not seeing it……

• Charles says:

Eric,
Sorry, but I don’t support MANCOVA yet.
Charles

6. Charles,

Thanks for letting me know on the MANCOVA …. I hope it is a tool that is on a future update. Your work is very helpful. I enjoy using your resource pack and look forward to your future work……

All The Best,

Eric

7. Dear Charles,
Wonderful website and data analysis tool! Thank you so very much!
I have one question: I am doing factor analysis and source contributions, for which I have used other software tools in the past (SPSS, StatistiXL, XLSTAT) and with those tools an essential matrix (the Casewise Factor Scores) is produced with the Factor Analysis tool. However, Real Statistics Resource Tool only produces what I’ve always understood to be the Factor Loadings, but labels it Factor Scores matrices (three different algorithms).
How do I obtain the Scores matrix , i.e. the factor associated with each object?

Also, There does not appear to be a tool for Principal Components Analysis, which is an important exploratory tool for me. Is it hidden in another module somewhere?

Many thanks,
Marietjie