The following is a summary of all the multivariate functions provided in the Real Statistics Resource Pack.

**Multivariate Normal Distribution**

**BNORMSDIST**(*x, *y*, r, cum*) = the cdf of the standard bivariate normal distribution at *x* and y with correlation coefficient *r* if cum = TRUE and the pdf if *cum *= FALSE

**BNORMDIST**(*x*1, *x*2, *m*1, *m*2, *s*1,* s*2, *r, cum*) = the cdf of the standard bivariate normal distribution at at *x*1 and *x*2 with means *m*1 and *m*2, standard deviations *s*1 and *s*2 and correlation coefficient* r* if *cum *= TRUE and the pdf if *cum *= FALSE

**MNORMRAND**(R1, R2): generates an array containing a random *k* × 1 vector from a *k*-dimensional multivariate normal distribution where R1 is the range that contains the *k* × 1 means vector and R2 is the range that contains the* k × k* covariance matrix

**BNORMRAND**(*m*1, *m*2, *s*1, *s*2, *r*): generates a random 2 × 1 vector from a bivariate normal distribution with means *m*1, *m*2, standard deviations *s*1, *s*2 and correlation coefficient *r*.

**MSKEWTEST**(R1, *lab*): Mardia’s skewness test for multivariate normality; returns a column range with the values skewness, chi-square statistic, *df* and p-value, plus corrected statistic and p-value for small samples

**MKURTTEST**(R1, *lab*): Mardia’s kurtosis test for multivariate normality; returns a column range with the values kurtosis, z-statistic and p-value

If *lab *= TRUE then an extra column of labels is appended to the results (defaults to FALSE).

**Hotelling ***T*^{2}

*T*

^{2}

**T2TEST**(R1, R2, *type*) = the p-value for Hotelling’s *T*^{2} test**HotellingT2** (R1, R2, *type*) = *T*^{2} statistic**HotellingF** (R1, R2, *type*) = *F* statistic**Hotellingdf** (R1, R2, *type*) = *df*_{2} parameter

R1 and R2 are the ranges of the two samples, except in the one sample case where R1 is the sample and R2 is a column vector representing the population mean column vector being tested.

The following are the acceptable values for *type*:

0 = one sample

1 = the samples have paired values from the same population

2 = the samples are from populations with the same covariance matrices

3 = the samples are from populations with different covariance matrices

Note that the *df*_{1} parameter is always equal to the number of columns in R1.

There are also the following array functions:

**Hotelling**(R1, R2,* type, lab*) – outputs a 5 × 1 column range with the values *T-square*, *df*1, *df*2, *F* and p-value for Hotelling T2 test for the data in ranges R1 and R2.

**T2_RepeatedMeasures**(R1, *lab*) – outputs a 5 × 1 column range with the values *T-square*, *df*1, *df*2, *F* and p-value for Hotelling T2 test for repeated measures design based on the data in ranges R1 (w/o headings).

If *lab* = TRUE then the output is a 5 × 2 range where the first column contains labels.

**MANOVA**

Where R1 is a data range in standard form (without column headings), we have the following functions:

**Wilk’s Lambda Test**:

**MANOVA_WilksLambda**(R1) = *Λ***MANOVA_Wilksdf1**(R1) = *df _{1}*

**MANOVA_Wilksdf2**(R1) =

*df*

_{2}**MANOVA_WilksF**(R1) =

*F*

**MANOVA_WilksTest**(R1) = p-value

**Hotelling-Lawley Trace Test**:

**MANOVA_HotelTrace**(R1) = **MANOVA_Hoteldf1**(R1) = *df*_{1} **MANOVA_Hoteldf2**(R1) = *df _{2}*

**MANOVA_HotelF**(R1) =

*F*

**MANOVA_HotelTest**(R1) = p-value

**Pillai-Bartlett Trace Test**:

**MANOVA_PillaiTrace**(R1) = *V***MANOVA_Pillaidf1**(R1) = *df*_{1} **MANOVA_Pillaidf2**(R1) = *df _{2}*

**MANOVA_PillaiF**(R1) =

*F*

**MANOVA_PillaiTest**(R1) = p-value

**Roy’s Largest Root**:

**MANOVA_RoyRoot**(R1, *b*) = largest eigenvalue *λ _{p}* of

*HE*

^{-1}if

*b*= TRUE (default) and = if

*b*= FALSE

The following are array functions which output a 5 × 1 array consisting of the specified Manova statistic, *df*_{1}, *df*_{2}, *F* statistic and p-value.

**MANOVA_Wilks**(R1)

**MANOVA_Hotel**(R1)

**MANOVA_Pillai**(R1)

The following are array functions which calculate one of the SSCP matrices:

**MANOVA_T**(R1) = *T* **MANOVA_H**(R1) = *H* **MANOVA_E**(R1) = *E*

**Box’s Test**

In the following, R1 is a range in standard formal if *k* = 0 (or is omitted) and R1 is in covariance matrix format if *k* > 0 where *k* = the number of dependent variables (i.e. all the covariance matrices are *k *×* k*).

**BOXTEST**(R1, *k, b*) = p-value**BOXM**(R1, *k*) = *M* statistic**BOXF**(R1, *k*) = *F* statistic**BOXdf1**(R1, *k*) = *df*_{1}**BOXdf2**(R1, *k*) = *df*_{2}**BOX**(R1, *k*) – see below

There is also the following array function

**BOX**(R1, *k*): outputs a 5 × 1 range consisting of the entries BOXM(R1, *k*), BOXdf1(R1, *k*), BOXdf2(R1, *k*), BOXF(R1, *k*) and BOXTEST(R1, *k*) in that order.

When *b* is TRUE (or omitted) BOXTEST(R1, *k, b*) performs the F test, while when it is FALSE the chi-square test is performed instead.

Where R1 and *k* are as described above, the following is an array function

**COVPooled**(R1, *k*) = *k* × *k* pooled covariance matrix

**Extraction **

The following are array functions where range R1 contains data in standard form (with or without column headings) and *s* is a string which presumably specifies a group (in the first column of R1):

**ExtractRows**(R1, *s, b*) = the array which contains all the elements in range R1 for the group labeled *s*. If *b* is set to TRUE, the first row of R1 is included in the output (presumably column headings) even if there is no match. If *b* is omitted it defaults to TRUE.

**ExtractCov**(R1, *s*) = the covariance matrix for the group labeled *s*, based on the data in range R1. If *s* is the empty string “” or is omitted then the pooled covariance matrix of R1 is returned.

For the following array function range R1 contains data in standard form (with column headings) and *s* is a string which specifies a column heading (i.e. an entry in the first row of R1) that selects one of the dependent variables.

**ExtractCol**(R1, *s*) = an array which consists of all the data in R1 for the dependent variable identified by *s*, now organized by columns with one column for each group. You should highlight a range with *g* columns where *g* = the number of groups.

**Factor Analysis**

For a *k* × *k* range R1, the following is an array function:

**VARIMAX**(R1): Produces a *k* × *k* array containing the loading factor matrix after applying a Varimax rotation to the loading factor matrix contained in range R1.

For a *k* × *k* range R1 representing a correlation matrix, the following is an array function:

**ExtractCommunalities**(R1, *iter, prec, eigen*) = the 1 × *k* row vector with the communalities after convergence based on a precision value of *prec* (default = .00001), maximum number of *iter* iteration (default = 25) and maximum number of iterations *eigen* in finding eigenvalues/vectors (default = 100).

**Cluster Analysis**

The following are are array functions:

**CLUST**(R1, *k*, R2) = *m* × 1 column range of cluster numbers 1, 2, …, *k* calculated by the k-means algorithm corresponding to the *m* n-tuple data elements in the *m × n*range R1 where R2 is an *m* × 1 column range containing the initial cluster number assignments. If R2 is omitted then the k-means++ algorithm is used to calculate the initial cluster number assignments.

**ClustAnal**(R1, *k, iter*) = the *m* × 1 column range of cluster numbers produced by CLUST(R1, *k*) with the lowest *SS _{E}* after

*iter*iterations (here each of the iterations uses the k-means++ algorithm). If

*iter*= 0, however, then the

*m*× 1 column range of initial cluster numbers based on the k-means++ algorithm is returned.

**CLUSTERS**(R1, R2) = *m* × 1 column range of cluster numbers 1, 2, …, *k* corresponding to the centroid described in the *n × k* range R2 which is closest to the respective data element in the *m × n* range R1.

**CENTROIDS**(R1, R2) = *n × k* range defining the *k* centroids corresponding to the *m*× 1 column range R2 of clusters for the respective data elements in the *m × n* range R1. Here *k* = the maximum value in R2.

**CENTROIDS**(R1, *k*) =* n × k* range defining the *k* initial centroids corresponding to the data elements in the *m × n* range R1 based on the k-means++ algorithm

The following non-array functions:

**SSE_CLUSTER**(R1, R2, *k*) = *SS _{E}* value for the data in the

*m × n*range R1 based on the cluster assignment in the

*m*× 1 column range which contains cluster number assignments based on

*k*clusters 1, 2, …,

*k*.

**SSE_CENTROID**(R1, R2, *k*) = *SS _{E}* value for the data in the

*m × n*range R1 based on the centroids described in the

*n × k*range R2.

**Jenks Natural Breaks**

**JENKS**(R1,, *lab, iter*) – performs Jenks Natural Breaks optimization on the data in range R1 for* k* classes and outputs a *k*+1 × 3 range, whose first *k* rows contain the left and right endpoints of each of the classes followed by the number of elements in the class. The last row contains the total squared deviation of the *k* classes, followed by the squared deviation for the data in R1 and GVF.

Here *k*+1 = the number of rows in the highlighted output range. If *iter* is included then the algorithm uses *iter* iterations, otherwise all possible partitions are tested.

**JENKS**(R1, *k, lab, iter*) – outputs a a *k* × 1 column range, whose first *k−*1 rows contain the right endpoint of the first *k−*1 classes (the endpoint of the *k*^{th} category is the largest element in R1) and whose last element is the GVF.

**GVF**(R1, R2) = GVF for the right endpoints (breaks) in range R2 based on the data in column range R1.

If *lab* = TRUE (default = FALSE) then an extra column of labels is included.

**Discriminant Analysis**

**LDACoeff**(Rt, *head*): returns an array with the LDA coefficients for the (training) data in Rt consisting of one row for each independent variable whose columns consist of the name of the independent variable, the intercept coefficient and a coefficient for each dependent variable. If *head* = TRUE (default), then the data in Rt contains column headings (corresponding to the names of the dependent variables); these headings are also appended to the output from this function.

**LDAPredC**(R0, Rc, Rp, *lab*): returns an array whose rows contain the probabilities for each vector in the data array R0 (which contains no row/column headings) using the LDA coefficient array in array Rc (without column headings) and prior probabilities in the column array Rp. A column of names of the independent variable that has the highest probability is also appended to the output.

**LDAPred**(R0, Rt, Rp, *lab*) = LDAPredC(R0, LDACoeff(Rt,FALSE), Rp, *lab*), i.e. the predictions for the vectors in R0 based on the LDA model defined by Rt and Rp.

**QDAPred**(R0, Rt, Rp, *lab*): returns an array whose rows contain the probabilities for each vector in the data array R0 (which contains no row/column headings) and the name of the independent variable with the highest probability as for the LDAPred function, except that the QDA model is used instead of the LDA model.

**DAClassification**(Rt, Rp, *linear*): returns a classification for the training data in Rt and priors in Rp. If *linear* = TRUE (default), then the classification table is based on an LDA model, while if *linear* = FALSE then a QDA model is used instead.

**DASummary**(R1): returns a summary of the classification table in range R1

If Rp is omitted then equally probably priors are used. If *lab* = TRUE (default FALSE) then column headings are added to the output.

**Correspondence Analysis**

In the following R1 is an array containing an *m x n* contingency table, R2 is an *mm × n* array containing supplementary row profiles and R3 is an *m × nn* array containing supplementary column profiles. None of these arrays include row or column headings or totals. We also assume that *k* = min(*m, n*) – 1.

**CARowFactors**(R1): returns an *m × k* array in which the *i*^{th} row contains the *i*^{th} row factor vector for R1

**CAColFactors**(R1): returns an *n × k* array in which the *i*^{th} row contains the *i*^{th}column factor vector for R1

**CARowFactors**(R1, R2): returns an *mm × n* array in which the *i*^{th} row contains the row factor vector for the *i*^{th} supplementary row profile in R2 based on the data in R1.

**CAColFactors**(R1, R3): returns an *m × nn* array in which the *i*^{th} row contains the factor vector for the *i*^{th} supplementary column profile in R3 based on the data in R1.

**CAEigen**(R1): returns an* k* *×* 1 column array with the CA eigenvalues for R1

**Linear Algebra**

For a *k* × *k* range R1, the following are array functions:

**eVALUES**(R1, *iter, order*): Produces a 2 × *k* array whose first row contains the eigenvalues *λ* of matrix *A* in range R1. The second row consists of the values det(*A–λI*).

**eVECTORS**(R1, *iter, order*): returns an *n*+3 × *n* range, where *n* = the number of rows/columns in the square range R1. The first row of the output consists of the real eigenvalues of the square matrix *A* corresponding to the data in R1. Below each eigenvalue *λ* in the first row is a unit *n* × 1 eigenvector corresponding to *λ. *In the second-to-last row of the output are the values det(*A−λI*). In the last row of the output, below each eigenvalue *λ* and eigenvector *X* is the value max {*b _{i}*:

*i*= 1 to

*n*} where

*B = AX− λX.*

**eVECT**(R1, *iter, order*): returns an *n*+3 × *n* range of the same format as eVECTORS. Whereas eVECTORS should only be used with symmetric matrices, eVECT can also be used with non-symmetric matrices.

**SCHUR**(R1, *iter, order*): returns matrices* Q* and *T* such that *A = QTQ ^{T}* is a Schur’s factorization of

*A*

**SCHURQ**(R1, *iter, order*): returns only matrix* Q* of the Schur’s factorization of *A*

**SPECTRAL**(R1, *iter*): returns a 2*n × n* range whose top half is the matrix *U* and whose lower half is the matrix *D* in the spectral decomposition of *A* where *A* is the matrix of values in range R1.

**SVD_U**(R1, *iter*) = *U* matrix of the singular vector decomposition (SVD) for the matrix *A* corresponding to range R1, Here *A = UDV ^{T}* where

*U*and

*V*are orthogonal matrices and

*D*is a diagonal matrix.

**SVD_D**(R1, *iter*) = *D* matrix of the SVD for the matrix *A* corresponding to range R1

**SVD_V**(R1, *iter*) = *V* matrix of the SVD for the matrix *A* corresponding to range R1

The argument *iter* specifies the number of iterations used. If omitted it defaults to 100 iterations. If *order* is TRUE or omitted then the eigenvalues are listed in order from highest in absolute value to smallest. If *order* is FALSE then they are listed in order from highest to lowest.

**MSQRT**(R1): Produces a *k* × *k* array which is the square root of the matrix represented by range R1

For an *m* × *n *range R1

**QRFactorR**(R1, *prec*): Produces the *n* × *n* array* R* for which *A = QR* where *A* is the matrix in R1. Note that to obtain *Q* you use the fact that *AR*^{-1}; i.e. *Q *is obtained using the formula =MMULT(R1, MINVERSE(R2)) where range R2 contains the formula QRFactorR(R1). The function treats numbers smaller than *prec* as if they were zero. If *prec* is omitted it defaults to 0.

**QRFactor**(R1, *prec*): Produces an *m+n* × *n* array. The first *m* rows of the output is *Q *and the next *n* rows of the output is *R* where *A = QR* and *A* is the matrix in range R1. The function treats numbers smaller than *prec* as if they were zero. If *prec* is omitted it defaults to 0.

**MRANK**(R1, *prec*) = the rank of the matrix specified in range R1. The function treats numbers smaller than *prec* as if they were zero. If *prec* is omitted it defaults to .00001.

**QRSolve**(R1, R2) – assuming R1 is an *m* × *n* range describing matrix *A* and R2 is an *m* × 1 range describing the column vector *C*, QRSolve outputs an *n* × 1 column vector *X* containing the solution to *AX = C*

For an *k* × *k* range R1

**CHOL**(R1) = the lower triangular *k* × *k* matrix *L* such that *LL ^{T}* is the Cholesky decomposition of the matrix in R1.

**QRInverse**(R1) = inverse of the matrix described by range R1 using QR Factorization

**Mahalanobis Distance**

**MDistSq**(R1, R2) = the Mahalanobis distance squared between the 1 × *k* row vector R2 and the mean vector of the sample contained in the *n* × *k* range R1

**QuadForm**(R1, R2, R3) = the Mahalanobis distance squared between the 1 × *k* row vector R1 and the 1 × *k *mean vector based on the 1 × *k *population covariance matrix in R3

please help me to download this file, i need multivariate analysis tutorial.

thanks

Hi Nisak,

The Real Statistics website includes a tutorial on multivariate statistical analysis. Please let me know which topics are missing or what additional information do you need.

Charles

Hi Charles;

Please forgive my newbie questions. I want to access the Real Statistics Multivariate Functions to analyze some data I have. Can I access all the functions using the Control M plus follow the menu, or do I need to go to the example workbooks for any of them?

Also, if I copy one of the multivariate functions example workbooks and then paste it into a blank workbook will it still be active and link to the Resource Pack?

Thanks.

Joel,

Once you have downloaded and installed the Real Statistics Resource pack you can use any of the multivariate functions and data analysis tools (even if you have never downloaded the examples workbook). Simply press Ctrl-m (small m) to access any of the data analysis tools. Functions are used just like any Excel functions.

The Real Statistics functions and tools are accessible on any Excel workbook, including the Multivariate Examples Workbook. For this workbook you will need to change the link to the resource pack as explained on the Download Examples Workbooks webpage.

Charles

Dear Charles,

Thanks for nice tool. I found that what I need. But while installing I did not find realstats in Add-ins. Please help me.

Thanks,

Odko

Odko,

Once you have downloaded the Real Statistics Resource Pack file, you need to press Alt-TI and then press the Browse button to locate where you have stored the Real Statistics Resource Pack file. Once this is done the file should then appear on the list of add-ins.

Charles

Hi there

I have downloaded the resource pack and it appears on the toolbar bur I cannot find the multivariate tools anywhere. My spreadsheet doesn’t recognise them.

Can you tell me what I am doing wrong please?

Thank you in advance

Derek

Derek,

What do you see when you enter the formula =VER()

When you press Ctrl-m, do you see Survival Analysis on the list of data analysis tools?

Charles

Hi there

I get 3.5.3 but I do not have a list of data analysis tools anywhere.

I am using an iMac with excel 2011.

I have the StatsPlus installed.

I cannot apparently instal the analysis tool pack.

I cannot see Survival Analysis on Real Statistics data analysis tools.

I am lost!

Derek

Derek,

At present Rel 3.5.3 is indeed the latest release for the Mac. This release doesn’t include Survival Analysis. You would need the Windows version of the Real Statistics software to use Survival Analysis. To get a list of data analysis tools just press Cmd-m to bring up the main dialog box.

Charles

Hi Charles

Does that mean I can’t use the multivariate such as BNORMRAND to generate data?

Thank you for your help

Derek

Derek,

No, you can use BNORMRAND to generate a random sample which conforms to a binormal distribution.

Charles

Charles,

Is there a MANCOVA in the analysis resource pack…… ?

I am not seeing it……

Eric,

Sorry, but I don’t support MANCOVA yet.

Charles

Charles,

Thanks for letting me know on the MANCOVA …. I hope it is a tool that is on a future update. Your work is very helpful. I enjoy using your resource pack and look forward to your future work……

All The Best,

Eric

Dear Charles,

Wonderful website and data analysis tool! Thank you so very much!

I have one question: I am doing factor analysis and source contributions, for which I have used other software tools in the past (SPSS, StatistiXL, XLSTAT) and with those tools an essential matrix (the Casewise Factor Scores) is produced with the Factor Analysis tool. However, Real Statistics Resource Tool only produces what I’ve always understood to be the Factor Loadings, but labels it Factor Scores matrices (three different algorithms).

How do I obtain the Scores matrix , i.e. the factor associated with each object?

Also, There does not appear to be a tool for Principal Components Analysis, which is an important exploratory tool for me. Is it hidden in another module somewhere?

Many thanks,

Marietjie

Marietjie,

Glad you like the website and data analysis tools.

1. If you look at the following webpage, Figure 5 contains the Factor Loadings, while Figure 9 and 10 contain the Factor Scores

http://www.real-statistics.com/multivariate-statistics/factor-analysis/real-statistics-support-factor-analysis/

2. Yes, you are correct. I give an example of how to conduct Principal Components Analysis in Excel, but I have never added a data analysis tool for this. I will correct this in one of the next releases.

Charles