# Real Statistics Functions

Although all the statistical analyses described in this website can be done with standard Excel capabilities (see Built-in Excel Functions and Built-in Statistical Functions), it is often easier to use the supplemental functions and data analysis tools provided in the Real Statistics Resource Pack. You use the supplemental functions in exactly the same way as the standard Excel functions (see Worksheet Functions).

We now briefly describe the various Real Statistics supplemental functions. More complete descriptions of these functions are provided elsewhere in this website.

Also note that if you enter =function-name in any cell and press Ctrl-Shift-A, the arguments for the named function will be displayed. E.g. typing =LogitCoeff and then pressing Ctrl-Shift-A will display the argument list for LogitCoeff.

See Real Statistics Multivariate Functions for a summary of the supplemental multivariate statistics functions available in the Real Statistics Resource Pack.

See Real Statistics Time Series Analysis Functions for a summary of the supplemental time series analysis functions available in the Real Statistics Resource Pack.

See Real Statistics Data Analysis Tools for a list of the supplemental statistical data analysis tools provided by the Real Statistics Resource Pack.

Basic descriptive statistics

- MEAN(R1): same as AVERAGE(R1)
- IQR(R1, b): interquartile range of the data in range R1; if b = TRUE then QUARTILE is used, while if b = FALSE then QUARTILE.EXC is used (Excel 2010/2011/2013 only)
- MAD(R1): median absolute deviation of the data in range R1
- RNG(R1): range of the data in range R1, i.e. MAX(R1) – MIN(R1)
- SKEWP(R1): population skewness of the data in R1
- KURTP(R1, excess): population kurtosis of the data in R1; if excess = TRUE (default) then 3 is subtracted
- SUMCOL(R1): array function which returns a row range with the sum of each column in R1
- MEANCOL(R1): array function which returns a row range with the mean of each column in R1
- VARCOL(R1): array function which returns a row range with the sample variance of each column in R1
- STDEVCOL(R1): array function which returns a row range with the sample standard deviation of each column in R1
- SUMROW(R1): array function which returns a column range with the sum of each row in R1
- MEANROW(R1): array function which returns a column range with the mean of each row in R1
- VARROW(R1): array function which returns a column range with the sample variance of each row in R1
- STDEVROW(R1): array function which returns a column range with the sample standard deviation of each row in R1
- STDERR(R1): standard error of the data in range R1, i.e. STDEV(R1)/SQRT(COUNT(R1))
- RANK_AVG(x, R1, order, num_digits): rank of x in range R1 taking ties into account; order = 0 (default) indicates decreasing order and a non-zero value indicates increasing order; equivalent to Excel's RANK.AVG(x, R1, order) except that the data is first rounded to num_digits decimal places
- Ties_Correction(R1, R2, type): ties correction value for the data in range R1 and optionally range R2, where type = 0: one sample, type = 1: paired samples, type = 2: independent samples
- TRIMDATA(R1, p): array function which returns a column range equivalent to R1 after removing the lowest and highest p/2 % of the data values
- WINSORIZE(R1, p): array function which returns a column range that is the Winsorized version of R1, replacing the lowest and highest p/2 % of the data values
- WINMEAN(R1, p): Winsorized mean of the data in range R1, replacing the lowest and highest p/2 % of the data values
- STANDARD(x, R1, type, exc): = STANDARDIZE(x, AVERAGE(R1), STDEV(R1)) if type = 0 (default); = (x − Q3)/IQR if type = 1 and x > Q3; = (x − Q1)/IQR if type = 1 and x < Q1; = 0 otherwise. Here IQR = IQR(R1, exc); Q1 = QUARTILE(R1, 1) and Q3 = QUARTILE(R1, 3) if exc = FALSE (default), while Q1 = QUARTILE.EXC(R1, 1) and Q3 = QUARTILE.EXC(R1, 3) if exc = TRUE
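To make a few of these definitions concrete, here is a minimal Python sketch of what STDERR, MAD and WINMEAN compute. This is an illustration of the formulas stated above, not the Resource Pack's actual VBA implementation; the function names are chosen to mirror the Excel ones.

```python
import math
import statistics

def stderr(data):
    # STDERR(R1) = STDEV(R1)/SQRT(COUNT(R1))
    return statistics.stdev(data) / math.sqrt(len(data))

def mad(data):
    # MAD(R1): median absolute deviation from the median
    m = statistics.median(data)
    return statistics.median(abs(x - m) for x in data)

def winmean(data, p):
    # WINMEAN(R1, p): replace the lowest and highest p/2 fraction of the
    # sorted values with the nearest retained value, then average
    s = sorted(data)
    k = int(len(s) * p / 2)
    if k > 0:
        s[:k] = [s[k]] * k
        s[-k:] = [s[-k - 1]] * k
    return statistics.mean(s)
```

For example, `winmean([1, 2, 3, 4, 100], 0.4)` clips one value at each end before averaging, which is why the Winsorized mean is far less sensitive to the outlying 100 than AVERAGE would be.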

Count measurements

- COUNTU(R1): count of the unique numeric values in R1
- COUNTAU(R1): count of the unique non-empty cells in R1
- COUNTCOL(R1): array function which returns a row range with the number of numeric values in each column of R1
- COUNTROW(R1): array function which returns a column range with the number of numeric values in each row of R1
- CountRowsUnique(R1, head, ncols): count of the unique rows in R1. If head = TRUE (default FALSE) then the first row of R1 (presumably a heading) is not counted; the last ncols columns of R1 (default ncols = 0) are not considered when determining uniqueness
- CountFullRows(R1, b): number of rows in range R1 which don't have any empty cells if b = TRUE (default), and the number of rows in R1 which don't have any non-numeric cells if b = FALSE
- CountPairs(R1, R2, b): number of pairs of cells from R1 and R2 in which neither cell is empty if b = TRUE (default), and the number of pairs in which neither cell is non-numeric if b = FALSE

T distribution

- CONFIDENCE_T(alpha, s, size): the value k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value, standard deviation s and sample size, assuming a t distribution
- T_CONF(R1, alpha): the value k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a t distribution
- T_LOWER(R1, alpha): the value x̄ − k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a t distribution
- T_UPPER(R1, alpha): the value x̄ + k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a t distribution
- DF_POOLED(R1, R2): degrees of freedom for the two-sample t test for the samples in ranges R1 and R2, especially when the two samples have unequal variances
- STDERR_POOLED(R1, R2, b): pooled standard error for the two-sample t test for the samples in ranges R1 and R2; if b = TRUE equal variances are assumed, while if b = FALSE unequal variances are assumed
- STDEV_POOLED(R1, R2): pooled standard deviation for the two-sample t test for the samples in ranges R1 and R2 when the two samples have equal variances
- VAR_POOLED(R1, R2): pooled variance for the two-sample t test for the samples in ranges R1 and R2 when the two samples have equal variances
- T_DIST(x, df, cum): value of the cumulative distribution function of the t distribution with df degrees of freedom at x when cum = TRUE, and value of the pdf of the t distribution with df degrees of freedom at x when cum = FALSE

T_DIST is like Excel's T.DIST function except that df doesn't have to be an integer. In addition, the following functions are available which have a similar advantage over their Excel counterparts: T_DIST_RT, T_DIST_2T, T_INV, T_INV_2T.
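As an illustration, the computations behind VAR_POOLED and DF_POOLED can be sketched in Python. This assumes DF_POOLED uses the standard Welch-Satterthwaite formula for unequal variances, which its description suggests; it is a sketch, not the Resource Pack's code.

```python
import statistics

def var_pooled(r1, r2):
    # equal-variance pooled variance: weighted average of the two
    # sample variances with weights n-1
    n1, n2 = len(r1), len(r2)
    return ((n1 - 1) * statistics.variance(r1) +
            (n2 - 1) * statistics.variance(r2)) / (n1 + n2 - 2)

def df_pooled(r1, r2):
    # Welch-Satterthwaite degrees of freedom for unequal variances;
    # generally not an integer, which is why T_DIST accepts non-integer df
    v1, v2 = statistics.variance(r1), statistics.variance(r2)
    n1, n2 = len(r1), len(r2)
    a, b = v1 / n1, v2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

The non-integer df produced by `df_pooled` is exactly the situation where T_DIST's relaxation of Excel's integer-df restriction matters.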

There are also the following array formulas for identifying outliers:

- GRUBBS(R1, lab, alpha): outputs a 4 × 1 column range with the following entries: one potential outlier, G-stat, G-crit and test significance
- ESD(R1, lab, alpha): outputs a 4 × k range where k = the maximum number of potential outliers and each column contains: a potential outlier, G-stat, G-crit and test significance (based on the ESD test)
- OUTLIERS(R1, alpha): outputs a column range with up to k outliers (based on the ESD test)

If lab = TRUE (default FALSE) then an extra column with labels is added to the output.
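For intuition, the G statistic reported by GRUBBS can be sketched as follows. This is a simplified illustration that only computes the statistic for the single most extreme value; G-crit and the test significance require t distribution critical values, which are omitted here.

```python
import statistics

def grubbs_g(data):
    # G = |x - mean| / s for the value farthest from the mean
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    outlier = max(data, key=lambda x: abs(x - mean))
    return outlier, abs(outlier - mean) / sd
```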

Normal distribution

- NORM_CONF(R1, alpha): the value k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a normal distribution
- NORM_LOWER(R1, alpha): the value x̄ − k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a normal distribution
- NORM_UPPER(R1, alpha): the value x̄ + k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and the sample data in range R1, assuming a normal distribution
- SHAPIRO(R1, b): the Shapiro-Wilk test statistic W for the data in range R1
- SWTEST(R1, b, h): the p-value of the Shapiro-Wilk test on the data in range R1
- SWCoeff(n, k, b): the kth Shapiro-Wilk coefficient for samples of size n
- SWCoeff(R1, C1, b): the Shapiro-Wilk coefficient for cell C1 within the sorted sample in range R1
- JARQUE(R1, pop): the Jarque-Bera statistic for the data in range R1
- JBTEST(R1, pop): the p-value of the Jarque-Bera test on the data in range R1
- SKEWTEST(R1, lab, alpha): array function to test whether the skewness of the data in range R1 is zero (consistent with a normal distribution); the output contains the sample skewness, standard error, test statistic, p-value and 1–alpha confidence interval
- KURTTEST(R1, lab, alpha): array function to test whether the kurtosis of the data in range R1 is zero (consistent with a normal distribution); the output contains the sample kurtosis, standard error, test statistic, p-value and 1–alpha confidence interval
- DAGOSTINO(R1): the D'Agostino-Pearson statistic for the data in range R1
- DPTEST(R1): the p-value of the D'Agostino-Pearson test on the data in range R1

In the above functions, when b = TRUE (default) the Royston algorithm is used and when b = FALSE the original Shapiro-Wilk algorithm is used (employing harmonic interpolation unless h = FALSE when linear interpolation is used).
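The Jarque-Bera statistic returned by JARQUE can be sketched in Python, assuming the usual formula JB = n/6 · (S² + K²/4) with population skewness S and excess kurtosis K (a sketch of the standard statistic, not the Resource Pack's code):

```python
def jarque_bera(data):
    # central moments m2, m3, m4 computed with population (1/n) weighting
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5          # population skewness S
    ex_kurt = m4 / m2 ** 2 - 3     # population excess kurtosis K
    return n / 6 * (skew ** 2 + ex_kurt ** 2 / 4)
```

Under normality JB is approximately chi-square with 2 degrees of freedom, which is how JBTEST converts the statistic into a p-value.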

Chi-square distribution

- CHI_STAT2(R1, R2): Pearson's chi-square statistic for the observed values in range R1 and the expected values in range R2
- CHI_MAX2(R1, R2): maximum likelihood chi-square statistic for the observed values in range R1 and the expected values in range R2
- CHI_STAT(R1): Pearson's chi-square statistic for the observed values in range R1
- CHI_MAX(R1): maximum likelihood chi-square statistic for the observed values in range R1
- CHI_TEST(R1): p-value for Pearson's chi-square statistic for the observed values in range R1
- CHI_MAX_TEST(R1): p-value for the maximum likelihood chi-square statistic for the observed values in range R1
- CHISQ_DIST(x, df, cum): value of the cdf of the chi-square distribution with df degrees of freedom at x when cum = TRUE, and value of the pdf of the chi-square distribution with df degrees of freedom at x when cum = FALSE (df does not need to be an integer)
- CHISQ_INV(p, df): inverse of the chi-square distribution with df degrees of freedom at p (df does not need to be an integer)
- FISHERTEST(R1, tails, slow): probability calculated by the Fisher exact test for the contingency table in range R1, where for 2 × 2 tables tails = the number of tails: 1 (one-tailed) or 2 (two-tailed, default), and tails = 2 for other tables; slow is used to increase the acceptable sum of the counts in R1
- FISHER_TEST(R1, lab, slow): array function that returns a column array containing: the p-value for the two-tailed Fisher's exact test for the data in R1, the sample size, df, chi-square statistic, Cramer's V and the w effect size. If lab = TRUE (default FALSE), then an extra column of labels is appended; slow is used to increase the acceptable size of R1

CHISQ_DIST and CHISQ_INV are like Excel's CHISQ.DIST and CHISQ.INV functions except that df doesn't have to be an integer. In addition, the following functions are available which have a similar advantage over their Excel counterparts: CHISQ_DIST_RT and CHISQ_INV_RT.
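For example, the Pearson chi-square statistic computed by CHI_STAT2 is just the familiar sum over cells (a sketch of the standard formula, not the Resource Pack's code):

```python
def chi_stat2(obs, exp):
    # Pearson chi-square: sum of (O - E)^2 / E over all cells,
    # here with observed and expected counts flattened into sequences
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```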

Covariance and Correlation

- COVARP(R1, R2): population covariance of the populations defined by ranges R1 and R2; equivalent to COVAR(R1, R2)
- COVARS(R1, R2): sample covariance of the samples defined by ranges R1 and R2; equivalent to COVARIANCE.S(R1, R2)
- CORREL_ADJ(R1, R2): estimated population correlation coefficient corresponding to the sample data in ranges R1 and R2
- MCORREL(R, R1, R2): multiple correlation of the dependent variable z with x and y, where the samples for z, x and y are the ranges R, R1 and R2 respectively
- PCORREL(R1, i, j): partial correlation coefficient between the ith and jth columns of R1
- SCORREL(R1, R2, lab, tails): array function which returns Spearman's rank correlation (rho), t-stat and p-value for the data in R1 and R2
- KCORREL(R1, R2, lab, tails, α, ties): array function which returns Kendall's tau, standard error, z-stat, z-crit, p-value, and the lower and upper bounds of the 1–α confidence interval of Kendall's tau for the data in R1 and R2
- COV(R1, b): array function which returns the sample covariance matrix for the data in R1
- COVP(R1, b): array function which returns the population covariance matrix for the data in range R1
- CORR(R1, b): array function which returns the correlation matrix for the data in range R1
- PCORR(R1): array function which returns the partial correlation matrix for the data in range R1

For CORR, COV and COVP if b = TRUE (default) then any row in R1 which contains a non-numeric cell is not used, while if b = FALSE then correlation/covariance coefficients are calculated pairwise (by columns) and so any row which contains non-numeric data for either column in the pair is not used to calculate that coefficient value.

For KCORREL, if lab = TRUE then the output takes the form of a 7 × 2 range with the first column consisting of labels, while if lab = FALSE (default) the output takes the form of a 7 × 1 range without labels. If ties = TRUE (default) then a ties correction is applied.

Similarly, SCORREL returns a 3 × 2 range if lab = TRUE and a 3 × 1 range if lab = FALSE.

The following are functions for correlation coefficient hypothesis testing:

- CorrTTest(ρ, r, n, tails): p-value for the one-sample t test of the correlation coefficient with ρ = 0
- CorrTLower(r, n, α): lower bound of the 1–α confidence interval of the population correlation coefficient
- CorrTUpper(r, n, α): upper bound of the 1–α confidence interval of the population correlation coefficient
- CorrelTTest(r, n, α, lab, tails): array function which returns the test statistic, p-value and lower and upper bounds of the 1–α confidence interval of the population correlation coefficient, based on a hypothetical correlation ρ = 0
- CorrelTTest(R1, R2, α, lab, tails): = CorrelTTest(r, n, α, lab, tails) where r = CORREL(R1, R2) and n = CountPairs(R1, R2)
- CorrTest(ρ, r, n, tails): p-value for the one-sample test of the correlation coefficient
- CorrLower(r, n, α): lower bound of the 1–α confidence interval of the population correlation coefficient
- CorrUpper(r, n, α): upper bound of the 1–α confidence interval of the population correlation coefficient
- CorrelTest(r, n, ρ, α, lab, tails): array function which returns the test statistic z, p-value and lower and upper bounds of the 1–α confidence interval of the population correlation coefficient
- CorrelTest(R1, R2, ρ, α, lab, tails): = CorrelTest(r, n, ρ, α, lab, tails) where r = CORREL(R1, R2) and n = CountPairs(R1, R2)
- CorrETest(r, n, ρ, tails): p-value for the one-sample exact test of the correlation coefficient ρ
- CorrETest(R1, R2, ρ, tails): = CorrETest(r, n, ρ, tails) where r = CORREL(R1, R2) and n = CountPairs(R1, R2)
- CorrelETest(r, n, ρ, lab, α, tails): array function which returns the test statistic, p-value and lower and upper bounds of the 1–α confidence interval of the population correlation coefficient, based on the hypothetical correlation ρ
- CorrelETest(R1, R2, ρ, lab, α, tails): = CorrelETest(r, n, ρ, lab, α, tails) where r = CORREL(R1, R2) and n = CountPairs(R1, R2)
- Correl2Test(r1, n1, r2, n2, α, lab): array function which returns the two-independent-sample test statistic z, p-value (two-tailed test) and lower and upper bounds of the 1–α confidence interval of the difference of population correlation coefficients ρ1 − ρ2, based on sample correlations r1 and r2 and sample sizes n1 and n2
- Correl2Test(R1, R2, R3, R4, α, lab): = Correl2Test(r1, n1, r2, n2, α, lab) where r1 = CORREL(R1, R2), r2 = CORREL(R3, R4), n1 = CountPairs(R1, R2) and n2 = CountPairs(R3, R4)
- Correl2OverlapTTest(r12, r13, r23, n, α, lab): array function for the two-overlapping-dependent-sample test that returns |r12 − r13|, the t test statistic, p-value (two-tailed test) and lower and upper bounds of the 1–α confidence interval for the difference of population correlation coefficients; n = the size of all three samples
- Corr2OverlapTTest(R1, R3, R2, α, lab): = Correl2OverlapTTest(r12, r13, r23, n, α, lab) where r12 = CORREL(R1, R2), r13 = CORREL(R1, R3), r23 = CORREL(R2, R3) and n = COUNT(R1)
- Correl2OverlapTest(r12, r13, r23, n, α, lab): array function which is just like Correl2OverlapTTest except that the Fisher transformation is used and the output consists of |r12 − r13| and the lower and upper bounds of the 1–α confidence interval for the difference of population correlation coefficients
- Corr2OverlapTest(R1, R3, R2, α, lab): = Correl2OverlapTest(r12, r13, r23, n, α, lab) where r12 = CORREL(R1, R2), r13 = CORREL(R1, R3), r23 = CORREL(R2, R3) and n = COUNT(R1)
- Correl2NonOverlapTest(R1, n1, n2, α, lab): array function which returns |r12 − r34| and the lower and upper bounds of the 1–α confidence interval for the difference of population correlation coefficients, where R1 contains the correlation matrix for 4 samples, n1 = the size of sample 1 (or 2) and n2 = the size of sample 3 (or 4)
- Corr2NonOverlapTest(R1, α, lab): = Correl2NonOverlapTest(R2, n1, n2, α, lab) where R2 = CORR(R1), n1 = the number of elements in the 1st (or 2nd) column of R1 and n2 = the number of elements in the 3rd (or 4th) column of R1
- PCRIT(n, α, tails): critical value for Pearson's correlation coefficient based on two samples of size n with alpha = α (default .05) and tails = 1 or 2 (default)

For CorrelTTest, CorrelTest and Correl2Test if lab = TRUE then the output takes the form of a 4 × 2 range with the first column consisting of labels, while if lab = FALSE (default) the output takes the form of a 4 × 1 range without labels. Similarly, a column of labels is appended to the output for all the other functions when lab = TRUE (default FALSE).

The CorrTest, CorrLower, CorrUpper, CorrelTest and Correl2Test functions are based on a normal test using the Fisher transformation. The CorrTTest, CorrTLower, CorrTUpper and CorrelTTest functions are based on a t test in which the population correlation is assumed to be zero.

In these functions, r = the sample correlation, ρ = the hypothetical population correlation, n = the sample size and tails = the number of tails: 1 or 2 (default).
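The Fisher-transformation confidence interval underlying CorrLower and CorrUpper can be sketched in Python, assuming the standard formulation z′ = atanh(r) with standard error 1/√(n − 3) (a sketch, not the Resource Pack's code):

```python
import math
from statistics import NormalDist

def corr_interval(r, n, alpha=0.05):
    # transform r to the z' scale, build a normal interval, transform back
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    zcrit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z - zcrit * se), math.tanh(z + zcrit * se)
```

Note that the interval is asymmetric around r (e.g. for r = 0.5, n = 30 it is roughly (0.17, 0.73)), which is exactly what the back-transformation produces.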

Box-Cox Transformation

- BOXCOX(R1, λ): outputs a range containing the Box-Cox transformation of the data in range R1 using the given λ value. If λ is omitted, then the transformation which best normalizes the data in R1 is used, based on maximizing the log-likelihood function
- BOXCOXLL(R1, λ): log-likelihood function of the Box-Cox transformation of the data in R1 using the given lambda value
- BOXCOXLambda(R1): value of lambda which maximizes the log-likelihood function of the Box-Cox transformation of the data in R1
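A rough Python sketch of these three functions follows, using the standard Box-Cox profile log-likelihood and a coarse grid search in place of whatever optimizer the Resource Pack actually uses for BOXCOXLambda:

```python
import math

def boxcox(data, lam):
    # Box-Cox transform: (x^lam - 1)/lam, or ln(x) when lam = 0
    if lam == 0:
        return [math.log(x) for x in data]
    return [(x ** lam - 1) / lam for x in data]

def boxcox_ll(data, lam):
    # profile log-likelihood: -n/2 ln(sigma^2) + (lam - 1) * sum(ln x)
    n = len(data)
    y = boxcox(data, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(x) for x in data)

def boxcox_lambda(data, grid=None):
    # pick the lambda on a grid that maximizes the log-likelihood
    grid = grid or [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_ll(data, lam))
```

The data must be strictly positive for the transform and the (λ − 1)Σln x term to be defined.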

Linear Regression

The following are ordinary, non-array functions where R1 contains the X data and R2 contains the Y data:

A second R2 parameter can be used with each of the df functions above, although this parameter is not used. Similarly you can use SSRegTot(R1, R2) and its value will be equivalent to SSRegTot(R2). All these functions can optionally take a third argument con, where con = TRUE (default) means that the regression model takes a constant term and con = FALSE means that the regression model doesn’t have a constant term.

There is also a second form of the RSquare function in which RSquare(R1, k) = the R² value where the X data consist of all the columns in R1 except the kth column and the Y data consist of the kth column of R1.

The following are array functions where R1 contains the X data and R2 contains the Y data.

- DESIGN(R1): design matrix for the data in R1
- HAT(R1, con): hat matrix for the data in R1
- DIAGHAT(R1, con): diagonal of the hat matrix for the data in R1
- CORE(R1): core of the hat matrix for the data in R1
- LEVERAGE(R1, con): leverage vector = diagonal of the hat matrix for the data in R1
- RegCov(R1, R2): covariance matrix for the regression coefficients of the regression line
- RegCoeff(R1, R2, con): two-column range with the regression coefficients for the regression line in the first column and the corresponding standard errors in the second column
- RegPred(R0, R1, R2, lab, alpha): 7 × 1 column range containing the predicted y value for the data in R0, the standard error for the confidence interval, the lower and upper ends of the 1–alpha confidence interval, the standard error for the prediction interval, and the lower and upper ends of the 1–alpha prediction interval (alpha defaults to .05)
- RRegCoeff(R1, R2, hc, con): two-column range with the regression coefficients for the regression line in the first column and the corresponding robust standard errors in the second column, where hc = 0 through 4, corresponding to HC0 through HC4
- WRegCoeff(R1, R2, R3): two-column range with the regression coefficients for the weighted regression line in the first column and the corresponding standard errors in the second column, where R3 contains the weights
- RegCoeffSE(R1, R2): vector with the standard errors of the coefficients for the regression line
- RegY(R1, R2, con): vector of predicted Y values based on the regression line; = TREND(R2, R1)
- RegE(R1, R2, con): vector of residuals based on the regression line
- RegStudE(R1, R2): vector of studentized residuals based on the regression line
- SHAPLEY(R1, R2): vector with the Shapley-Owen decomposition of R²
- SlopesTest(R1, R2, R3, R4, b, lab): vector containing the standard error of the difference between the slopes, t, df and p-value, where R3 and R4 are the X and Y values for a second regression line; if b = TRUE (default) the pooled standard error is used; if lab = TRUE then a column of labels is added to the output (default FALSE)

The following are non-array functions where R1 contains the X data and R2 contains the Y data.

- RegPredC(R0, Rc): predicted y values for the x values in range R0 based on the regression coefficients in range Rc; R0 and Rc can be column or row ranges
- RSquareTest(R1, R3, R2): p-value of the test of the significance of the X data in R3 (reduced model) vs. the X data in R1 (full model)
- RegAIC(R1, R2, con): Akaike's Information Criterion (AIC) for the regression model
- RegAICc(R1, R2, con): corrected AIC (AICc) for the regression model
- RegSBC(R1, R2, con): Schwarz Bayesian Criterion (SBC) for the regression model
- TOLERANCE(R1, j): tolerance of the jth variable for the data in range R1
- VIF(R1, j): VIF of the jth variable for the data in range R1
- DURBIN(R1): Durbin-Watson statistic d where R1 is a column vector containing residuals
- DURBIN(R1, R2): Durbin-Watson statistic d where R1 contains the X data and R2 contains the Y data

The following array functions are also supported:

- DURBIN(R1, lab, alpha): outputs the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 is a column vector containing residuals
- DURBIN(R1, R2, lab, alpha): outputs the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 contains the X data and R2 contains the Y data

If lab = FALSE (default) then the output is a 4 × 1 column range, while if lab = TRUE then the output is a 4 × 2 range with an extra column of labels.
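The d statistic itself is simple to state; here is a hedged Python sketch of what DURBIN computes from a column of residuals (d = Σ(eₜ − eₜ₋₁)² / Σeₜ², the standard Durbin-Watson formula):

```python
def durbin_watson(residuals):
    # ratio of the sum of squared successive differences
    # to the sum of squared residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)
```

Values near 2 indicate no first-order autocorrelation; strongly alternating residuals push d toward 4, as in the test case below.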

The following array functions are used to create a stepwise regression model. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values and R3 is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise.

- RegRank(R1, R2, R3): returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by R1, R2, R3
- RegCoeffP(R1, R2, R3): returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by R1, R2, R3
- RegStepwise(R1, R2): returns a 1 × k array R where each non-blank element in R corresponds to an x variable that should be retained in the stepwise regression model. The output is actually a 1 × (k+1) array in which the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model

Exponential Regression

The following are array functions that support a nonlinear exponential regression model:

- ExpCoeff(R1, R2, iter, lab): outputs a 2 × 4 range whose first column contains the coefficients α and β for the regression, whose 2nd column contains the corresponding standard errors for these coefficients, whose 3rd column contains SSE and MSE, and whose 4th column contains MSReg and dfT. If lab = TRUE then an extra row with labels is added (default FALSE)
- ExpPred(R, R1, R2, iter): outputs an m × 1 column range with the values predicted by the exponential model for R1 and R2 based on the data in the m × 1 column vector R of x values
- ExpPredC(R, α, β): outputs an m × 1 column range with the values predicted by the exponential model with coefficients α and β based on the data in the m × 1 column vector R of x values

Here iter = the number of iterations (default 20). The last two functions can also be used as non-array functions of the following form:

- ExpPred(x, R1, R2, iter): value predicted by the exponential model for x based on the data in R1 and R2
- ExpPredC(x, α, β): value predicted by an exponential model with coefficients α and β for x
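Assuming the exponential model here has the form y = α·e^(βx) (check the Exponential Regression page for the exact parameterization used), ExpPredC's prediction step can be sketched as:

```python
import math

def exp_pred_c(xs, alpha, beta):
    # y = alpha * exp(beta * x) for each x value
    # (assumed parameterization; verify against the actual model)
    return [alpha * math.exp(beta * x) for x in xs]
```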

Polynomial Regression

The following are functions that support polynomial regression. The first two functions are array functions. R1 and R2 are column arrays containing x and y data values respectively and deg is the degree/order of the polynomial.

- PolyDesign(R1, deg, ones): returns an array consisting of the columns x, x^2, …, x^deg. If ones = TRUE, then the output is 1, x, x^2, …, x^deg
- PolyCoeff(R1, R2, deg): returns a two-column array consisting of the polynomial regression coefficients and their standard errors
- PolyRSquare(R1, R2, deg): R-square value for the polynomial regression
- PolyDegree(R1, R2, deg): the highest-degree polynomial ≤ deg which produces a significantly different R-square value

The following are array functions which support LAD regression. R1 is an n × k array containing x data values, R2 is an n × 1 array containing y data values, con takes the value TRUE if the regression includes a constant term and iter is the number of iterations used in the iteratively reweighted least squares algorithm (default = 25).

- LADRegCoeff(R1, R2, con, iter): column array containing the LAD regression coefficients; a (k+1) × 1 array if con = TRUE and a k × 1 array if con = FALSE
- LADRegWeights(R1, R2, con, iter): n × 1 column array consisting of the weights calculated from the iteratively reweighted least squares algorithm
- LADRegCoeffSE(R1, R2, con, iter, nboots): column array consisting of the standard errors of the LAD regression coefficients based on bootstrapping nboots times; a (k+1) × 1 array if con = TRUE and a k × 1 array if con = FALSE

Deming Regression

The following are functions which support Deming regression; all but the last are array functions. R1 is an array containing x data values, R2 is an array containing y data values, and λ is the lambda value (in which case R1 and R2 contain one column) or omitted (in which case lambda is calculated from R1 and R2). If lab = TRUE (default FALSE) then the output contains an extra column of labels, except for DRegResiduals where lab = TRUE means that the output contains an extra row of labels.

- DRegCoeff(R1, R2, λ, lab): 2 × 2 array containing the Deming regression intercept and slope coefficients along with the standard errors of these coefficients
- DRegResiduals(R1, R2, λ, lab): n × 7 array consisting of the predicted y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 and R2, where n = the number of rows in R1 (or R2)
- DRegIdentity(R1, R2, λ, lab): 2 × 1 array consisting of x̄ − ȳ and se(x̄ − ȳ) for Deming regression
- DRegPred(x0, R1, R2, λ, alpha, lab): 4 × 1 array consisting of the predicted value of y for x0, the standard error of the prediction and the confidence interval for this prediction based on Deming regression
- DRegLambda(R1, R2): lambda value for Deming regression calculated from R1 and R2

Total Least Squares (TLS) Regression

The following are array functions which support TLS regression.

- TRegCoeff0(R1, R2, lab): 2 × 1 column array consisting of the intercept and slope TLS regression coefficients; if lab = TRUE then an extra column of labels is appended (default FALSE)
- TRegCoeff(R1, R2, iter): (k+1) × 1 column array consisting of the TLS regression coefficients, where k = the number of columns in R1 and iter = the number of iterations in the SVD algorithm

Logistic Regression

If lab = TRUE then the output includes headings, and if lab = FALSE (the default) only the data is output. Except as indicated above, if head = TRUE then R1 and the output include column headings, while if head = FALSE then R1 and the output contain only data.

If raw = TRUE then the data in R1 is in raw form and if raw = FALSE (the default) then the data in R1 is in summary form. The parameter alpha is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter iter determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of head is FALSE.
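To illustrate the Newton iteration mentioned above, here is a deliberately simplified one-variable sketch (no intercept, so the Newton update is scalar; the real calculation solves a linear system per step and, as stated above, defaults to 20 iterations):

```python
import math

def logit_coeff_1d(xs, ys, iter=20):
    # Newton's method for a single logistic coefficient b in p = sigmoid(b*x):
    # gradient  = sum x (y - p)
    # Hessian   = sum x^2 p (1 - p)   (negated, so the update is b += g/h)
    b = 0.0
    for _ in range(iter):
        p = [1 / (1 + math.exp(-b * x)) for x in xs]
        grad = sum(x * (y - pi) for x, y, pi in zip(xs, ys, p))
        hess = sum(x * x * pi * (1 - pi) for x, pi in zip(xs, p))
        b += grad / hess
    return b
```

Note this sketch diverges on perfectly separated data, where the maximum-likelihood coefficient is infinite; capping the iteration count (as the iter parameter does) is what keeps the output finite in that case.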

Probit Regression

- ProbitCoeff(R1, lab, raw, head, alpha, iter): array function that returns the probit regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for the data in range R1. If head = TRUE, then R1 contains column headings
- ProbitCoeff2(R1, R2, lab, head, alpha, iter): array function like ProbitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis
- ProbitCoeffs(R1, iter): array function like ProbitCoeff that outputs the coefficients plus the number of iterations made
- ProbitTest(R1, lab, raw, iter): array function that returns LL, LL0, chi-square and p-value for the data in R1
- ProbitRSquare(R1, lab, raw, iter): array function that returns LL, LL0, pseudo R-square, AIC and BIC for the data in R1
- ProbitPred(R0, R1, raw, iter): outputs the probability of success for the values of each row of independent variables contained in the range R0, based on the probit regression model calculated from the data in R1 (without headings)
- ProbitPredC(R0, R2): outputs the probability of success for the values of each row of independent variables contained in the range R0, based on the probit regression coefficients contained in R2 (in the form of a column vector)

The arguments lab, raw, iter, head and alpha are as for the corresponding logistic regression function.

Multinomial and Ordinal Logistic Regression

The following are array functions where R1 is the data used to create the multinomial logistic regression model. When r = 0 (default) then the data in R1 is in raw form, whereas if r ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, r.

- MLogitCoeff(R1, r, lab, head, iter): returns the coefficients for the data in range R1
- MLogitParam(R1, r, lab, head, alpha, iter): returns the coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for the data in range R1. If head = TRUE, then R1 contains column headings
- MLogitTest(R1, r, lab, iter): returns LL, LL0, chi-square and p-value for the data in R1
- MLogitRSquare(R1, r, lab, iter): returns LL, LL0, pseudo R-square, AIC and BIC for the data in R1
- MLogitPred(R0, R1, r, iter): returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector), based on the logistic regression model calculated from the data in R1 (without headings)
- MLogitPredC(R0, R2): returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector), based on the logistic regression coefficients in R2
- MLogitSummary(R1, r, head): returns a summary of the raw data in range R1
- MLogitSelect(R1, s, head): array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma-delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = "2,3,2*3" then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output
- MLogitExtract(R1, r, s, head): fills the highlighted range with the columns defined by the string s from the data in R1. The string s takes the form of a comma-delimited list of the numbers 0, …, r
- MLogit_Accuracy(R1, r, lab, head, iter): fills the highlighted range with a column array containing the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable, plus the total accuracy of the model. If R1 contains k independent variables, then the output is a (k+1) × 1 column array (or a (k+1) × 2 array if lab = TRUE)

Here lab, head, alpha and iter are as for the logistic regression functions described above. The following array function is used with ordinal logistic regression models:

 OLogitPredC(R0, R2) returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression coefficients in R2

Poisson Regression

PoissonCoeff(Rx, Ry, lab, phi, Rt, head, alpha, iter, guess) array function that returns Poisson regression coefficients and other parameters (s.e., Wald stat, p-value and confidence interval) for data in ranges Rx, Ry and Rt. If head = TRUE, then Rx, Ry and Rt contain column headings (default FALSE). If phi = TRUE (default FALSE) then the phi correction is applied to the standard errors.
PoissonCov(Rx, Ry, Rt, iter, guess) array function that returns the coefficient covariance matrix
PoissonPred(Rx0, Rx, Ry, lab, Rt0, Rt, alpha, iter, guess) array function that returns a column array containing the predictions for Rx0 and Rt0 based on the Poisson model derived from the data in Rx, Ry and Rt
PoissonPredC(Rx0, Rc, Rt) array function that returns a column array with the predictions for Rx0 and Rt based on the Poisson coefficients in Rc.
PoissonPredCC(Rx0, Rc, Rv, lab, Rt0, alpha) array function that returns an array with 4 columns containing the predictions, standard errors and confidence intervals for Rx0 and Rt0 based on the Poisson coefficients in Rc and coefficient covariance array Rv.

Rx contains the X range data, Ry contains the Y range data and Rt contains the frequency range data. If Rt or Rt0 is omitted it defaults to a column array of ones. Rt or Rt0 can also be a numeric value, in which case it is treated as a column array containing this numeric value.

If lab = TRUE, then an extra column is appended to the output containing labels, except for PoissonPredCC where an extra row of labels is appended. alpha is the significance level (default .05). iter is the number of iterations used in calculating the coefficients using Newton’s method (default 20). guess is the initial guess of these coefficients (if missing then all the coefficients are initially set to one).
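To make the iteration concrete, here is a minimal pure-Python sketch of fitting a Poisson regression by Newton's method for the single-predictor case. The function name and data layout are illustrative, not the Resource Pack's internals; it does follow the conventions above in that all coefficients start at a common initial guess of one.

```python
import math

def poisson_fit(xs, ys, iters=20, guess=1.0):
    """Fit log(mu) = b0 + b1*x by Newton's method (illustrative sketch)."""
    b0, b1 = guess, guess              # all coefficients initially set to one
    for _ in range(iters):
        # gradient and 2x2 Hessian of the Poisson log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

Because the Poisson log-likelihood is concave, 20 Newton iterations are normally more than enough for convergence on well-behaved data.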

Survival Analysis

LOGRANK(R1, R2, lab) array function which returns the following statistics along with their p-values: log-rank 1, log-rank 2, Wilcoxon, Tarone-Ware
COXEST(R1, approx, iter) array function which returns Cox regression coefficients, their standard errors, convergence values, LL1 and LL0 values and covariance matrix
COXPRED(R1, R2, R0, lab, approx, iter, alpha) array function which predicts the hazard ratio between the two subject profiles in R1 and R2 (plus standard error and 1−alpha confidence interval) based on a Cox regression model derived from the input data in R0.

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. The approx parameter takes the value: 0 if the continuum approximation is used and 1 (default) if the Breslow approximation is used. The output for COXEST and COXPRED is calculated using Newton’s Method with iter iterations (default = 20).

ANOVA and related functions

The following functions are used for one factor ANOVA with replication, where R1 = the input data in Excel format.

Here b is an optional argument. When b = TRUE (default) then the columns denote the groups, while when b = FALSE, the rows denote the groups.

If R1 is in standard (i.e. stacked) format, then you can use the following formulas, where the colth column contains the data for the one-way ANOVA (default col = 2):

SSWStd(R1, col) = SSW
SSBetStd(R1, col) = SSBet
SSTotStd(R1, col) = SSTot
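For stacked data of this kind, the three sums of squares can be computed directly. The following Python sketch (function and argument names are ours, not the Pack's) computes the analogous quantities from rows whose first column is the group label and whose col-th column (0-based) holds the measurement:

```python
from collections import defaultdict

def anova1_ss(rows, col=1):
    """SSW, SSBet and SSTot for stacked one-way ANOVA data (sketch).
    Each row is (group, value, ...); col is the 0-based data column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[col])
    allv = [v for vs in groups.values() for v in vs]
    grand = sum(allv) / len(allv)                      # grand mean
    ssw = sum((v - sum(vs) / len(vs)) ** 2             # within-group SS
              for vs in groups.values() for v in vs)
    ssbet = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                for vs in groups.values())             # between-group SS
    sstot = sum((v - grand) ** 2 for v in allv)        # total SS
    return ssw, ssbet, sstot
```

As a check, SSW + SSBet always equals SSTot.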

The following functions are used for two factor ANOVA where R1 = the input data in Excel format and r = the number of rows in R1 that make up an A factor level.

The second argument for the column and interaction terms is optional and can be dropped. Note that the column and total terms are identical to the between groups and total terms, respectively, for one factor ANOVA.

ANOVARow(R1, r) = MSA/MSW
ATESTRow(R1, r) = p-value of A factor
ANOVACol(R1, r) = MSB/MSW
ATESTCol(R1, r) = p-value of B factor
ANOVAInt(R1, r) = MSAB/MSW
ATESTInt(R1, r) = p-value of AB factor

The following array functions are used to convert data between Excel’s formatting for Anova and standard format:

StdAnova1(R1) converts data in R1 in standard format into Excel Single Factor Anova format
Anova1Std(R1) converts data in R1 in Excel Single Factor Anova format into standard format
StdAnova2(R1) converts data in R1 in standard format into Excel Two Factor Anova format
Anova2Std(R1, r) converts data in R1 in Excel Two Factor Anova format with r rows per group into standard format
Anova3Rows(R1) converts data in R1 in standard Three Factor Anova columns format into Three Factor Anova rows format
Anova3Cols(R1) converts data in R1 in standard Three Factor Anova rows format into Three Factor Anova columns format
StdNested(R1) converts data in R1 in standard format into Excel Nested Anova format

The following array functions are used to perform ANOVA via regression:

SSAnova2(R1, r) returns a column array with SSRow, SSCol, SSInt and SSW for a two factor ANOVA for the data in R1 using a regression model; if r > 0 then R1 is assumed to be in Excel Anova format with r rows per sample, while if r = 0 or is omitted then R1 is assumed to be in standard format; data is w/o headings
SSAnova3(R1) returns a column array with SSA, SSB, SSC, SSAB, SSAC, SSBC, SSABC and SSW for a three factor ANOVA for the data in R1 using a regression model, where the data in R1 is assumed to be in standard format by columns w/o column headings

The following functions are related to ANOVA:

LEVENE(R1, type) p-value of Levene’s test for the data in range R1 (organized by columns) where type = 0 for deviations from group means, type = 1 for deviations from group medians and type = -1 for deviations from 10% trimmed group means
FKTEST(R1) p-value of the Fligner-Killeen test for data in range R1 (organized by columns)
GGEpsilon(R1, ngroups, raw) Greenhouse and Geisser epsilon value for the data in range R1 where ngroups = the number of groups; if raw = TRUE then R1 contains raw data, otherwise it contains a covariance matrix
HFEpsilon(R1, ngroups, nsubj) Huynh and Feldt epsilon value for the data in range R1 where ngroups = the number of groups; if nsubj = 0 then R1 contains raw data, otherwise it contains a covariance matrix which is derived from raw data with nsubj subjects (corresponding to rows)
GG_Epsilon(R1) Greenhouse and Geisser epsilon value for the data in range R1 which includes a column of labels
HF_Epsilon(R1) Huynh and Feldt epsilon value for the data in range R1 which includes a column of labels
DunnSidak(α, k) 1−(1−α)^(1/k)
ADJK(nrows, ncols) adjusted k value for Tukey HSD after two factor ANOVA
FSTAR(R1) Brown-Forsythe test statistic F* for the data in range R1
DFSTAR(R1) df* for the Brown-Forsythe test on the data in range R1
BFTEST(R1) p-value of the Brown-Forsythe test statistic on the data in range R1
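As an illustration of the Dunn–Šidák formula above, here is a one-line Python version (the function name is ours): it returns the per-comparison significance level that keeps the familywise error rate at α over k comparisons.

```python
def dunn_sidak(alpha, k):
    """Dunn-Sidak corrected significance level: 1 - (1 - alpha)^(1/k)."""
    return 1 - (1 - alpha) ** (1 / k)
```

For k = 1 the correction leaves α unchanged, and it shrinks toward α/k as k grows.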

There are also the following array functions for non-parametric ANOVA-like tests:

WELCH_TEST(R1, lab) returns the column range: F, df1, df2 and p-value for Welch’s test for the data in range R1
FSTAR_TEST(R1, lab) returns the column range: F, df1, df2 and p-value for the Brown-Forsythe test for the data in range R1
KW_TEST(R1, lab, ties) returns the column range: H, H-ties, df and p-value for the Kruskal-Wallis test for the data in range R1; if ties = TRUE (default) a ties correction is applied

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted.

The following functions are used for 2^k factorial designs.

DESIGN2k(k, lab, d) returns the design matrix for a 2^k factorial design
ExpandDesign2k(R1, d) returns the design matrix for a 2^k factorial design augmented by R1
Effect2k(R1, R2, lab, d) returns a column array consisting of the effect size values for the design described by R1 and R2, where R1 contains the −1 and +1 values and R2 contains the data values
SS2k(R1, R2, lab, d) returns a column array consisting of the SS values for the design described by R1 and R2, where R1 contains the −1 and +1 values and R2 contains the data values

If lab = TRUE then the output includes a column of labels (although for DESIGN2k this is a row of labels), while if lab = FALSE (the default) only the data is outputted. The output includes d-way interactions (d = 0, 2, 3; default = 2).
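A 2^k design matrix with its interaction columns can be generated mechanically. The following Python sketch (names are ours, not the Pack's) builds the k main-effect columns in −1/+1 coding followed by all interaction columns up to order d:

```python
from itertools import combinations, product

def design2k(k, d=2):
    """Rows of a 2^k full factorial design in -1/+1 coding: the k
    main-effect columns, then interaction columns up to order d (sketch)."""
    rows = []
    for levels in product((-1, 1), repeat=k):
        row = list(levels)
        for order in range(2, d + 1):
            for idx in combinations(range(k), order):
                col = 1
                for i in idx:            # interaction = product of main effects
                    col *= levels[i]
                row.append(col)
        rows.append(row)
    return rows
```

Every column of the resulting matrix sums to zero, which is the orthogonality that makes the effect and SS calculations above straightforward.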

Non-parametric tests

SignTest(R1, m, tails) p-value for the sign test where R1 contains the sample data and m = the hypothesized median
RANK_COMBINED(x, R1, R2, order) rank of x in the combined range R1 and R2 taking ties into account; order is as in RANK_AVG
RANK_SUM(R1, R2, order) sum of the ranks of all the elements in R1 based on the combined range R1 and R2 taking ties into account; order is as in RANK_AVG
RANK_SUM(R1, k, order) sum of the ranks of all the elements in the kth column of R1 taking ties into account; order is as in RANK_AVG
WILCOXON(R1, R2) minimum of W and W′ for the samples contained in ranges R1 and R2
WILCOXON(R1, n) minimum of W and W′ for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1
WTEST(R1, R2, tails) p-value of the Wilcoxon rank-sum test for the samples contained in ranges R1 and R2
WTEST(R1, n, tails) p-value of the Wilcoxon rank-sum test for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1
MANN(R1, R2) U for the samples contained in ranges R1 and R2
MANN(R1, n) U for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1
MTEST(R1, R2, tails) p-value of the Mann-Whitney U test for the samples contained in ranges R1 and R2
MTEST(R1, n, tails) p-value of the Mann-Whitney U test for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1
MANN_EXACT(R1, R2, tails, slow) p-value of the Mann-Whitney exact test for the samples contained in ranges R1 and R2; tails = 1 or 2 (default)
SRank(R1, n) T for a single sample contained in range R1 minus n. If the second argument is omitted it defaults to zero
SRTEST(R1, n, tails) p-value for the Signed-Rank test using the normal distribution approximation for the sample contained in range R1 minus n. If the second argument is omitted it defaults to zero
SRANK_EXACT(R1, med, tails, slow) p-value for the Signed-Rank exact test for the sample in range R1 based on a hypothetical median med (default 0)
SRankPair(R1, R2) T for a pair of samples contained in the one-column ranges R1 and R2, representing the paired sample
SRankPair(R1) T for a pair of samples contained in range R1, where R1 consists of two columns, one for each paired sample
SRTESTPair(R1, R2, tails) p-value for the Signed-Rank test using the normal distribution approximation for the pair of samples contained in the one-column ranges R1 and R2, representing the paired sample
SRANKPair_EXACT(R1, R2, tails, slow) p-value for the Signed-Rank exact test for the samples in ranges R1 and R2
SRTESTPair(R1, , tails) p-value for the Signed-Rank test using the normal distribution approximation for the pair of samples contained in range R1, where R1 consists of two columns, one for each paired sample
KRUSKAL(R1) Kruskal-Wallis test statistic for the data in R1
KTEST(R1) p-value for the Kruskal-Wallis test for the data in R1
FRIEDMAN(R1) Friedman test statistic for the data in R1
FrTEST(R1) p-value for Friedman’s test for the data in R1
MOODS_STAT(R1) chi-square statistic for Mood’s median test for the data in R1
MOODS_TEST(R1) p-value for Mood’s median test for the data in R1
COCHRAN(R1, raw, cont) Cochran’s Q statistic for the data in R1
QTEST(R1, raw, cont) p-value for Cochran’s Q test for the data in R1
RLower_CRIT(n1, n2, α, tails) lower critical value for the runs test with n1 T’s and n2 F’s where α is the significance level (default .05) and tails = 1 or 2 (default)
RUpper_CRIT(n1, n2, α, tails) upper critical value for the runs test with n1 T’s and n2 F’s where α is the significance level (default .05) and tails = 1 or 2 (default)

The argument tails = the # of tails = 1 (default) or 2, except for MANN_EXACT, SRANK_EXACT and SRANKPair_EXACT  where default = 2. These exact test functions are limited in terms of the size of the samples they can handle; the slow argument is used to increase these limits.
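For reference, the U statistic underlying the Mann-Whitney functions can be computed from rank sums. This Python sketch (names are ours, not the Pack's) uses average ranks for ties and reports min(U, U′), one common convention:

```python
def mann_whitney_u(r1, r2):
    """Mann-Whitney U for two samples, with average ranks for ties (sketch)."""
    pooled = sorted(r1 + r2)
    # average rank for each distinct value, so ties share a rank
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2      # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(r1), len(r2)
    rsum1 = sum(rank[v] for v in r1)           # rank sum of sample 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - rsum1
    return min(u1, n1 * n2 - u1)               # U and U' sum to n1*n2
```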

Two data input formats are supported for the COCHRAN, QTEST and QRATIO functions: raw data (raw = TRUE) and summarized data, i.e. a multi-variable frequency table (raw = FALSE, default). If cont = TRUE (default), then a continuity correction is used with two samples (equivalent to McNemar’s Test).

There are also the following array functions:

MANN_TEST(R1, R2, lab, alpha, tails, ties, cont) returns the column range: U, alpha, tails, z, r, U-crit, p-value for the Mann-Whitney test for the data in ranges R1 and R2
MANN_CONF(R1, R2, lab, ttype, alpha) returns the column range: U-crit, alpha, lower, upper, median, U-crit+1, alpha, lower, upper for the Mann-Whitney test for the data in ranges R1 and R2
SRANK_TEST(R1, R2, lab, alpha, tails, ties, cont) returns the column range: T, alpha, tails, z, r, T-crit, p-value for the paired Wilcoxon signed ranks test for the data in ranges R1 and R2
SRANK_TEST(R1, hyp, lab, alpha, tails, ties, cont) returns the column range: T, alpha, tails, z, r, T-crit, p-value for the one sample signed ranks test for the data in range R1 and hypothetical median hyp (default = 0)
SR_CONF(R1, R2, lab, ttype, alpha, nzero) returns the column range: T-crit, alpha, lower, upper, median, T-crit+1, alpha, lower, upper for the signed-ranks test for the data in ranges R1 and R2. If nzero = TRUE (default), then differences between values in R1 and R2 that are zero are eliminated.
SR_CONF(R1, hyp, lab, ttype, alpha, nzero) returns the column range: T-crit, alpha, lower, upper, median, T-crit+1, alpha, lower, upper for the signed-ranks test for the data in range R1 with hypothetical median hyp. If nzero = TRUE (default), then differences between the values in R1 and hyp that are zero are eliminated.
RUNSTEST(s, lab, tails) returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approximation), p-value (exact test) for the one-sample runs test on the data in string s
RUNSTEST(R1, lab, tails) returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approximation), p-value (exact test) for the one-sample runs test on the numeric data in range R1
RUNS2TEST(R1, R2, lab, iter) returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approximation), p-value (exact test) for the two-sample runs test on the data in ranges R1 and R2 when iter = 0 (default); if iter > 0 the test is run iter times to deal with ties, with one column of output for each unique value of the # of runs
QRATIO(R1, raw) returns a row range with the percentages for each independent variable in Cochran’s Q test

If lab = TRUE then the output includes a column of labels, while if lab = FALSE (the default) only the data is outputted. tails = the # of tails: 1 or 2 (default). alpha is the significance level (default = .05). If ties = TRUE (default) then a correction for ties is applied. If cont = TRUE (default) then a continuity correction is applied. For MANN_TEST, if COUNT(R1)+COUNT(R2) ≤ 26, and for SRANK_TEST, if COUNT(R1) ≤ 26, an exact test is also performed, in which case the critical value and p-value for this test are also output.

For small samples, the p-value of Wilcoxon’s signed ranks one-tailed test for test statistic T and sample size n is given by PERMDIST(T, n, TRUE). The critical value is PERMINV(α, n). The two-tailed test is given by 2*PERMDIST(T, n, TRUE) and PERMINV(α/2, n). See below for PERMDIST and PERMINV.
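For small n, the exact left-tail probability that PERMDIST represents can be reproduced by brute force: enumerate all 2^n assignments of signs to the ranks 1, …, n and count how often the sum of the positively signed ranks is at most T. An illustrative Python sketch (the function name is ours):

```python
from itertools import product

def signed_rank_cdf(t, n):
    """P(T <= t) where T = sum of positively signed ranks, under the null
    that each of the n ranks is positive with probability 1/2 (sketch)."""
    count = 0
    for signs in product((0, 1), repeat=n):
        if sum(r for r, s in zip(range(1, n + 1), signs) if s) <= t:
            count += 1
    return count / 2 ** n
```

Enumeration grows as 2^n, which is why exact functions of this kind are limited to small samples.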

For MANN_CONF, if ttype = 0 (default) then the normal approximation is used; if ttype = 1 then MCRIT and MPROB are used with harmonic interpolation; if ttype = 2 then MCRIT and MPROB are used with linear interpolation; and if ttype = 3 then MANNINV and MANNDIST are used. The situation is similar for SR_CONF.

Kolmogorov-Smirnov and Lilliefors functions

KDIST(x, m) value of the Kolmogorov distribution function at x
KINV(p, m) inverse of KDIST; i.e. KINV(p, m) = x where 1−KDIST(x, m) = p
KSDIST(x, n) p-value of the one-sample Kolmogorov-Smirnov test at x for samples of size n
KSINV(p, n) critical value of the one-sample Kolmogorov-Smirnov test at p for samples of size n
LDIST(x, n) p-value of the Lilliefors test for normality at x for samples of size n
LINV(p, n) critical value of the Lilliefors test at p for samples of size n
LTEST(R1) D-max for the data in range R1 based on the KS test or Lilliefors test
KSDIST(x, n1, n2, b, m) p-value of the two-sample Kolmogorov-Smirnov test at x (i.e. D-stat) for samples of size n1 and n2
KSINV(p, n1, n2, b, iter, m) critical value of the two-sample Kolmogorov-Smirnov test at p for samples of size n1 and n2
KS2TEST(R1, R2, lab, α, b, iter, m) array function which outputs a column vector with the values D-stat, p-value, D-crit, n1, n2 from the two-sample KS test for the samples in ranges R1 and R2, where α is the significance level (default = .05) and b, iter and m are as in KSINV.

If R2 is omitted (the default) then R1 is treated as a frequency table. If lab = TRUE then an extra column of labels is included in the output; thus the output is a 5 × 2 range instead of the 5 × 1 range produced when lab = FALSE (default). m = the # of iterations used in calculating an infinite sum (default = 10).
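The D-stat in the two-sample test is the largest vertical gap between the two empirical cumulative distribution functions. A brute-force Python sketch (names are ours, not the Pack's):

```python
def ks2_dstat(r1, r2):
    """Two-sample Kolmogorov-Smirnov D statistic: the maximum absolute
    difference between the two empirical cdfs (illustrative sketch)."""
    n1, n2 = len(r1), len(r2)
    d = 0.0
    for x in r1 + r2:                            # cdfs can only jump at data points
        f1 = sum(v <= x for v in r1) / n1        # empirical cdf of sample 1 at x
        f2 = sum(v <= x for v in r2) / n2
        d = max(d, abs(f1 - f2))
    return d
```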

Other Goodness of Fit functions

ANDERSON(R1, dist, k) the Anderson-Darling D statistic for the theoretical cdf values in range R1 (assumed to be sorted in ascending order) based on the distribution defined by dist
ADTEST(R1, dist, lab, iter, alpha) range containing the Anderson-Darling D statistic, the critical value and the estimated p-value for the Anderson-Darling test on the data in range R1 (not necessarily in sorted order) based on the distribution defined by dist
GOFTEST(R1, dist, lab, iter) range containing the parameters for the distribution specified by dist, the p-value of the chi-square goodness of fit test and the data value in R1 which has the lowest expected frequency value, followed by this expected frequency value
GOFTESTExact(R1, dist, lab, param1, param2) range containing param1 and param2, the p-value of the chi-square goodness of fit test for the distribution specified by dist and the parameter values, and the data value in R1 which has the lowest expected frequency value, followed by this expected frequency value
FIT_TEST(R1, R2, npar) p-value for the chi-square goodness of fit test where R1 = the array of observed data, R2 = the array of expected values and npar = the number of unknown parameters

dist = 0 (default) for a generic distribution with no unknown parameters, dist = 1 for the normal distribution, dist = 2 for the exponential distribution, dist = 3 for the Weibull distribution and dist = 4 for the gamma distribution. For GOFTEST and GOFTESTExact, we also have dist = 5 for beta distribution and dist = 6 for uniform distribution.

k = the shape parameter (alpha) of the gamma distribution (default 1) and is ignored for the other distributions. alpha is the significance level (default .05). If h = TRUE then harmonic interpolation is used in calculating the critical value; otherwise linear interpolation is used. If lab = TRUE (default FALSE), then an extra column of labels is appended to the output from ADTEST.

For ADTEST, iter is the number of iterations used in the calculation of an approximate p-value (default 20). For GOFTEST, if iter = 0, then the method of moments is used to estimate the unknown parameters; if iter = -1 then the pure method of moments is used; if iter = -2, then the regression approach is used (this is only valid for the Weibull distribution) and if iter > 0 (default 20) then an iterative approach is used with iter many iterations (except that no iteration is actually performed for the normal or exponential distribution; instead the result is the same as iter = -1).

Interrater Reliability functions

KAPPA(R1, k, lab, α, orig) array function which returns Cohen’s (Fleiss’) kappa for the data in R1 when k = 0 (default) and kappa for category k when k > 0, plus standard error, z-stat, p-value and lower and upper bound of the 1−α confidence interval (α defaults to .05)
WKAPPA(R1, R2, lab, α) array function which returns weighted kappa for the data in R1 using the weights in R2, plus standard error and lower and upper bound of the 1−α confidence interval (α defaults to .05)
ICC(R1, class, type, lab, α) array function which outputs the intraclass correlation coefficient ICC(class, type) plus the lower and upper bound of the 1−α confidence interval for the data in R1; default values are class = 2, type = 1, α = .05
KENDALLW(R1, lab, ties) array function which returns a column range consisting of Kendall’s W, r, χ2, df and p-value; if ties = TRUE then a ties correction is applied
KALPHA(R1, weights, ratings) Krippendorff’s alpha for the agreement table in range R1 based on the weights and ratings in the second and third arguments
KTRANS(R1, col) array with the agreement table for Krippendorff’s alpha that corresponds to the rating table in R1 when column col is removed; no column is removed if col = 0 (default)
KRIP_SES(R1, lab, weights, ratings, alpha, scorrection) column array that contains: Krippendorff’s alpha for the data in the agreement table in range R1, the corresponding standard error for subjects and the lower and upper ends of the 1−alpha confidence interval for Krippendorff’s alpha
KRIP(R1, lab, weights, ratings, alpha, scorrection, rcorrection) column array that contains: Krippendorff’s alpha for the data in the rating table in range R1, the corresponding standard error for subjects and the lower and upper ends of the 1−alpha confidence interval for Krippendorff’s alpha, followed by the total s.e. (including subjects and raters) and the lower and upper ends of the 1−alpha confidence interval corresponding to this s.e.
KRIP_SER(R1, lab, weights, ratings, alpha, rcorrection) the standard error for raters, where range R1 contains data from a rating table for Krippendorff’s alpha
GWET_AC2(R1, weights, ratings) Gwet’s AC2 for the agreement table in range R1 based on the weights and ratings in the second and third arguments
GTRANS(R1, col) array with the agreement table for Gwet’s AC2 that corresponds to the rating table in R1 when column col is removed; no column is removed if col = 0 (default)
GWET_SES(R1, lab, weights, ratings, alpha, scorrection) column array that contains: Gwet’s AC2 for the data in the agreement table in range R1, the corresponding standard error for subjects and the lower and upper ends of the 1−alpha confidence interval for Gwet’s AC2
GWET(R1, lab, weights, ratings, alpha, scorrection, rcorrection) column array that contains: Gwet’s AC2 for the data in the rating table in range R1, the corresponding standard error for subjects and the lower and upper ends of the 1−alpha confidence interval for Gwet’s AC2, followed by the total s.e. (including subjects and raters) and the lower and upper ends of the 1−alpha confidence interval corresponding to this s.e.
GWET_SER(R1, lab, weights, ratings, alpha, rcorrection) the standard error for raters, where range R1 contains data from a rating table for Gwet’s AC2

If lab = TRUE, then an extra column of labels is appended to the output (default = FALSE).

If range R2 is omitted in WKAPPA it defaults to the unweighted measure where the weights on the main diagonal are all zeros and the other weights are ones. Range R2 can also be replaced by a number r. A value of r = 1 means the weights are linear (as in Figure 1) and a value of r = 2 means the weights are quadratic. In general, the equivalent weights range contains zeros on the main diagonal and the values (|i−j|)^r in the ith row and jth column when i ≠ j.
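The equivalent weights range just described is easy to build explicitly. A Python sketch (the function name is ours) that constructs the q × q matrix with zeros on the diagonal and (|i−j|)^r elsewhere:

```python
def wkappa_weights(q, r):
    """q x q weighted-kappa weight matrix: 0 on the diagonal, |i-j|^r off it.
    r = 1 gives linear weights, r = 2 quadratic weights."""
    return [[abs(i - j) ** r for j in range(q)] for i in range(q)]
```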

The weights argument is a square range containing the weights or the value 0 (default) if categorical weights are used, 1 if ordinal weights are used, 2 for interval weights and 3 for ratio weights.

The ratings argument is a row or column range containing the rating values. If omitted then the ratings 1, …, q are used where q is the size of the range.

alpha is the significance level (default .05). scorrection and rcorrection are the subject and rater correction factors (referred to as the f parameters).

Other Reliability functions

SPLIT_HALF(R1, R2) split-half coefficient for the data in ranges R1 and R2
SPLITHALF(R1, type) split-half coefficient for the scores in the first half of the items in R1 vs. the second half of the items if type = 0 and the odd items in R1 vs. the even items if type = 1
GUTTMAN_SPLIT(R1, s) Guttman’s lambda for the data in range R1 based on the split described by string s consisting of 0’s and 1’s
GUTTMAN(R1) Guttman’s reliability measure for the data in range R1, i.e. the maximum Guttman’s lambda based on all possible splits; when the number of splits is too large, a second argument iter can be used to find an approximate maximum Guttman’s lambda based on iter randomly generated splits
SB_SPLIT(R1, s) split-half coefficient (after Spearman-Brown correction) for data in R1 based on the split described by string s consisting of 0’s and 1’s
SB_CORRECTION(r, n, m) Spearman-Brown correction when the split-half correlation based on an m vs. n–m split is r. If n is omitted, then it is assumed that there is a 50-50 split. If n is present but m is omitted, then it is assumed that m = n/2
SB_PRED(m, rho, n) Spearman-Brown predicted reliability based on m items when the Spearman-Brown reliability for n items is rho
SB_SIZE(rho1, rho, n) the number of items necessary to bring the Spearman-Brown predicted reliability up (or down) to rho1 from n items with Spearman-Brown reliability rho
KUDER(R1) Kuder-Richardson Formula 20 coefficient for the data in range R1
CRONALPHA(R1, k) Cronbach’s alpha for the data in range R1 if k = 0 and Cronbach’s alpha with the kth item (i.e. column) removed if k > 0
CALPHA(R1) array function which returns a row of Cronbach’s alpha values for R1 with each item removed
ITEMDIFF(R1, mx) item difficulty index for the scores in R1 where mx is the maximum score for the item
ITEMDISC(R1, R2, p, mx) item discrimination index where R1 contains the scores for each subject for a single item and R2 contains the corresponding scores for all items, based on the top/bottom p% of total scores, where mx is the maximum score for the item whose scores are contained in R1
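For reference, the quantity CRONALPHA computes is the standard Cronbach's alpha formula. A self-contained Python sketch (names are ours), where each item is a column of subject scores and sample variances are used:

```python
def cronbach_alpha(items):
    """Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of
    total scores), for a list of item columns (illustrative sketch)."""
    def var(xs):                          # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-subject totals
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))
```

When every item column is identical, the formula returns exactly 1, the maximum possible internal consistency.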

Partitioning

INIT_SPLIT(n, m) a string of length n consisting of m 0’s followed by n–m 1’s. If omitted, m defaults to n/2
NEXT_SPLIT(s) the string representing the next split after the split represented by s
RAND_SPLIT(n, m) a random string of length n consisting of m 0’s and n–m 1’s. If omitted, m defaults to n/2
COV_SPLIT(R1, s) sample covariance for the data in range R1 based on the split described by string s
CORR_SPLIT(R1, s) correlation for the data in range R1 based on the split described by string s

These strings consist only of 0’s and 1’s.
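One way to enumerate every such split string in Python (illustrative only; the ordering here need not match the order in which NEXT_SPLIT walks the splits):

```python
from itertools import combinations

def all_splits(n, m=None):
    """All strings of length n with m 0's and n-m 1's (sketch).
    m defaults to n/2, as with INIT_SPLIT."""
    if m is None:
        m = n // 2
    out = []
    for zeros in combinations(range(n), m):        # positions of the 0's
        out.append("".join("0" if i in zeros else "1" for i in range(n)))
    return out
```

The number of splits is C(n, m), which grows quickly; this is why GUTTMAN falls back to random splits for large n.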

Frequency table

FREQTABLE(R1) array function which returns an n × 3 array containing the frequency table for the data in range R1, where n = the number of unique values in R1 (i.e. the number of data elements in R1 without duplicates)
FREQTABLE(R1, bsize, bmax) array function which produces the frequency table for the data in range R1, assuming equally sized bins of size bsize; bmax = the largest bin value; if omitted, the maximum value in R1 is used
FREQ2RAW(R1, R2) array function which takes a frequency table described by R1 (data items) and R2 (frequencies) and outputs the equivalent raw data (in the form of a column vector)
FREQ2RAW(R1) array function which is equivalent to FREQ2RAW(R2, R3) where R2 is the first column in R1 and R3 is the second column in R1
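A frequency table of this kind is straightforward to build. The following Python sketch (names are ours) returns one row per distinct value; the third column here is the cumulative count, which may or may not match the third column FREQTABLE outputs:

```python
from collections import Counter

def freq_table(data):
    """Rows of (value, frequency, cumulative frequency) for the distinct
    values in data, sorted ascending (illustrative sketch)."""
    counts = Counter(data)
    rows, cum = [], 0
    for value in sorted(counts):
        cum += counts[value]
        rows.append((value, counts[value], cum))
    return rows
```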

Noncentral distribution functions

NT_DIST(t, df, δ, cum, m) value of the noncentral t distribution T(df, δ) at t if cum = TRUE and pdf at t if cum = FALSE
NCHISQ_DIST(x, df, λ, cum, m, prec) value of the noncentral chi-square distribution χ2(df, λ) at x if cum = TRUE and pdf at x if cum = FALSE
NF_DIST(x, df1, df2, λ, cum, m, prec) value of the noncentral F distribution F(df1, df2, λ) at x if cum = TRUE and pdf at x if cum = FALSE
NT_INV(p, df, δ, m, iter) inverse of the noncentral t distribution T(df, δ) at p
NCHISQ_INV(p, df, λ, m, iter, prec) inverse of the noncentral chi-square distribution χ2(df, λ) at p
NF_INV(p, df1, df2, λ, m, iter, prec) inverse of the noncentral F distribution F(df1, df2, λ) at p
NT_NCP(p, df, x, m, iter) ncp value δ such that NT_DIST(x, df, δ, TRUE, m) = p
NCHISQ_NCP(p, df, x, m, iter, prec) ncp value λ such that NCHISQ_DIST(x, df, λ, TRUE, m) = p
NF_NCP(p, df1, df2, x, m, iter, prec) ncp value λ such that NF_DIST(x, df1, df2, λ, TRUE, m) = p

m = maximum number of terms in the infinite sum that will be calculated: 1 – 170 (default 120) for the t distribution functions and default 1,000 for χ2 and F distribution functions. prec = desired level of accuracy for the power calculation of the χ2 and F distribution functions (default 0.000000001). iter = the number of iterations used to calculate the inverse or noncentrality parameter (default 40).

Statistical power functions

NORM1_POWER(d, n, tails, α) power of a one sample normal test when d = Cohen’s effect size
NORM2_POWER(mm, s1, s2, n1, n2, tails, α) power of a two sample normal test when mm = |μ2−μ1| and s1 and s2 are the corresponding population standard deviations
BINOM_POWER(p0, p1, n, tails, α) power of a one sample binomial test when p0 = probability of success on a single trial based on the null hypothesis and p1 = expected probability of success on a single trial
CORREL1_POWER(r0, r1, n, tails, α) power of a one sample correlation test using the Fisher transformation when r0 = population correlation (based on the null hypothesis) and r1 = effect size (observed correlation)
T1_POWER(d, n, tails, α, m) power of a one sample t test when d = Cohen’s effect size
T2_POWER(d, n1, n2, tails, α, m) power of a two sample t test when d = Cohen’s effect size; if n2 is omitted or set to 0 then n2 is considered to be equal to n1
VAR1_POWER(ratio, n, tails, α) power of a one sample variance test where ratio = $s^2/\sigma_0^2$ (effect size)
VAR2_POWER(ratio, n1, n2, tails, α) power of a two sample variance test where ratio = $\sigma_2^2/\sigma_1^2$ (effect size)
CHISQ_POWER(w, n, df, α, m, prec) power of a chi-square goodness of fit or independence test where w = Cohen’s effect size and df = degrees of freedom
ANOVA1_POWER(f, n, k, type, α, m, prec) power of a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), f = RMSSE effect size if type = 2 and f = noncentrality parameter if type = 0
REG_POWER(e, n, k, type, α, m, prec) power of multiple regression where k = # of predictors, e = Cohen’s effect size f2 if type = 1 (default), e = R2 effect size if type = 2 and e = noncentrality parameter if type = 0
CALPHA_POWER(ca0, ca1, n, k, tails, α) power of the Cronbach’s alpha test where ca0 = Cronbach’s alpha under the null hypothesis, ca1 = Cronbach’s alpha under the alternative hypothesis and k = # of items
ICC_POWER(ρ0, ρ1, n, k, α) power of the ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items

Here n, n1 and n2 are sample sizes; tails = # of tails: 1 or 2 (default); α = alpha (default = .05); m and prec are as for the noncentral distribution functions.
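The z-test power computations above have a simple closed form. As a rough illustration only (not the Real Statistics implementation; the Python name `norm1_power` is hypothetical), the power of a one-sample z test is approximately Φ(d√n − z₁₋α/tails), ignoring the negligible opposite-tail probability:

```python
from statistics import NormalDist

def norm1_power(d, n, tails=2, alpha=0.05):
    """Approximate power of a one-sample normal (z) test.

    d : Cohen's effect size |mu1 - mu0| / sigma.
    Generic textbook formula; the small opposite-tail
    probability is ignored, as is conventional.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / tails)
    return z.cdf(d * (n ** 0.5) - z_crit)
```

For example, `norm1_power(0.5, 50)` comes out near 0.94, and the power increases with n, as expected.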

Sample size functions

- NORM1_SIZE(d, 1−β, tails, α): minimum sample size required to obtain power of at least 1−β in a one-sample normal test when d = Cohen’s effect size
- NORM2_SIZE(mm, s1, s2, 1−β, tails, α, nratio): minimum sample size required to obtain power of at least 1−β in a two-sample normal test when mm = |μ2−μ1| and s1 and s2 are the corresponding population standard deviations
- BINOM_SIZE(p0, p1, 1−β, tails, α): minimum sample size required to obtain power of at least 1−β in a one-sample binomial test when p0 = probability of success on a single trial based on the null hypothesis and p1 = expected probability of success on a single trial
- CORREL1_SIZE(r0, r1, 1−β, tails, α): minimum sample size required to obtain power of at least 1−β in a one-sample correlation test using the Fisher transformation when r0 = population correlation (based on the null hypothesis) and r1 = effect size (observed correlation)
- T1_SIZE(d, 1−β, tails, α, m): minimum sample size required to obtain power of at least 1−β in a one-sample t test when d = Cohen’s effect size
- T2_SIZE(d, 1−β, tails, α, nratio, m): minimum sample size required to obtain power of at least 1−β in a two-sample t test when d = Cohen’s effect size
- VAR1_SIZE(ratio, 1−β, tails, α): minimum sample size required to obtain power of at least 1−β in a one-sample variance test where ratio = $s^2/\sigma_0^2$ (effect size)
- VAR2_SIZE(ratio, 1−β, tails, α, nratio): minimum sample size required to obtain power of at least 1−β in a two-sample variance test where ratio = $\sigma_2^2/\sigma_1^2$ (effect size)
- CHISQ_SIZE(w, df, 1−β, α, m, prec): minimum sample size required to obtain power of at least 1−β in a chi-square goodness-of-fit or independence test where w = Cohen’s effect size and df = degrees of freedom
- ANOVA1_SIZE(f, k, 1−β, type, α, m, prec): minimum sample size required to obtain power of at least 1−β for a one-way ANOVA where k = # of groups; f = Cohen’s effect size if type = 1 (default) and the RMSSE effect size if type = 2
- REG_SIZE(e, k, 1−β, type, α, m, prec): minimum sample size required to obtain power of at least 1−β for multiple regression where k = # of predictors; e = Cohen’s f² effect size if type = 1 (default) and the R² effect size if type = 2
- CALPHA_SIZE(ca0, ca1, k, 1−β, tails, α): minimum sample size required to obtain power of at least 1−β for Cronbach’s alpha test where ca0 = Cronbach’s alpha under the null hypothesis, ca1 = Cronbach’s alpha under the alternative hypothesis and k = # of items
- ICC_SIZE(ρ0, ρ1, k, 1−β, α): minimum sample size required to obtain power of at least 1−β for the ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items

tails = # of tails: 1 or 2 (default); α = alpha (default = .05); m and prec are as for the noncentral distribution functions; nratio = the size of the second sample divided by the size of the first sample (default = 1), or the negative of a specific value for the size of the second sample; the default for 1−β is .80.
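The sample-size functions invert the corresponding power calculations. For the one-sample z test this inversion has a closed form; the sketch below (the name `norm1_size` is hypothetical, and this is the standard textbook formula rather than the add-in's code) computes n = ((z₁₋α/tails + z₁₋β)/d)², rounded up:

```python
import math
from statistics import NormalDist

def norm1_size(d, power=0.80, tails=2, alpha=0.05):
    """Smallest n giving a one-sample z test power of at least `power`,
    via the closed-form inversion of the z-test power formula."""
    z = NormalDist()
    n = ((z.inv_cdf(1 - alpha / tails) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)
```

With the defaults (1−β = .80, α = .05, two tails) and d = 0.5 this yields the familiar n = 32.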

Other distribution functions

- MULTINOMDIST(R1, R2): value of the multinomial distribution where R1 contains the number of successes and R2 contains the corresponding probabilities of success
- UNIFORM_DIST(x, α, β, cum): pdf value at x for the uniform distribution with end points α and β when cum = FALSE and the corresponding cdf value when cum = TRUE
- UNIFORM_INV(p, α, β): inverse at p of the uniform distribution with end points α and β
- POISSON_INV(p, λ): inverse at p of the Poisson distribution with mean λ
- WEIBULL_INV(p, α, β): inverse at p of the Weibull distribution with parameters α and β
- NEGBINOM_INV(p, k, pp): inverse at p of the negative binomial distribution with parameters k and pp
- HYPGEOM_INV(p, n, k, m): inverse at p of the hypergeometric distribution with parameters n, k and m
- BETA(x, y): value of the beta function at x and y
- QDIST(q, k, df): studentized q cumulative distribution value for q with k independent variables and df degrees of freedom
- QINV(p, k, df, tails): inverse of the studentized q distribution, i.e. the critical value for the studentized q range; tails = 1 or 2 (default)
- PERMDIST(x, n, cum, slow): pdf of the permutation distribution with n elements at x if cum = FALSE and the cdf value if cum = TRUE
- PERMINV(p, n, slow): inverse of the permutation distribution at p, i.e. the least value of x such that PERMDIST(x, n, TRUE) ≥ p
- PERM2DIST(x, n1, n2, cum, slow): pdf of the two-sample permutation distribution with n1 and n2 elements at x if cum = FALSE and the cdf value if cum = TRUE
- PERM2INV(p, n1, n2, slow): inverse of the two-sample permutation distribution at p, i.e. the least value of x such that PERM2DIST(x, n1, n2, TRUE) ≥ p
- MANNDIST(x, n1, n2, tails, slow): p-value of the Mann-Whitney exact distribution with n1, n2 elements at x, where tails = 1 (default) or 2
- MANNINV(p, n1, n2, tails, slow): inverse of the Mann-Whitney exact distribution at p, i.e. the least value of x such that MANNDIST(x, n1, n2, tails) ≥ p
- RUNSDIST(r, n1, n2, cum): probability of getting exactly r runs from a string with n1 T’s and n2 F’s if cum = FALSE (i.e. the pdf at r) and the probability of getting at most r runs if cum = TRUE (i.e. the cdf at r)
- RUNSINV(p, n1, n2): inverse at p of the runs distribution with n1 T’s and n2 F’s
- F_DIST(x, df1, df2, cum): equivalent to F.DIST, except that df1 and df2 can be non-integer
- F_INV(p, df1, df2): inverse at p of the F distribution based on F_DIST
- IGAMMA_DIST(x, alpha, beta, cum): inverse gamma distribution at x
- IGAMMA_INV(p, alpha, beta): inverse of the inverse gamma distribution at p
- ICHISQ_DIST(x, df, cum): inverse chi-square distribution at x
- ICHISQ_INV(p, df): inverse at p of the inverse chi-square cdf
- DIRICHLET_DIST(pvector, avector): Dirichlet distribution at pvector (whose values add up to 1) based on the alpha parameters in avector
- DIRICHLET_RAND(avector): an array pvector consisting of random values for the Dirichlet distribution with alpha values in avector

F_DIST and F_INV are like Excel’s F.DIST and F.INV functions except that df1 and df2 don’t have to be integers. In addition, F_DIST_RT and F_INV_RT are available, which have the same advantage over their Excel counterparts F.DIST.RT and F.INV.RT.

A default limit of n1 + n2 = 28 (the sum of the two sample sizes) has been set for PERM2DIST, PERM2INV, MANNDIST and MANNINV, and a limit of n = 25 has been set for PERMDIST and PERMINV. The slow argument is used to increase these limits.
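Among the distributions above, the runs distribution has a simple closed form that is easy to verify. The sketch below is the standard textbook formula (not the RUNSDIST code itself) for the probability of exactly r runs in a random arrangement of n1 T’s and n2 F’s:

```python
from math import comb

def _c(n, k):
    # comb that returns 0 for negative k (math.comb would raise)
    return comb(n, k) if k >= 0 else 0

def runs_pdf(r, n1, n2):
    """P(exactly r runs) when n1 T's and n2 F's are arranged at random:
    the standard distribution behind a runs test."""
    total = comb(n1 + n2, n1)
    if r % 2 == 0:
        k = r // 2
        ways = 2 * _c(n1 - 1, k - 1) * _c(n2 - 1, k - 1)
    else:
        k = (r - 1) // 2
        ways = (_c(n1 - 1, k) * _c(n2 - 1, k - 1)
                + _c(n1 - 1, k - 1) * _c(n2 - 1, k))
    return ways / total
```

Summing `runs_pdf` over all feasible r yields 1, a handy sanity check; the cdf (cum = TRUE in RUNSDIST’s terms) is just the cumulative sum of these values.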

Distribution fitting functions

- BETA_FITM(R1, lab, pure): an array consisting of the estimated beta distribution alpha and beta values based on the method of moments, plus the MLE
- GAMMA_FITM(R1, lab, pure): an array consisting of the estimated gamma distribution alpha and beta values based on the method of moments, plus the MLE
- UNIFORM_FITM(R1, lab, pure): an array consisting of the estimated uniform distribution alpha and beta values based on the method of moments, plus the MLE
- WEIBULL_FITM(R1, lab, pure, iter, bguess): an array consisting of the estimated Weibull distribution alpha and beta values based on the method of moments, plus the MLE
- BETA_FIT(R1, lab, iter, aguess, bguess): an array consisting of the estimated beta distribution alpha and beta values based on the MLE approach, plus the actual and estimated mean and variance, and the MLE
- GAMMA_FIT(R1, lab, iter, aguess): an array consisting of the estimated gamma distribution alpha and beta values based on the MLE approach, plus the actual and estimated mean and variance, and the MLE
- UNIFORM_FIT(R1, lab, iter): an array consisting of the estimated uniform distribution alpha and beta values based on the MLE approach, plus the actual and estimated mean and variance, and the MLE
- WEIBULL_FIT(R1, lab, iter, bguess): an array consisting of the estimated Weibull distribution alpha and beta values based on the MLE approach, plus the actual and estimated mean and variance, and the MLE
- WEIBULL_FITR(R1, lab): an array consisting of the estimated Weibull distribution alpha and beta values based on regression, plus the actual and estimated mean and variance, the MLE and R-square
- POLYGAMMA(z, k): digamma function at z if k = 0 (default) and trigamma function at z if k = 1

If lab = TRUE, then an extra column of labels is appended to the output (default is FALSE). If pure = TRUE, then the pure method of moments is used (default is FALSE). MLE is the likelihood estimate for the parameters that are output.

iter is the number of iterations used in calculating the solution (default 20); for UNIFORM_FIT, if iter = 0 then alpha and beta are set to the minimum and maximum data values (biased MLE). aguess is the initial guess for alpha and bguess is the initial guess for beta (use 0 for the default initial guess in either case).
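As an illustration of the method of moments behind the _FITM functions, consider the gamma distribution, where the mean is αβ and the variance is αβ². Matching these to the sample moments gives closed-form estimates. This sketch uses the population variance; the actual GAMMA_FITM implementation may differ in that detail:

```python
from statistics import mean, pvariance

def gamma_fitm(data):
    """Method-of-moments estimates (alpha, beta) for a gamma
    distribution: solve mean = alpha*beta, variance = alpha*beta**2."""
    m, v = mean(data), pvariance(data)
    beta = v / m          # variance / mean
    alpha = m / beta      # equivalently mean**2 / variance
    return alpha, beta
```

For data with mean 6 and variance 8, this gives α = 36/8 = 4.5 and β = 8/6.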

Other functions

- FText(R1): formula in cell R1 as text
- VER(): current version of the Real Statistics Resource Pack (e.g. 5.2 Excel 2013/2016)
- ExcelVer(): Excel version (e.g. Excel 2010 is 14.0)
- ColLabel(n): label of the nth worksheet column (e.g. ColLabel(50) = “AX”)
- FirstBlank(R1): address (e.g. “B7”) of the first empty cell in range R1; if R1 contains no empty cells then the value “None” is returned
- FirstNonNum(R1): address of the first empty or non-numeric cell in range R1; if R1 contains no non-numeric cells then the value “None” is returned
- RANDOM(a, b, type): non-volatile random number between a and b; if type = TRUE (default) then a decimal number is returned, if type = FALSE an integer is returned
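The ColLabel conversion is a bijective base-26 encoding of the column number. A short sketch (illustrative Python, consistent with the documented example ColLabel(50) = “AX”):

```python
def col_label(n):
    """Excel-style column label for 1-based column number n
    (bijective base 26: A..Z, AA..AZ, BA.., ...)."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)  # shift by 1: there is no zero digit
        label = chr(ord("A") + rem) + label
    return label
```

Column 26 is “Z”, column 27 rolls over to “AA”, and column 50 is “AX”.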

Matrix functions

- DET(R1): same as MDETERM(R1)
- DIAG(R1): array function that returns a column vector with the values on the diagonal of the matrix in range R1 (esp. useful when R1 is a square matrix)
- DIAGONAL(R1): array function that returns a square matrix whose main diagonal consists of the elements in range R1 and has zeros elsewhere
- IDENTITY(k): array function that returns a k × k identity matrix; if k is omitted then the size of the highlighted range is used instead
- ISCELL(R1): TRUE if R1 is a single cell and FALSE otherwise
- ISSQUARE(R1): TRUE if R1 is a square range and FALSE otherwise
- LENGTH(R1): length of the matrix in range R1, i.e. the square root of the sum of the squares of all the elements in R1 (esp. useful for column or row vectors)
- NORM(R1): array function that returns the normalized version of the matrix in range R1
- MERGE(R1, R2): array function which merges ranges R1 and R2, i.e. it returns a range containing the values in R2 to the right of those in R1
- TRACE(R1): trace of the matrix in range R1
- SSCP(R1): equivalent to MMULT(TRANSPOSE(R1), R1)
- MPOWER(R1, k): array function that returns the square matrix in R1 multiplied by itself k times
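Two of these matrix operations are precise enough to pin down with a sketch. Here are SSCP (i.e. AᵀA, matching MMULT(TRANSPOSE(R1), R1)) and TRACE on plain nested lists, for reference:

```python
def sscp(a):
    """Sum-of-squares-and-cross-products matrix A^T A:
    entry (i, j) is the dot product of columns i and j of a."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[k][i] * a[k][j] for k in range(rows))
             for j in range(cols)] for i in range(cols)]

def trace(a):
    """Trace of a square matrix: sum of its diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))
```

For a = [[1, 2], [3, 4]], sscp(a) is [[10, 14], [14, 20]] and trace(a) is 5.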

Simultaneous linear equations

- ELIM(R1): array function which outputs the results of Gaussian elimination on the augmented matrix found in the array R1; the shape of the output is the same as the shape of R1
- LINEQU(R1): array function which returns a column vector with the solution to the linear equations defined by R1; returns an error if there is no solution or the solution is not unique

By default, each of these functions assumes that an entry with absolute value less than 0.0001 is equivalent to zero. This value can be changed by using the optional prec argument in ELIM(R1, prec) and LINEQU(R1, prec).
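The procedure can be sketched as Gaussian elimination with partial pivoting, where any pivot smaller in absolute value than prec is treated as zero, so the system is reported as having no unique solution. This Python version is an illustration of the technique, not the add-in’s code:

```python
def linequ(aug, prec=0.0001):
    """Solve the system given by augmented matrix aug (n rows,
    n+1 columns) by Gaussian elimination with partial pivoting;
    pivots below prec in absolute value are treated as zero."""
    a = [row[:] for row in aug]
    n = len(a)
    for i in range(n):
        # choose the pivot row with the largest absolute value in column i
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < prec:
            raise ValueError("no solution or solution is not unique")
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n + 1):
                a[r][c] -= f * a[i][c]
    # back-substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j]
                              for j in range(i + 1, n))) / a[i][i]
    return x
```

For example, the augmented matrix [[1, 1, 3], [1, −1, 1]] (x + y = 3, x − y = 1) yields x = 2, y = 1.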

Table lookup

Here R1 defines a table, including both the data and row/column headings.

- INTERPOLATE(r, r1, r2, v1, v2, h): the value between v1 and v2 proportional to the distance that r lies between r1 and r2, where v1 corresponds to r1 and v2 corresponds to r2; linear interpolation is used when h = FALSE, while harmonic interpolation is used when h = TRUE (default)
- MLookup(R1, r, c): the value in the table defined by range R1 in the row headed by r and the column headed by c
- ILookup(R1, r, c, h): the value in the table defined by range R1 corresponding to row r and column c; r or c may refer to a value that must be interpolated between row or column headings (using the INTERPOLATE function), provided those headings are numbers. If the first row (or column) heading is preceded by “<” it refers to values smaller than the next row (or column) heading. If the last row (or column) heading is preceded by “>” it refers to values bigger than the previous row (or column) heading.
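The difference between the two interpolation modes can be sketched as follows. The linear branch is the usual formula; the harmonic branch computes the proportion on the reciprocals of r, r1 and r2, a common convention for statistical tables (stated here as an assumption about what INTERPOLATE does, not taken from its source):

```python
def interpolate(r, r1, r2, v1, v2, harmonic=True):
    """Interpolate the value at r between (r1, v1) and (r2, v2).
    harmonic=True: the proportion is computed on 1/r, 1/r1, 1/r2;
    harmonic=False: plain linear interpolation."""
    if harmonic:
        t = (1 / r - 1 / r1) / (1 / r2 - 1 / r1)
    else:
        t = (r - r1) / (r2 - r1)
    return v1 + t * (v2 - v1)
```

At the midpoint r = 15 between r1 = 10 and r2 = 20, linear interpolation gives 1.5 while harmonic interpolation gives 5/3, i.e. it leans toward the larger-r value.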

There are also the following lookup functions for specific tables. In the following, n, n1, n2, k and df are generally positive integers, α is a decimal between 0 and 1 (default .05) and tails takes the value 1 or 2 (default 2). If α exceeds the largest value for alpha in the table or is smaller than the smallest value for alpha in the table then #N/A is returned. For values between those in the associated table, harmonic interpolation is used if h = TRUE (default) and linear interpolation if h = FALSE.

- TauCRIT(n, α, tails, h): critical value in the Kendall’s Tau table
- RhoCRIT(n, α, tails, h): critical value in the Spearman’s Rho table
- SRankCRIT(n, α, tails, h): critical value in the Wilcoxon Signed-Rank table
- KSCRIT(n, α, tails, h): critical value in the One-Sample Kolmogorov-Smirnov table
- LCRIT(n, α, tails, h): critical value in the Lilliefors table
- WCRIT(n1, n2, α, tails): critical value in the Wilcoxon Rank-Sum table
- MCRIT(n1, n2, α, tails, h): critical value in the Mann-Whitney table
- QCRIT(k, df, α, tails, h): critical value in the Studentized Range Q table
- DCRIT(k, df, α, tails, h): critical value in the Dunnett’s test table
- DLowerCRIT(n, k, α, h): lower critical value in the Durbin-Watson table
- DUpperCRIT(n, k, α, h): upper critical value in the Durbin-Watson table
- ADCRIT(n, α, dist, k, h): critical value in the Anderson-Darling tables, where dist and k are as for ADTEST
- KS2CRIT(n, α, tails, h): critical value in the Two-Sample Kolmogorov-Smirnov table

Related to the above critical values determined by one of the statistics tables are the following estimates of p-values based on a linear interpolation, if necessary, of the values in these statistics tables. The parameter iter = the # of iterations used to arrive at the p-value. If the p-value exceeds the largest value for alpha in the table then a value of 1 is returned; if the p-value is less than the smallest value for alpha in the table then 0 is returned. If h = TRUE (default), then harmonic interpolation is used; otherwise linear interpolation is used.

- WPROB(W, n1, n2, tails, iter, h): p-value for the Wilcoxon Rank-Sum test
- MPROB(U, n1, n2, tails, iter, h): p-value for the Mann-Whitney test
- SRankPROB(T, n, tails, iter, h): p-value for the Wilcoxon Signed-Rank test
- KSPROB(x, n, tails, iter, h): p-value for the One-Sample Kolmogorov-Smirnov test
- LPROB(x, n, tails, iter, h): p-value for the Lilliefors test
- SWPROB(n, W, b, h): p-value for the Shapiro-Wilk test for a sample of size n and statistic W; if b = TRUE the Royston algorithm is used, while if b = FALSE a table look-up (table 2) is used
- ADPROB(x, dist, k, iter, h): p-value for the Anderson-Darling test, where dist and k are as in ADCRIT or ADTEST
- KS2PROB(x, n1, n2, iter, h): p-value for the Two-Sample Kolmogorov-Smirnov test

Sorting and eliminating duplicates

- QSORT(R1, b): array function which fills the highlighted range with the data from R1 in sorted order (by columns); b is optional (default TRUE); if b = TRUE the sort is in ascending order and if b = FALSE (or 0) in descending order
- QSORTRows(R1, col, ascend, head): array function which fills the highlighted range with the rows from R1 in sorted order based on the elements in the colth column of R1; if ascend = TRUE (default) the sort is in ascending order and if ascend = FALSE (or 0) in descending order; if head = TRUE (default FALSE) the first row is presumed to contain column headings and is not sorted
- QSORT2Rows(R1, col1, col2, ascend, head): array function which fills the highlighted range with the rows from R1 in sorted order using the two sort keys col1 and col2; ascend and head are as for QSORTRows
- QSORT2RowsMixed(R1, col1, col2, ascend): array function which sorts R1 (containing only numeric data) as for QSORT2Rows, except that if ascend = TRUE (default) the primary sort on col1 is ascending and the secondary sort on col2 is descending, while if ascend = FALSE the primary sort is descending and the secondary sort is ascending
- SortUnique(R1, s): array function which fills the highlighted range with the unique elements from R1 in ascending sorted order; the output is a single column
- ExtractUnique(R1, s): array function which fills the highlighted range with the unique elements from R1 in the order in which they first appear in R1; the output is a single column
- NODUPES(R1, s, b): array function which fills the highlighted range with the data from R1, eliminating any duplicates (by columns); if b = TRUE the data in R1 is sorted first (by column) and if b = FALSE range R1 is assumed to already be in sorted order
- SortRowsUnique(R1, s): array function which fills the highlighted range with the unique rows from R1 in ascending sorted order; the output has the same number of columns as R1
- SortRowsCount(R1, s): array function which fills the highlighted range with the unique rows from R1 in ascending sorted order plus a count of the number of occurrences of each row in R1; the output has one more column than R1
- SortRowsSum(R1, s): array function which fills the highlighted range with the unique rows from R1, excluding the last column, in ascending sorted order, plus the sum of the values in the last column of R1 for matching occurrences; the output has the same number of columns as R1

The string s is used as a filler in case the output range has more rows/cells than needed. This argument is optional and defaults to the error value #N/A.
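The behavior of SortRowsCount, for instance, is easy to pin down with a sketch: collect the unique rows, sort them ascending, and append each row’s count. This hypothetical Python equivalent ignores the s filler argument:

```python
from collections import Counter

def sort_rows_count(rows):
    """Unique rows in ascending sorted order, each followed by a
    count of its occurrences (one extra column, as in SortRowsCount)."""
    counts = Counter(tuple(r) for r in rows)
    return [list(row) + [n] for row, n in sorted(counts.items())]
```

So [[1, 2], [1, 2], [0, 5]] becomes [[0, 5, 1], [1, 2, 2]].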

Sampling and reshaping

- REVERSE(R1, s): array function which fills the highlighted range with the data from R1 in reverse order (by columns)
- REV(R1): array function which reverses the elements in R1; the resulting range should have the same size and shape as R1
- RESHAPE(R1, s): array function which fills the highlighted range with the data from R1 (by columns)
- RESHAPE(R1, s, nrows, ncols): array function which fills an nrows × ncols range with the data from R1 (by columns)
- SHUFFLE(R1, s): array function which fills the highlighted range with a permutation of the data from R1 (sampling without replacement)
- RANDOMIZE(R1, s): array function which fills the highlighted range with a random selection of the data from R1 (sampling with replacement)
- DELBLANK(R1, s): array function which fills the highlighted range with the data from R1 (by columns), omitting any empty cells
- DELNonNum(R1, s): array function which fills the highlighted range with the data from R1 (by columns), omitting any non-numeric cells
- DELROWBLANK(R1, b, s): array function which fills the highlighted range with the data from R1 (by columns), omitting any row with one or more empty cells; if b = TRUE the first row of R1 (presumably containing column headings) is always copied (even if it contains a blank cell); the b argument is optional and defaults to FALSE
- DELROWNonNum(R1, b, s): array function which fills the highlighted range with the data from R1 (by columns), omitting any row with one or more non-numeric cells; if b = TRUE the first row of R1 (presumably containing column headings) is always copied (even if it contains a non-numeric cell); the b argument is optional and defaults to FALSE
- DELErr(R1): array function which fills the highlighted range with the data from R1, except that any cells with an error value are replaced by the empty value “”
- SUBMATRIX(R1, row1, col1, nrows, ncols): array function that outputs an nrows × ncols subrange of the m × n range R1 starting at the cell in the row1th row (default 1) and col1th column (default 1); nrows defaults to m − row1 + 1 and ncols defaults to n − col1 + 1
- SelectCols(R1, ss, c): array function which fills the highlighted range with the columns from R1 defined by the string ss, where ss contains a list of the column numbers of the columns to be selected from R1; if c > 0 then the rows in the output are also sorted based on the elements of the cth column of the output

The string s is used as a filler in case the output range has more cells/rows than needed. This argument is optional and defaults to the error value #N/A.
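RESHAPE’s column-major refill can be sketched as: read R1 down each column in turn, pad with the filler, then write the result down each column of the target shape. Illustrative Python, not the add-in’s code:

```python
def reshape(r1, nrows, ncols, filler="#N/A"):
    """Refill an nrows x ncols range with the data from r1, read and
    written column by column, padding with filler when short."""
    # flatten r1 in column-major order
    flat = [r1[i][j] for j in range(len(r1[0])) for i in range(len(r1))]
    flat += [filler] * (nrows * ncols - len(flat))
    # write back column by column into the target shape
    return [[flat[c * nrows + r] for c in range(ncols)]
            for r in range(nrows)]
```

Reshaping [[1, 2], [3, 4]] to a single row gives [[1, 3, 2, 4]]: the values are taken down column 1, then down column 2.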

Categorical Coding

- CATCODE(R1): array function which fills the highlighted array with a simple coding of the values in range R1; if R1 is an m × n range, highlight an m × n range
- TAGCODE(R1, b): array function which fills the highlighted array with a dummy coding of the values in the column range R1; if R1 has m rows and k unique values then highlight an m × (k−1) range; if b = TRUE (default) normal dummy coding is used, if b = FALSE alternative dummy coding is used
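Normal dummy coding of the kind TAGCODE performs can be sketched as follows; which category serves as the all-zeros reference row is an assumption here (the last level in sorted order), since the document does not say:

```python
def tag_code(values):
    """Dummy coding of a categorical column: with k unique values,
    emit k-1 indicator columns; the last level in sorted order is
    the reference category and codes as all zeros (an assumption)."""
    levels = sorted(set(values))[:-1]  # drop the reference level
    return [[1 if v == lev else 0 for lev in levels] for v in values]
```

For the column ["a", "b", "c", "a"], "c" is the reference, so the rows code as [1, 0], [0, 1], [0, 0], [1, 0]: m rows by k−1 = 2 columns, matching the m × (k−1) range TAGCODE expects.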

Iterative Proportional Fitting Procedure

- IPFP2(R1): array function which fills the highlighted array with the output of the IPF procedure on the two-way contingency table with the targeted marginal totals in R1; if R1 is an m × n range, highlight an (m−1) × (n−1) range
- IPFP3(R1, R2): array function which fills the highlighted array with the output of the IPF procedure on the three-way contingency table in R1 with the targets in R2; if R1 is an m × n range, highlight an m × n range
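The IPF procedure itself is a classic algorithm: alternately rescale each row and each column of the table so that its total matches the target, repeating until the marginals converge. The sketch below is generic, and its argument layout (separate row and column targets) is hypothetical, since IPFP2 packs the targets into R1:

```python
def ipf2(table, row_targets, col_targets, iters=50):
    """Iterative proportional fitting for a two-way table:
    alternately rescale rows, then columns, to the target totals."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in t)
            for i in range(len(t)):
                t[i][j] *= target / s
    return t
```

Starting from a uniform 2 × 2 table with row targets [3, 7] and column targets [4, 6], the fitted table has exactly those marginal totals.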

### 65 Responses to Real Statistics Functions

1. HT16 says:

Mr Charles, I (successfully) installed your *.xlam, but I would like to ask you how we may assess the accuracy of functions like KSDIST(x, n), and also the accuracy of the table at location:
https://i0.wp.com/www.real-statistics.com/wp-content/uploads/2012/11/one-sample-ks-table.png? I found on the internet a “paper” written by Professor George Marsaglia and two professors from the University of Hong Kong (Wai Wan Tsang and Jingbo Wang), where, beyond the algorithm, they gave a C program for calculating this function, so I “copy/pasted” it, just reformatted it a little, and added parallel “my” functions (where I changed “double” into “long double”), and some really mine (long double) functions, like square root, exponential and logarithm functions, and changed their normalisation constant from 1e140 into 2^4096, so that both that constant and its reciprocal value are representable in binary floating-point variables. If you want examples you may contact me at my eMail…

• Charles says:

The functions are evaluated in double precision (compatible with Excel).
Charles

• HT16 says:

Thanks for the fast reply, but don’t the values KSDIST(303485/1000000;25) and KSDIST(303495/1000000;25) have to straddle 0.02, because in the table the value for 2% and a 25-sample test is 0.30349? And there are more values where the table and the functions from the *.xlam “disagree”…

• Charles says:

Both the table values and the Real Statistics function values (i.e. the *.xlam values) are estimates and will likely differ a little for almost all entries. Statistics is, by definition, not an exact science. You are always dealing with estimates, probabilities and interpretations.
These small differences should not matter much.
Charles

2. Heather says:

I’ve reinstalled the most recent version of the Resource Package that you have on your site (and this program is amazing btw!). However, I still can’t find many of the functions, such as CRONALPHA or RELIABILITY. It seems like I followed the required steps given I have many of the other functions, such as CHISQ.DIST. Do you have any troubleshooting tips? Thanks!

3. Antonio says:

Hi,

I would like to call Real Statistics functions from Visual Basic to build my own program. Nevertheless, I am running into problems with it. The following code does not work.

Dim MyRange As Range
Set MyRange = Range(Cells(2, 1), Cells(Ensayos + 1, 1))
Shap = Application.WorksheetFunction.Shapiro(MyRange)

Thank you.

Best regards,

Antonio.

4. Des says:

Hi Charles, how do I do normality test for residuals in Excel, when I use Shapiro-Wilk test in SPSS? Thanks so much!

5. A.Habeeb says:

Hi Charles,

Thanks for lots of learning materials. I need to determine whether a positive or negative trend exists or not, so I wanted to do a Mann-Kendall trend test. I request your help on how to do this in Excel. I found some information on nist.gov and have done that, but my table has more than 30 values, which is not covered on that site. I read some articles where it can be done without Mann’s table. Kindly share the method for how it can be done in Excel or R.
Thanking you

Best regards,

Habeeb

• Charles says:

Habeeb,
Yes, with this big a sample, you can use the normal distribution approximation which doesn’t require the tables. See the following:
Mann-Whitney Test
Charles

6. Tury says:

Hi!

I tried to use the functions. I can use some, like LogitCoeff, but I can’t use DRegResiduals and all the Deming functions. It doesn’t recognise those functions. Can you help me?

Thanks

• Charles says:

Tury,
What do you see when you enter the formula =VER()
Charles

• Tury says:

4.9 Excel 2010/2013/2016

I tried with Excel-Translator in the Italian language but there is no translation for DRegResiduals.

Tury

• Charles says:

Tury,
This is a Real Statistics function and is not available in standard Excel. You can use this function when you download the Real Statistics software, which is available for free.
Charles

• Tury says:

Hi,
I installed Real Statistics a long time ago; I can open the “Real Statistics” window with Ctrl-m.

• Charles says:

Tury,
If you are not finding DRegResiduals in your version of Real Statistics, then perhaps you need to download a newer version.
Charles

7. Chris says:

Dear sir,

Please first allow me to congratulate you on this REALLY NEEDED add-in.

I think that I have a simple question, but after 1 week on your website (and having downloaded the add-in), I still cannot find the solution, so I’m obliged to write to you:

I have 40 men and 60 women, to whom I asked “do you like the Internet?”. The possible replies were “not at all”, “a bit”, “somehow”, “a lot” and “I can’t live without!” (I consider this Likert scale as equidistant and, of course, it is just an example with random numbers). I got the following results: Men [2, 6, 15, 10, 7], meaning that men tend to feel “medium”, and Women [1, 3, 15, 25, 16], meaning that women seem more positive.

I want to prove (or not) that Men react differently than Women (at a=0.05). My Statistics professor insists that the result should be p=0.014 (by “linear-by-linear” chi-square), but I think that Pearson (p=0.108) is more correct (at least based on what you mention in your website).

b) If my professor is correct, how do I do it with your add-on, and
c) Which group likes internet more?

Thank you very much in advance, people like you tend to be rare nowadays…

• Charles says:

Chris,

The chi-square test for independence is used with categorical data. The data that you have is ordered and so chi-square isn’t really the right test. Also you have too many cells with fewer than 5 elements.

Better to use a Likert scale with “not at all”, “a bit”, “somehow”, “a lot” and “I can’t live without!” becoming 1,2,3,4,5 (or something similar). You can then use the Real Statistics Frequency data analysis tool to create the raw data for each sample (frequency table to raw data conversion).

Next you need to conduct a t test or better yet a Mann-Whitney test since the data may not be normally distributed. These are provided by the Real Statistics software. Once you do this, you should get a p-value similar to that given by your professor.

Charles

• Chris says:

It works!!! Thank you from the bottom of my heart! If you are ever in Greece for vacations, you’ll always have a place to stay (you can find my e-mail, if need be) !

• Charles says:

Chris,
Good to hear.
Charles

8. John Metcalf says:

I installed your pack into Excel. I found the Sample Size location, but all I want is to find the sample size for, let’s say, 1,000 items for an attribute, such as “does documentation support the amount of an expense?” I am not sure what to put in the dialog box. I want 90% confidence, ±5% precision and an expected rate of 3% (a guess up front). What would I enter in the parameters to get a sample size?

• Charles says:

John,
I can’t tell from “does documentation support the amount of an expense?”, which statistical test you are using. I need to know this to answer your question.
Charles

• John Metcalf says:

I just want to get the sample size needed for any given “universe” and I am looking for the easiest way to do that for a simple attribute sample. The attribute is a simple yes or no. As you can see, I am no expert at statistics, so I have to keep it as simple as possible – plug in the parameters and get a number.

• Charles says:

John,
I understand that you want to keep it simple, but I still need to know what test you are conducting.
Charles

• John Metcalf says:

I don’t understand what you mean by what test I am doing. I am just looking at a sample and determining if the staff processed it right – yes or no.

• Charles says:

John,
In this case, I don’t understand your question.
Charles

9. Justin M says:

When I try to use the logitcoeff functions on my table of data, I am just getting a blank answer, any idea what I might be doing wrong? Or could I send you a copy of my sheet for you to take a look? Thanks!

• Charles says:

Yes, you can send me a copy.
Charles

• Justin M says:

Thanks, but it turns out I was just being bad with Excel and array formulas. Your sample Excel file with all of the formulas showed me what I was doing wrong (just took several tries to figure it out).

As a side note, just wanted to say thanks for putting all of this together and being very responsive, it’s all extremely helpful!

10. Casper says:

Hi,

I downloaded the Real Statistics pack for Excel 2007, and it does not contain Logistic Regression. May I know if there are any changes made to the pack?

Casper

• Charles says:

Casper,
Logistic Regression is now an option of the Regression data analysis tool.
Charles

11. Rye says:

Charles,

Would like to ask if the Real Statistics volume has an initial count on a stress tester, please forgive me as i am still a newbie to the Excel programme 🙂

Would be really helpful if it was installed as well

Thank You 🙂

• Charles says:

No, if you are referring to stress testing of banks or investments.
Charles

12. Bismark says:

Great work done, Charles. This is the first time I’m having something useful for my Mac Excel. Your package is going to help me a lot.

will be counting on you for more

• Charles says:

Great to hear.
Charles

13. António Teixeira says:

The site does not accept the characters for “different from”: what is intended is

shw different from zero where you find “shw 0”

• Charles says:

shw <> 0

14. António Teixeira says:

Hi,
Congratulations for your superb job that made me feel both surprised and gratified for having my name in the references section.
I found that, like me, you document some sheets with the formulas used in adjacent cells. I don’t know which method you use. I consider that using the following function is very useful. In models you can toggle between showing and hiding the result of the function.

Function GetFormula(cell As Range, Optional shw As Integer) As String

If IsMissing(shw) Then
shw = 0
End If

If shw <> 0 Then
GetFormula = cell.Formula
Else
GetFormula = cell.Formula
End If

End Function

Regards
António Teixeira

• Charles says:

Hi António,

Thank you for your kind words. I also appreciated your paper, which helped me when I was researching how to calculate statistical power.

Thanks also for your suggestion regarding documenting formulas. I am using a more primitive version of the GetFormula formula, which is called FTEXT (i.e. formula text), but I may try to modify it along the lines you are suggesting.

Charles

15. Veronica says:

Thanks for this, it is really amazing and helpful. Cheers!

16. Rben says:

Hi Charles,
Thank you for sharing this great resource. I have installed Real Statistics and am using the tool successfully. For some reason, though, I couldn’t use the Real Statistics supplemental functions. For example, I am trying to calculate AIC using “LogitRSquare(R1, lab, raw, iter)”, but it gives nothing or sometimes “LL”.

When I click on the cell where you calculated AIC in the "Sheet: Logit 6a" in the example workbook that I downloaded, it shows "='http://www.real-statistics.com/Users/C/AppData/Roaming/Microsoft/AddIns/RealStats.xlam'!LogitRSquare(A4:D15,TRUE)". I thought it should show my local directory where I installed it. Is this a problem?

If you have any information to use Real Statistics Supplemental Functions, that would be great.
Thanks, Rben

• Charles says:

Rben,

The reason why you are only getting a strange result when you use LogitRSquare(R1, lab, raw, iter) is that this is an array function. To use this function, you need to highlight a range of cells (a 7 x 2 range for this function if lab is set to TRUE) and then press Ctrl-Shift-Enter (instead of just Enter). See the webpage http://www.real-statistics.com/excel-environment/array-formulas-functions/ for more details about array functions in Excel.

When using a spreadsheet that someone else has prepared which refers to an Excel add-in, you need to tell Excel where that add-in is on your computer. Fortunately you only need to do this once. This is the reason why you see the string =’http://www.real-statistics.com/Users/C/AppData/Roaming/Microsoft/AddIns/RealStats.xlam’!LogitRSquare(A4:D15,TRUE).

Charles

17. Sam says:

Charles,

Thank you very much for making this tool available. After weeks of searching, I found yours to get the eigenvectors of a square symmetric matrix.
My matrix is quite large (200 by 200). I am using Excel 2010 with a 2.60 GHz processor and 12 GB RAM. When I run the eVectors command, it takes over 20 minutes to execute and display results.
Are there limitations in the code for size or otherwise? Is there anything you are aware of that might cause slowness in the calculations?

Thank you

• Charles says:

Sam,
The code that I use is not optimized for performance. There are various techniques for doing this that I have not yet implemented. I will be re-evaluating this procedure shortly. I would like to modify the code so that it works with non-symmetric matrices. I will also see if I can improve the performance.
Charles

18. Julien says:

Hi Charles,

I tried the Wilcoxon rank-sum test with Excel (the function WTEST), but the parameter “t” doesn’t work. If I just write “=WTEST(R1;R2)” it’s ok, but if I try to add the parameter “t” (1 or 2) at the end it doesn’t work. It says “#VALUE!”. Do you know why?

Thanks,
Julien

• Charles says:

Julien,
I just retested the WTEST function and it works fine when the t parameter is added. What version of Excel are you using? Is it for English language or some other language?
Charles

• Julien says:

Hi Charles,

I use Microsoft Excel for Mac 2011 in English

Here is my command: =WTEST(L2:L35;M2:M35;2)

Any idea?

19. kirk barrett says:

Thanks for the wonderful resource!

Putting the list of functions in as images makes it impossible to search. I was searching for something about Spearman's rank correlation test and was coming up empty. I was going to give up until I noticed the images. It would be very helpful to change them to text so they can be searched.

• Charles says:

Kirk,
Originally I had intended these lists to be in the form of tables, but they didn't display very well, so I replaced them with images. I agree with you, though, that this makes them hard to search. Over the next few days I will be modifying the webpages to support a new release of the software, which should be out later today. I will also modify the list of functions as part of this process.
Charles

20. charan says:

The COV() function returns just a single value; how do I get the whole matrix as the result?

• Charles says:

Charan,
COV() is an array function, and so you need to highlight the appropriate range and then press Ctrl-Shift-Enter (i.e. while holding down the Control and Shift keys, press the Enter key). E.g. suppose your data is in the range A1:C10. Since this represents 3 variables (3 columns), you need a 3 x 3 matrix to hold the output. You could therefore highlight the range E1:G3, enter the formula =COV(A1:C10) and press Ctrl-Shift-Enter. The whole matrix will now be visible.
Charles
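
To see why the output range must be 3 x 3, here is a plain-Python sketch of the sample covariance matrix computation. This is only an illustration of what a COV-style array function returns for k columns of data, not the Real Statistics implementation itself:

```python
def cov_matrix(rows):
    """Sample covariance matrix of the columns of `rows` (a list of
    equal-length rows). Each entry (i, j) is the sample covariance of
    columns i and j, using the n-1 denominator."""
    n, k = len(rows), len(rows[0])          # n observations, k variables
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)]
            for i in range(k)]
```

For data with 3 columns the result is a 3 x 3 matrix, which is why a 3 x 3 output range has to be highlighted before pressing Ctrl-Shift-Enter.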

• charan says:

Thanks Charles,

I figured it out from other posts: selecting Matrix Operations after pressing Ctrl-m. I have just started using the other functionality too, especially PCA. It looks promising; keep up the good work…

21. This is great work. Many thanks, I look forward to sharing this with my students.

22. Frank says:

The second question is whether you have any separate tutorial/explanatory material prepared for the Examples worksheets? It is difficult to follow without some explanation of the origins of the data sets. Perhaps I am just not seeing this on the website.

• Charles says:

Frank,
Everything in the Examples worksheets is on the website somewhere. The first two worksheets (TOC0 and TOC) try to link the Examples worksheets to the website. I will try to see if I can improve on this in the future.
Charles

23. Frank says:

This is a wonderful software resource! The two questions I have are: (1) Could you specify the formula for the non-parametric p-values generated by the supplemental MTEST or WTEST commands? The p-values do not match what I obtain using other software, appearing about half as great (i.e., more significant). I'm trying to understand why.

• Charles says:

Frank,
The MTEST and WTEST functions return the one-tail version of the p-value. You need to double these values to get the two-tail version of the p-value. I have just revised these functions today so that you can specify whether you want the one-tail or two-tail test. To use this capability you need to download the latest release of the software (Release 2.14.1).
Charles
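
The doubling relationship Charles describes can be checked with a small exact computation. The sketch below enumerates every equally likely assignment of ranks for the rank-sum statistic; it assumes tie-free data and is an independent illustration, not the WTEST/MTEST code:

```python
from itertools import combinations

def rank_sum_p(x, y):
    """Exact one- and two-tail p-values for the Wilcoxon rank-sum test,
    computed by enumerating all equally likely rank assignments to the
    first sample. Assumes no tied values."""
    n, m = len(x), len(y)
    combined = sorted(x + y)
    w_obs = sum(combined.index(v) + 1 for v in x)     # rank sum of sample x
    all_w = [sum(c) for c in combinations(range(1, n + m + 1), n)]
    p_low = sum(w <= w_obs for w in all_w) / len(all_w)
    p_high = sum(w >= w_obs for w in all_w) / len(all_w)
    p_one = min(p_low, p_high)                        # one-tail p-value
    p_two = min(1.0, 2 * p_one)                       # two-tail = double it
    return p_one, p_two
```

For example, rank_sum_p([1, 2, 3], [4, 5, 6]) gives a one-tail p of 0.05 and a two-tail p of 0.10, which is why a one-tail value from WTEST looks "about half as great" as the two-tail value other packages report.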

24. Manfred says:

I managed to make the VBA tools (like Regression) work; however, I cannot use the supplemental functions like SlopesTest.
=RealStats.xlam!SlopesTest(A5:A19;B5:B19;D5:D20;E5:E20;FALSCH)
(German Excel 2013, Win 8.1)
RealStats.xlam is shown in the VBA code
=VER() shows 2.8.1,
but SlopesTest does not show up.
best regards
Manfred

• Charles says:

Manfred,
SlopesTest was introduced in Rel 2.10. If VER() works then probably all the other functions work too. You just need to download the latest release to get SlopesTest. I have tested that release with the Italian version of Excel and it works; let’s hope that the German version works as well.
Charles

25. hilman says:

Hi,

I've already downloaded RealStats.xlam and I can activate the statistical package. However, I do not know how to operate the supplemental functions. They seem unavailable in the function list, nor do they work if I type the function into a cell. Can someone please give me a clue?

Thanks

• Charles says:

Hi Hilman,

Once you have downloaded and installed RealStats.xlam you should be able to type any of the functions in the statistical package in a cell (or range) in an Excel worksheet. You will know whether you have successfully installed the software by typing the formula =VER() into any cell. You should see the release value 2.8.1 (or something similar). If you don’t see this you need to review the instructions on how to install the software at http://www.real-statistics.com/free-download/real-statistics-resource-pack/.

The supplemental functions operate exactly as the standard Excel functions, except that when you enter a supplemental function in a cell you won’t be prompted for its arguments. To get an idea of what the arguments are you can type = followed by the name of the function and then Ctrl-Shift-A. You can also see a list of the supplemental functions in Excel’s Function List under the user defined functions category.

A clearer list of the supplemental functions is available on the website — e.g. http://www.real-statistics.com/excel-capabilities/supplemental-functions/ and http://www.real-statistics.com/excel-capabilities/real-statistics-multivariate-functions/.

Note that to use the data analysis tools in RealStats.xlam you need to press Ctrl-m.

Charles

26. Ramkumar says:

Hi,
I use a lot of your functions in excel.
However, I find that while your two factor ANOVA supplemental function in excel gives the correct F value, the p-value, calculated using the ATestInt function is inconsistent with results from other statistical packages. Can you please check into this?

regards
Ram

• Charles says:

Hi Ramkumar,

Thanks for bringing this issue to my attention. I have checked the software and found that the ATESTInt function just returns FDIST(F, dfInt, dfW). If the value of F is correct then the only places where the formula can give the wrong answer are if dfInt or dfW is incorrect or Excel’s FDIST function gives the wrong answer. These don’t seem so likely.

Can you check whether the value of the supplemental function dfInt(R1, r) or dfWF(R1, r) gives an answer different from the one obtained from other statistical packages? Alternatively can you send me a specific example where you obtained a value from ATESTInt(R1,r) which was different from other statistical packages? In this way I can check where the problem is.

Charles
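
Charles's point that ATESTInt reduces to FDIST(F, dfInt, dfW) can also be sanity-checked outside Excel. Below is a rough numeric sketch of that upper-tail F probability, written for checking purposes only; it uses the standard incomplete-beta identity, assumes d2 >= 2, and is in no way the add-in's code:

```python
import math

def f_upper_tail(f, d1, d2, n=200001):
    """P(X > f) for an F(d1, d2) random variable, i.e. the quantity
    Excel's FDIST(f, d1, d2) returns.

    Uses the identity P(X > f) = I_x(d2/2, d1/2) with x = d2/(d2 + d1*f),
    where I is the regularized incomplete beta function, evaluated here
    by Simpson's rule over n points (n odd, so n-1 intervals is even)."""
    a, b = d2 / 2.0, d1 / 2.0
    x = d2 / (d2 + d1 * f)
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    h = x / (n - 1)
    total = 0.0
    for i in range(n):
        t = i * h
        w = 1.0 if i in (0, n - 1) else (4.0 if i % 2 else 2.0)
        total += w * t ** (a - 1) * (1.0 - t) ** (b - 1)
    return total * h / 3.0 / beta
```

With d1 = 2 the F survival function has the closed form (1 + d1*f/d2)^(-d2/2), which gives an easy way to confirm the approximation, and hence to compare a package's interaction p-value against the F and df values Excel reports.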