Although all the statistical analyses described in this website can be done with standard Excel capabilities (see Built-in Excel Functions and Built-in Statistical Functions), it is often easier to use the supplemental functions and data analysis tools provided in the Real Statistics Resource Pack. You use the supplemental functions exactly as for the standard Excel functions (see Worksheet Functions).

We now briefly describe the various Real Statistics supplemental functions. More complete descriptions of these functions are provided elsewhere in this website.

Also note that if you enter =*function-name* into any cell and press **Ctrl-Shift-A** the arguments for the named function will be displayed. E.g. =LogitCoeff + **Ctrl-Shift-A** will display

=LogitCoeff(rg,lab,raw,head,alpha,iter)

See Real Statistics Multivariate Functions for a summary of the supplemental multivariate statistics functions available in the Real Statistics Resource Pack.

See Real Statistics Time Series Analysis Functions for a summary of the supplemental multivariate statistics functions available in the Real Statistics Resource Pack.

See Real Statistics Data Analysis Tools for a list of the supplemental statistical data analysis tools provided by the Real Statistics Resource Pack.

**Basic descriptive statistics**

MEAN(R1) |
same as AVERAGE(R1) |

IQR(R1, b) |
interquartile range of data in range R1; if b = TRUE then use QUARTILE while if b = FALSE then use QUARTILE.EXC (Excel 2010/2011/2013 only) |

MAD(R1) |
median absolute deviation of the data in range R1 |

RNG(R1) |
range of data in range R1, i.e. MAX(R1) – MIN(R1) |

SKEWP(R1) |
population skewness for the data in R1 |

KURTP(R1, excess) |
population kurtosis for the data in R1; if excess = TRUE (default) then 3 is subtracted |

SUMCOL(R1) |
array function which returns a row range with sums of each column in R1 |

MEANCOL(R1) |
array function which returns a row range with means of each column in R1 |

VARCOL(R1) |
array function which returns a row range with sample variances of each column in R1 |

STDEVCOL(R1) |
array function which returns a row range with sample standard deviations of each column in R1 |

SUMROW(R1) |
array function which returns a column range with sums of each row in R1 |

MEANROW(R1) |
array function which returns a column range with means of each row in R1 |

VARROW(R1) |
array function which returns a column range with sample variances of each row in R1 |

STDEVROW(R1) |
array function which returns a column range with sample standard deviations of each row in R1 |

STDERR(R1) |
standard error of the data in range R1, i.e. STDEV(R1)/SQRT(COUNT(R1)) |

RANK_AVG(x, R1, order, num_digits) |
rank of x in range R1 taking ties into account; order = 0 (default) indicates decreasing order and a non-zero value indicates increasing order; equivalent to Excel’s RANK.AVG(x, R1, order) except that the data is first rounded off to num_digit decimal places. |

Ties_Correction(R1, R2, type) |
ties correction value for the data in range R1 and optionally range R2, where type = 0: one sample, type = 1: paired sample, type = 2: independent samples |

TRIMDATA(R1, p) |
array function which returns a column range equivalent to R1 after removing the lowest and highest p/2 % of the data values. |

WINSORIZE(R1, p) |
array function which returns a column range that is the Winsorized version of R1 removing the lowest and highest p/2 % of the data values. |

WINMEAN(R1, p) |
Winsorized mean of the data in range R1 removing the lowest and highest p/2 % of the data values. |

STANDARD(x, R1, type, exc) |
=STANDARD(x, AVERAGE(R1), STDEV(R1) if type = 0 (default); =(x−Q3)/IPR if type = 1 and x > Q3; =(x−Q1)/IPR if type = 1 and x < Q1; = 0 otherwise. IPR = IPR(R1,exc), Q1 = QUARTILE(R1,1) and Q3 = QUARTILE(R1,3) if exc = FALSE (default) and Q1 = QUARTILE.EXC(R1,1) and Q3 = QUARTILE.EXC(R1,3) if exc = FALSE |

**Count measurements**

COUNTU(R1) |
count of the number of unique numeric values in R1 |

COUNTAU(R1) |
count of the number of unique non-empty cells in R1 |

COUNTCOL(R1) |
array function which returns a row range with the number of numeric values in each column of R1 |

COUNTROW(R1) |
array function which returns a column range with the number of numeric values in each row of R1 |

CountRowsUnique(R1, head, ncols) |
count of the number of unique rows in R1. If head = TRUE (default FALSE) then first row of R1 (presumably a heading) is not counted; the last ncols columns of R1 are not considered when determining uniqueness (ncols = 0). |

CountFullRows(R1, b) |
number of rows in range R1 which don’t have any empty cells if b = TRUE (default) and the number of rows in R1 which don’t have any non-numeric cells if b = FALSE |

CountPairs(R1, R2, b) |
number of pairs of cells from R1 and R2 for which neither pair is empty if b = TRUE (default) and number of pairs of cells from R1 and R2 for which neither pair is non-numeric if b = FALSE |

**T distribution**

CONFIDENCE_T(alpha, s, size) |
the value k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value, standard deviation s and sample size, assuming a t distribution |

T_CONF(R1, alpha) |
the value k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a t distribution |

T_LOWER(R1, alpha) |
the value x̄ − k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a t distribution |

T_UPPER(R1, alpha) |
the value x̄ + k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a t distribution |

DF_POOLED(R1, R2) |
degrees of freedom for the two sample t test for samples in ranges R1 and R2, especially when the two samples have unequal variances |

STDERR_POOLED(R1, R2, b) |
pooled standard error for two sample t test for samples in ranges R1 and R2; if b = TRUE assume equal variances, while if b = FALSE assume unequal variances |

STDEV_POOLED(R1, R2) |
pooled standard deviation for the two sample t test for samples in ranges R1 and R2 when the two samples have equal variances |

VAR_POOLED(R1, R2) |
pooled variance for the two sample t test for samples in ranges R1 and R2 when the two samples have equal variances |

T_DIST(x, df, cum) |
value of the cumulative distribution function of the t distribution with df degrees of freedom at x when cum = TRUE, and value of the pdf of the distribution with df degrees of freedom at x when cum = FALSE |

There are also the following array formulas for identifying outliers:

GRUBBS(R1, lab, alpha) |
outputs a 4 × 1 column range with the following entries: one potential outlier, G-stat, G-crit and test significance |

ESD(R1, lab, alpha) |
outputs a 4 × k range where k = max # of potential outliers and each column contains: a potential outlier, G-stat, G-crit and test significance (based on ESD test) |

OUTLIERS(R1, alpha) |
outputs a column range with up to k outliers (based on the ESD test) |

If *lab* = TRUE (default = FALSE) then an extra column is added with labels

**Normal distribution**

NORM_CONF(R1, alpha) |
the value k such that x̄ + k is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a normal distribution |

NORM_LOWER(R1, alpha) |
the value x̄ − k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a normal distribution |

NORM_UPPER(R1, alpha) |
the value x̄ + k such that (x̄ − k, x̄ + k) is the confidence interval for the population mean based on the stated alpha value and sample data in range R1 assuming a normal distribution |

SHAPIRO(R1, b) |
the Shapiro-Wilk test statistic W for the data in the range R1 |

SWTEST(R1, b, h) |
the p-value of the Shapiro-Wilk test on the data in range R1 |

SWCoeff(n, k, b) |
the kth Shapiro-Wilk coefficient for samples of size n |

SWCoeff(R1, C1, b) |
the Shapiro-Wilk coefficient for cell C1 within sorted sample in range R1 |

JARQUE(R1, pop) |
the Jarque-Barre statistic for the data in range R1 |

JBTEST(R1, pop) |
the p-value of the Jarque-Barre test on the data in range R1 |

SKEWTEST(R1, lab, alpha) |
array function to test whether the skewness of the data in range R1 is zero (consistent with a normal distribution); output contains sample skewness, standard error, test statistic, p-value and 1–alpha confidence interval |

KURTTEST(R1, lab, alpha) |
array function to test whether the kurtosis of the data in range R1 is zero (consistent with a normal distribution); output contains sample kurtosis, standard error, test statistic, p-value and 1–alpha confidence interval |

DAGOSTINO(R1) |
the D’Agostino-Pearson statistic for the data in range R1 |

DPTEST(R1) |
the p-value of the D’Agostino-Pearson test on the data in range R1 |

In the above functions, when *b* = TRUE (default) the Royston algorithm is used and when *b* = FALSE the original Shapiro-Wilk algorithm is used (employing harmonic interpolation unless *h* = FALSE when linear interpolation is used).

**Chi-square distribution**

CHI_STAT2(R1, R2) |
Pearson’s chi-square statistic for observation values in range R1 and expectation values in range R2 |

CHI_MAX2(R1, R2) |
maximum likelihood chi-square statistic for observation values in range R1 and expectation values in range R2 |

CHI_STAT(R1) |
Pearson’s chi-square statistic for observation values in range R1 |

CHI_MAX(R1) |
maximum likelihood chi-square statistic for observation values in range R1 |

CHI_TEST(R1) |
p-value for Pearson’s chi-square statistic for observation values in range R1 |

CHI_MAX_TEST(R1) |
p-value for maximum likelihood chi-square statistic for observation values in range R1 |

CHISQ_DIST(x, df, cum) |
value of the cdf of the chi-square distribution with df degrees of freedom at x when cum = TRUE, and value of the pdf of the chi-square distribution with df degrees of freedom at x when cum = FALSE (df does not need to be an integer) |

CHISQ_INV(p, df) |
inverse of the chi-square distribution with df degrees of freedom at p (df does not need to be an integer) |

FISHERTEST(R1, tails) |
probability calculated by the Fisher exact test for the 2 × 2 contingency table in range R1 where tails = the number of tails: 1 (one-tail) or 2 (two tail) |

**Covariance and Correlation**

COVARP(R1, R2) |
population covariance of the populations defined by ranges R1 and R2; equivalent to COVAR(R1, R2) |

COVARS(R1, R2) |
sample covariance of the samples defined by ranges R1 and R2; equivalent to COVARIANCE.S(R1, R2) |

CORREL_ADJ(R1, R2) |
estimated population correlation coefficient corresponding to the sample data in ranges R1 and R2 |

MCORREL(R, R1, R2) |
multiple correlation of dependent variable z with x and y where the samples for z, x and y are the ranges R, R1 and R2 respectively |

PCORREL(R1, i, j) |
partial correlation coefficient between the ith and jth columns in R1 |

SCORREL(R1, R2, lab, tails) |
array function which returns Spearman’s rank correlation (rho), t-stat and p-value for the data in R1 and R2 |

KCORREL(R1, R2, lab, tails, α, ties) |
array function which returns Kendall’s tau, standard error, z-stat, z-crit, p-value, lower and upper bounds of 1–α confidence interval of Kendall’s tau for the data in R1 and R2 |

COV(R1, b) |
array function which returns the sample covariance matrix for the data in R1 |

COVP(R1, b) |
array function which returns the population covariance matrix for the data in range R1 |

CORR(R1, b) |
array function which returns the correlation matrix for the data in range R1 |

PCORR(R1) |
array function which returns the partial correlation matrix for the data in range R1 |

For CORR, COV and COVP if *b* = TRUE (default) then any row in R1 which contains a non-numeric cell is not used, while if *b* = FALSE then correlation/covariance coefficients are calculated pairwise (by columns) and so any row which contains non-numeric data for either column in the pair is not used to calculate that coefficient value.

For KCORREL, if* lab* = TRUE then the output takes the form of a 7 × 2 range with the first column consisting of labels, while if* lab* = FALSE (default) the output takes the form of a 7 × 1 range without labels. If *ties* = TRUE (default) then a ties correction is applied.

Similarly SCORREL returns a 3 × 2 range if *lab* = TRUE and a 3 × 2 range if *lab* = FALSE.

The following are functions for correlation coefficient hypothesis testing:

CorrTTest(ρ, r, n, tails) |
p-value for one sample t test of the correlation coefficient ρ = 0 |

CorrTLower(r, n, α) |
lower bound of 1–α confidence interval of the population correlation coefficient |

CorrTUpper(r, n, α) |
upper bound of 1–α confidence interval of the population correlation coefficient |

CorrelTTest(r, n, α, lab, tails) |
array function which returns the test statistic z, p-value and lower and upper bounds of 1–α confidence interval of the population correlation coefficient based on a hypothetical correlation ρ = 0 |

CorrelTTest(R1, R2, α, lab, tails) |
CorrelTest(r, n, ρ, α, lab, tails) where r = CORREL(R1, R2) and n = CountPairs(R1,R2) |

CorrTest(ρ, r, n, tails) |
p-value for one sample test of the correlation coefficient |

CorrLower(r, n, α) |
lower bound of 1–α confidence interval of the population correlation coefficient |

CorrUpper(r, n, α) |
upper bound of 1–α confidence interval of the population correlation coefficient |

CorrelTest(r, n, ρ, α, lab, tails) |
array function which returns the test statistic z, p-value and lower and upper bounds of 1–α confidence interval of the population correlation coefficient |

CorrelTest(R1, R2, ρ, α, lab, tails) |
CorrelTest(r, n, ρ, α, lab, tails) where r = CORREL(R1, R2) and n = CountPairs(R1,R2) |

CorrETest(r, n, ρ, tails) |
p-value for one sample exact test of the correlation coefficient ρ |

CorrETest(R1, R2, ρ, tails) |
CorrETest(r, n, ρ, tails) where r = CORREL(R1, R2) and n = CountPairs(R1,R2) |

CorrelETest(r, n, ρ, lab, α), tails |
array function which returns the test statistic p-value and lower and upper bounds of 1–α confidence interval of the population correlation coefficient based on hypothetical correlation ρ |

CorrelETest(R1, R2, ρ, lab, α, tails) |
CorrelETest(r, n, ρ, lab, α, tails) where r = CORREL(R1, R2) and n = CountPairs(R1,R2) |

Correl2Test(r1, n1, r2, n2, α, lab) |
array function which returns the test statistic z, p-value (two-tailed test) and lower and upper bounds of 1–α confidence interval of the difference of population correlation coefficients ρ1– ρ2, sample correlations r1 and r2, and sample sizes n1 and n2 |

Correl2Test(R1, R2, R3, R4, α, lab) |
Correl2Test(r1, n1, r2, n2, α, lab) where r1 = CORREL(R1, R2), r2 = CORREL(R3, R4), n1 = CountPairs(R1,R2) and n2 = CountPairs(R3,R4) |

PCRIT(n, α, tails) |
critical value for the Pearson’s correlation coefficient based on two samples of size n with alpha of α (default .05) and tails = 1 or 2 (default) |

For CorrelTTest, CorrelTest and Correl2Test if* lab* = TRUE then the output takes the form of a 4 × 2 range with the first column consisting of labels, while if* lab* = FALSE (default) the output takes the form of a 4 × 1 range without labels.

The CorrTest, CorrLower, CorrUpper, CorrelTest and Correl2Test functions are based a normal test using a Fisher transformation. The CorrTTest, CorrTLower, CorrTUpper and CorrelTTest functions are based on a t-test where the population correlation is assumed to be zero.

*r* = sample correlation, *ρ *= hypothetical population correlation*, n* = sample size and* tails* = the # of tails: 1 or 2 (default).

**Box-Cox Transformation**

BOXCOX(R1, λ) |
outputs a range containing the Box-Cox transformation of the data in range R1 using the given λ value. If λ is omitted, then the transformation which best normalizes the data in R1 is used based on maximizing the log-likelihood function. |

BOXCOXLL(R1, λ) |
log likelihood function of the Box-Cox transformation of the data in R1 using the given lambda value |

BOXCOXLambda(R1) |
value of lambda which maximizes the log likelihood function of the Box-Cox transformation of the data in R1 |

**Linear Regression**

The following are ordinary, non-array functions where R1 contains the X data and R2 contains the Y data:

A second R2 parameter can be used with each of the *df* functions above, although this parameter is not used. Similarly you can use SSRegTot(R1, R2) and its value will be equivalent to SSRegTot(R2). All these functions can optionally take a third argument *con*, where *con* = TRUE (default) means that the regression model takes a constant term and *con* = FALSE means that the regression model doesn’t have a constant term.

There is also a second form of the RSquare function in which **RSquare**(R1, *k*) =* R*^{2} where the X data consist of all the columns in R1 except the *k*th column and the Y data consist of the *k*th column of R1.

The following are array functions where R1 contains the X data and R2 contains the Y data.

DESIGN(R1) |
design matrix for the data in R1 |

HAT(R1, con) |
hat matrix for the data in R1 |

DIAGHAT(R1, con) |
diagonal of the hat matrix for the data in R1 |

CORE(R1) |
core of the hat matrix for the data in R1 |

LEVERAGE(R1, con) |
leverage vector = diagonal of hat matrix for the data in R1 |

RegCov(R1, R2) |
covariance matrix for the regression coefficients of the regression line |

RegCoeff(R1, R2, con) |
two column range with the regression coefficients for the regression line in the first column and the corresponding standard errors in the second column |

RegPred(R0, R1, R2, lab, alpha) |
7 × 1 column range containing the predicted y value for the data in R0, the standard error for the confidence interval, the lower and upper ends of the 1–alpha confidence interval, the standard error for the prediction interval, the lower and upper ends of the 1–alpha prediction interval (alpha defaults to .05) |

RRegCoeff(R1, R2, hc, con) |
two column range with the regression coefficients for the regression line in the first column and the corresponding robust standard errors in the second column, where hc = 0 through 4 corresponding to HC0 through HC4 |

WRegCoeff(R1, R2, R3) |
two column range with the regression coefficients for the weighted regression line in the first column and the corresponding standard errors in the second column, where R3 contains the weights |

RegCoeffSE(R1, R2) |
vector with the standard errors of the coefficients for the regression line |

RegY(R1, R2, con) |
vector of predicted values for Y based on the regression line = TREND(R2,R1) |

RegE(R1, R2, con) |
vector of residuals based on the regression line |

RegStudE(R1, R2) |
vector of studentized residuals based on the regression line |

SHAPLEY(R1, R2) |
vector with the Shapley-Owen decomposition of R^{2} |

SlopesTest(R1, R2, R3, R4, b, lab) |
vector containing s.e. of differences between slopes, t, df and p-value where R3 and R4 are the X and Y values for a second regression line; if b = TRUE (default) the pooled s.e. is used; if lab = TRUE then a column of labels is added to the output (default = FALSE) |

The following are non-array functions where R1 contains the X data and R2 contains the Y data.

RegPredC(R0, Rc) |
predicted y values for x values in range R0 based on the regression coefficients in range Rc. R0 and Rc can be column or rows ranges. |

RSquareTest(R1, R3, R2) |
p-value of the test of the significance of X data in R3 (reduced model) vs. X data in R1 (full model) |

RegAIC(R1, R2, con) |
Akaike’s Information Criterion (AIC) for the regression model |

RegAICc(R1, R2, con) |
corrected AICc for the regression model |

RegSBC(R1, R2, con) |
Schwarz Baysean Criterion (SBC) for the regression model |

TOLERANCE(R1, j) |
Tolerance of the jth variable for the data in range R1 |

VIF(R1, j) |
VIF of the jth variable for the data in range R1 |

DURBIN(R1) |
Durbin-Watson statistic d where R1 is a column vector containing residuals |

DURBIN(R1, R2) |
Durbin-Watson statistic d where R1 contains X data and R2 contains Y data |

The following array functions are also supported:

DURBIN(R1, lab, alpha) |
Outputs the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 is a column vector containing residuals |

DURBIN(R1, R2, lab, alpha) |
Outputs the Durbin-Watson statistic d, the lower and upper bounds of the 1 − alpha confidence interval and the test significance, where R1 contains X data and R2 contains Y data |

If *lab* = FALSE (default) then the output is a 4 × 1 column range, while if *lab* = TRUE then the output is a 4 × 2 range with an extra column of labels.

The following array functions are used to create a stepwise regression model. R1 is an *n × k* array containing *x* data values, R2 is an *n ×* 1 array containing y data values and R3 is a 1 *× k* array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise.

RegRank(R1, R2, R3) |
returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by R1, R2, R3 |

RegCoeffP(R1, R2, R3) |
returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by R1, R2, R3 |

RegStepwise(R1, R2) |
returns a 1 × k array R where each non-blank elements in R corresponds to an x variable that should be retained in the stepwise regression model. Actually the output is a 1 × k+1 array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model. |

**Exponential Regression**

The following are array functions that support a nonlinear exponential regression model:

ExpCoeff(R1, R2, iter, lab) |
outputs a 2 × 4 range whose first column contains the coefficients α and β for the regression, whose 2nd column contains the corresponding standard errors for these coefficients, whose 3rd column contains SSE and MSE and whose 4th column contains and MSReg IfdfT. = TRUE then and extra row is added with labels (default = FALSE). lab |

ExpPred(R, R1, R2, iter) |
outputs an m × 1 column range with the values predicted by the exponential model for R1 and R2 based on the data in the m × 1 column vector of x values. |

ExpPredC(R, α, β) |
outputs an m × 1 column range with the values predicted by the exponential model with coefficients α and β based on the data in the m × 1 column vector R of x values. |

Here *iter* = the number of iterations (default 20). The last two functions can also be used as non-array functions of the following form:

ExpPred(x, R1, R2, iter) |
value predicted by the exponential model for x based on the data in R1 and R2 |

ExpPredC(x, α, β) |
value predicted by an exponential model with coefficients α and β for x |

**Polynomial Regression**

The following are functions that support polynomial regression. The first two functions are array functions. R1 and R2 are column arrays containing *x* and y data values respectively and *deg* is the degree/order of the polynomial.

PolyDesign(R1, deg, ones) |
returns an array consisting of x, x^{2}, …, x^{deg} columns. If ones = TRUE, then the output is 1, x, x^{2}, …, x^{deg} |

PolyCoeff(R1, R2, deg) |
returns a two column array consisting of the polynomial regression coefficients and their standard errors |

PolyRSquare(R1, R2, deg) |
R-square value for the polynomial regression |

PolyDegree(R1, R2, deg) |
the highest degree polynomial ≤ deg which produces a significantly different R-square value |

**Least Absolute Deviation (LAD) Regression**

The following are array functions which support LAD regression. R1 is an *n × k* array containing *x* data values, R2 is an *n ×* 1 array containing y data values, *con* takes the value TRUE if the regression includes a constant term and *iter* is the number of iterations used in the iteratively reweighted least squares algorithm (default = 25).

LADRegCoeff(R1, R2, con, iter) |
column array containing the LAD regression coefficients, k+1 × 1 array if con = TRUE and k × 1 array if con = FALSE |

LADRegWeights(R1, R2, con, iter) |
n × 1 column array consisting of the weights calculated from the iteratively reweighted least squares algorithm |

LADRegCoeffSE(R1, R2, con, iter, nboots) |
column array consisting of the standard errors of the LAD regression coefficients based on bootstrapping nboots times; k+1 × 1 array if con = TRUE and k × 1 array if con = FALSE |

**Deming Regression**

The following are functions which support Deming regression, all but the last function are array functions. R1 is an array containing *x* data values, R2 is an array containing y data values and λ is the lambda value (in which case R1 and R2 contain one column) or omitted (in which case lambda is calculated from R1 and R2). If *lab* = TRUE (default is FALSE) then the output contains an extra column of labels, except for DRegResiduals where* lab* = TRUE means that the output contains an extra row of labels.

DRegCoeff(R1, R2, λ, lab) |
2 × 2 array containing the Deming regression intercept and slope coefficients along with the standard errors of these coefficients |

DRegResiduals(R1, R2, λ, lab) |
n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 and R2 where n = the number of rows in R1 (or R2). |

DRegIdentity(R1, R2, λ, lab) |
2 × 1 array consisting of x̄–ȳ and se(x̄–ȳ) for Deming regression |

DRegPred(xo, R1, R2, λ, alpha, lab) |
4 × 1 array consisting of the predicted value of y for x0, the standard error of the prediction and the confidence interval for this prediction based on Deming regression |

DRegLambda(R1, R2) |
lambda value for Deming regression calculated from R1 and R2. |

**Total Least Squares (TLS) Regression**

The following are array functions which support TLS regression.

TRegCoeff0(R1, R2, lab) |
2 × 1 column array consisting of the intercept and slope TLS regression coefficients; if lab = TRUE then an extra column of labels is appended (default FALSE) |

TRegCoeff(R1, R2, iter) |
k+1 × 1 column array consisting of the TLS regression coefficients, where k = # of columns in R1 and iter = # of iterations in SVD algorithm |

**Logistic Regression**

LogitCoeff(R1, lab, raw, head, alpha, iter) |
array function that returns logistic regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |

LogitCoeff2(R1, R2, lab, head, alpha, iter) |
array function like LogitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis. |

LogitCoeffs(R1, iter) |
array function like LogitCoeff that outputs the coefficients plus the number of iterations made |

LogitTest(R1, lab, raw, iter) |
array function that returns LL, LL0, chi-square and p-value for data in R1 |

LogitRSquare(R1, lab, raw, iter) |
array function that returns LL, LL0, pseudo R-square, AIC, BIC for data in R1 |

LogitPred(R0, R1, raw, iter) |
outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression model calculated from the data in R1 (without headings) |

LogitPredC(R0, R2) |
outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the logistic regression coefficients contained in R2 (in the form of a column vector) |

LogitSummary(R1, head) |
array function that returns a summary of the raw data in range R1. |

LogitMatches(R1, R2) |
array function which outputs the total number of successes and failures for each row in R2 based on the raw data in R1. |

LogitSelect(R1, s, head) |
array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output. |

LogitReduce(R1, s) |
array function that takes summary data in R1 including column headings and fills the highlighted range with the data in R1 omitting the columns described by the string s, where s is a comma delimited list of column headings in R1 |

HOSMER(R1, lab, raw, iter) |
array function that returns a Hosmer-Lemeshow table with 10 equal-sized data ranges based on the data in R1 (w/o headings) |

HLTEST(R1, lab, raw, iter) |
array function that returns the Hosmer-Lemeshow statistic and p-value |

If *lab* = TRUE then the output includes headings and if *lab* = FALSE (the default) only the data is outputted. Except as indicated above, if *head *= TRUE (default) then R1 and the output include column headings, while if *head *= FALSE then R1 and the output contain only data.

If *raw* = TRUE then the data in R1 is in raw form and if *raw* = FALSE (the default) then the data in R1 is in summary form. The parameter *alpha* is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter *iter* determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of *head* is FALSE.

**Probit Regression**

ProbitCoeff(R1, lab, raw, head, alpha, iter) |
array function that returns probit regression coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |

ProbitCoeff2(R1, R2, lab, head, alpha, iter) |
array function like ProbitCoeff, except that R1 contains the data for the independent variables and R2 contains the data for the dependent variable. If R2 has one column the data is in raw format, while if it has two columns the data is in summary format. R1 and R2 can contain non-numeric data; such data is ignored in the analysis. |

ProbitCoeffs(R1, iter) |
array function like ProbitCoeff that outputs the coefficients plus the number of iterations made |

ProbitTest(R1, lab, raw, iter) |
array function that returns LL, LL0, chi-square and p-value for data in R1 |

ProbtRSquare(R1, lab, raw, iter) |
array function that returns LL, LL0, pseudo R-square, AIC, BIC for data in R1 |

ProbitPred(R0, R1, raw, iter) |
outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression model calculated from the data in R1 (without headings) |

ProbitPredC(R0, R2) |
outputs the probability of success for the values of each row of independent variables contained in the range R0 based on the probit regression coefficients contained in R2 (in the form of a column vector) |

The arguments* lab, raw, iter, head *and* alpha* are as for the corresponding logistic regression function.

**Multinomial and Ordinal Logistic Regression**

The following are array functions where R1 is the data used to create the multinomial logistic regression model. When *r* = 0 (default) then the data in R1 is in raw form, whereas if *r* ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, *r*.

MLogitCoeff(R1, r, lab, head, iter) |
returns the coefficients for data in range R1. |

MLogitParam(R1, r, lab, head, alpha, iter) |
returns the coefficients and other parameters (s.e., Wald, confidence intervals, etc.) for data in range R1. If head = TRUE, then R1 contains column headings. |

MLogitTest(R1, r, lab, iter) |
returns LL, LL0, chi-square and p-value for data in R1 |

MLogitRSquare(R1, r, lab, iter) |
returns LL, LL0, pseudo R-square, AIC, BIC for data in R1 |

MLogitPred(R0, R1,r, iter) |
returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression model calculated from the data in R1 (without headings) |

MLogitPredC(R0, R2) |
returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression coefficients in R2 |

MLogitSummary(R1, r, head) |
returns a summary of the raw data in range R1. |

MLogitSelect(R1, s, head) |
array function which takes the summary data in range R1 and outputs an array in summary form based on s. The string s is a comma delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output. |

MLogitExtract(R1, r, s, head) |
fills the highlighted range with the columns defined by string s from the data from R1. The string s takes the form of a comma delimited list of numbers 0, …, r. |

MLogit_Accuracy(R1, r, lab, head, iter) |
fills the highlighted range with a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. If R1 contains k independent variables, then the output is a k+1 × 1 column array (or a k+1 × 2 array if lab = TRUE). |

Here *lab*, *head*, *alpha* and *iter* are as for the logistic regression functions described above. The following array function is used with ordinal logistic regression models:

OLogitPredC(R0, R2) |
returns a row vector with the probabilities of the outcomes of the dependent variable for the values of the independent variables contained in the range R0 (row or column vector) based on the logistic regression coefficients in R2 |

**Survival Analysis**

LOGRANK(R1, R1, lab) |
array function which returns the following statistics along with their p-value: log-rank 1, log-rank 2, Wilcoxon, Tarone-Ware |

COXEST(R1, approx, iter) |
array function which returns Cox regression coefficients, their standard errors, convergence values, LL1 and LL0 values and covariance matrix |

COXPRED(R1, R2, R0, lab, approx, iter, alpha) |
array function which predicts the hazard ratio between the two subject profiles in R1 and R2 (plus standard error and 1−alpha confidence interval) based on a Cox regression model derived from the input data in R0. |

If* lab* = TRUE then the output includes a column of labels, while if *lab* = FALSE (the default) only the data is outputted. The *approx* parameter takes the value: 0 if the continuum approximation is used and 1 (default) if the Breslow approximation is used. The output for COXEST and COXPRED is calculated using Newton’s Method with *iter* iterations (default = 20).

**ANOVA and related functions**

The following functions are used for one factor ANOVA with replication where R1 = the input data in Excel format

Here *b* is an optional argument. When *b* = TRUE (default) then the columns denote the groups, while when *b* = FALSE, the rows denote the groups.

If R1 is in standard (i.e. stacked) format, then you can use the following formulas, where the *col*^{th} column contains the data for the one-way ANOVA (default *col* = 2):

**SSWStd**(R1, *col*) = *SS _{W }*

**SSBetStd**(R1,

*col*) =

*SS*

_{Bet }**SSTotStd**(R1,

*col*) =

*SS*

_{Tot}The following functions are used for two factor ANOVA where R1 = the input data in Excel format and *r* = the number of rows in R1 that make up an A factor level.

The second argument for the column and interaction terms is optional and can be dropped. Note that the column and total terms are identical to the between groups and total terms, respectively, for one factor ANOVA.

ANOVARow(R1,r) = MS/_{A}MS_{W} |
ATESTRow(R1,r) = p-value of A factor |

ANOVACol(R1,r) = MS/_{B}MS_{W} |
ATESTCol(R1,r) = p-value of B factor |

ANOVAInt(R1,r) = MS/_{AB}MS_{W} |
ATESTInt(R1,r) = p-value of AB factor |

The following array functions are used to convert data between Excel’s formatting for Anova and standard format:

StdAnova1(R1) |
converts data in R1 in standard format into Excel Single Factor Anova format |

Anova1Std(R1) |
converts data in R1 in Excel Single Factor Anova format into standard format |

StdAnova2(R1) |
converts data in R1 in standard format into Excel Two Factor Anova format |

Anova2Std(R1, r) |
converts data in R1 in Excel Single Factor Anova format with r rows per group into standard format |

Anova3Rows(R1) |
converts data in R1 in standard Three Factor Anova columns format into Three Factor Anova rows format |

Anova3Cols(R1) |
converts data in R1 in standard Three Factor Anova rows format into Three Factor Anova columns format |

StdNested(R1) |
converts data in R1 in standard format into Excel Nested Anova format |

The following array functions are used to perform ANOVA via regression:

SSAnova2(R1, r) |
returns a column array with SSRow, SSCol, SSInt and SSW for a two factor ANOVA for the data in R1 using a regression model; if r > 0 then R1 is assumed to be in Excel Anova format with r rows per sample, while if r = 0 or is omitted then R1 is assumed to be in standard format; data is w/o headings |

SSAnova3(R1) |
returns a column array with SSA, SSB, SSC, SSAB, SSAC, SSBC, SSABC and SSW for a three factor ANOVA for the data in R1 using a regression model where the data in R1 is assumed to be in standard format by columns w/o column headings |

The following functions are related to ANOVA:

LEVENE(R1, type) |
p-value of Levene’s test for the data in range R1 (organized by columns) where type = 0 for deviations from group means, type = 1 for deviations from group medians and type = -1 for deviations from 10% trimmed group means |

FKTEST(R1) |
p-value of Fligner-Killeen test for data in range R1 (organized by columns) |

GGEpsilon(R1, ngroups, raw) |
Greenhouse and Geisser epsilon value for the data in range R1 where ngroups = the number of groups; if raw = TRUE then R1 contains raw data, otherwise it contains a covariance matrix |

HFEpsilon(R1, ngroups, nsubj) |
Huynh and Feldt epsilon value for the data in range R1 where ngroups = the number of groups; if nsubj = 0 then R1 contains raw data, otherwise it contains a covariance matrix which is derived from raw data with nsubj subjects (corresponding to rows). |

DunnSidak(α, k) |
1−(1−α)^{1/k} |

ADJK(nrows, ncols) |
adjusted k value for Tukey HSD after two factor ANOVA |

FSTAR(R1) |
Brown-Forsythe’s test statistic F* on the data in range R1 |

DFSTAR(R1) |
df* for Brown-Forsythe’s test on the data in range R1 |

BFTEST(R1) |
p-value of the Brown-Forsythe’s test statistic on the data in range R1 |

There are also the following array functions:

WELCH_TEST(R1, lab) |
returns the column range: F, df1, df2, and p-value for Welch’s test for the data in range R1 |

FSTAR_TEST(R1, lab) |
returns the column range: F, df1, df2, and p-value for Brown-Forsythe’s test for the data in range R1 |

KW_TEST(R1, lab, ties) |
returns the column range: H, H-ties, df and p-value for Kruskal-Wallis test for the data in range R1, if ties = TRUE (default) a ties correction is applied |

If *lab* = TRUE then the output includes a column of labels, while if *lab* = FALSE (the default) only the data is outputted.

**Non-parametric tests**

SignTest(R1, m, tails) |
p-value for the sign test where R1 contains the sample data and m = the hypothesized median |

RANK_COMBINED(x, R1, R2, order) |
rank of x in the combined range R1 and R2 taking ties into account; order is as in RANK_AVG |

RANK_SUM(R1, R2, order) |
sum of the ranks of all the elements in R1 based in the combined range R1 and R2 taking ties into account; order is as in RANK_AVG |

RANK_SUM(R1, k, order) |
sum of the ranks of all the elements in the k^{th} column of R1 taking ties into account; order is as in RANK_AVG |

WILCOXON(R1, R2) |
minimum of W and W′ for the samples contained in ranges R1 and R2 |

WILCOXON(R1, n) |
>minimum of W and W′ for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1 |

WTEST(R1, R2, tails) |
p-value of the Wilcoxon rank-sum test for the samples contained in ranges R1 and R2 |

WTEST(R1, n, tails) |
p-value of the Wilcoxon rank-sum test for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1 |

MANN(R1, R2) |
U for the samples contained in ranges R1 and R2 |

MANN(R1, n) |
U for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1 |

MTEST(R1, R2, tails) |
p-value of the Mann-Whitney U test for the samples contained in ranges R1 and R2 |

MTEST(R1, n, tails) |
p-value of the Mann-Whitney U test for the samples contained in the first n columns of range R1 and the remaining columns of range R1. If the second argument is omitted it defaults to 1 |

MANN_EXACT(R1, R2, tails) |
p-value of the Mann-Whitney exact test for the samples contained in ranges R1 and R2; tails = 1 or 2 (default) |

SRank(R1, n) |
T for a single sample contained in range R1 minus n. If the second argument is omitted it defaults to zero. |

SRTEST(R1, n, tails) |
p-value for Signed-Rank test using the normal distribution approximation for the sample contained in range R1 minus n, If the second argument is omitted it defaults to zero. |

SRankPair(R1, R2) |
T for a pair of samples contained in the one-column ranges R1 and R2, representing the paired sample |

SRankPair(R1) |
T for a pair of samples contained in range R1, where R1 consists of two columns, one for each paired sample |

SRTESTPair(R1, R2, tails) |
p-value for Signed-Rank test using the normal distribution approximation for the pair of samples contained in the one-column ranges R1 and R2, representing the paired sample |

SRTESTPair(R1,,tails) |
p-value for Signed-Rank test using the normal distribution approximation for the pair of samples contained in range R1, where R1 consists of two columns, one for each paired sample |

KRUSKAL(R1) |
Kruskal-Wallis test statistic for the data in R1 |

KTEST(R1) |
p-value for Kruskal-Wallis Test for the data in R1 |

FRIEDMAN(R1) |
Friedman test statistic for the data in R1 |

FrTEST(R1) |
p-value for Friedman’s Test for the data in R1 |

MOODS_STAT(R1) |
chi-square statistic for Mood’s Median Test for the data in R1 |

MOODS_TEST(R1) |
p-value for Mood’s Median Test for the data in R1 |

COCHRAN(R1, raw) |
Cochran’s Q statistic for the data in R1 |

QTEST(R1, raw) |
p-value for Cochran’s Q Test for the data in R1 |

RLower_CRIT(n1, n2, α, tails) |
lower critical value for the runs test with n1 T’s and n2 F’s where α is the significance level (default .05) and tails = 1 or 2 (default) |

RUpper_CRIT(n1, n2, α, tails) |
upper critical value for the runs test with n1 T’s and n2 F’s where α is the significance level (default .05) and tails = 1 or 2 (default) |

The argument *tails* = the # of tails = 1 (default) or 2, except for MANN_EXACT where default = 2.

Two data input formats are supported for the COCHRAN, QTEST and QRATIO functions: raw data (*raw *= TRUE) and summarized data, i.e. a multi-variable frequency table (*raw* = FALSE, default).

There are also the following array functions:

MANN_TEST(R1, R2, lab, alpha, tails, ties, cont) |
returns the column range: U, alpha, tails, z, r, U-crit, p-value for the Mann-Whitney test for the data in ranges R1 and R2 |

SRANK_TEST(R1, R2, lab, alpha, tails, ties, cont) |
returns the column range: T, alpha, tails, z, r, T-crit, p-value for the paired Wilcoxon signed ranks test for the data in ranges R1 and R2 |

SRANK_TEST(R1, hyp, lab, alpha, tails, ties, cont) |
returns the column range: T, alpha, tails, z, r, T-crit, p-value for the one sample signed ranks test for the data in range R1 and hypothetical median hyp (default = 0) |

RUNSTEST(s, lab, tails) |
returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approx.), p-value (exact test) for the one-sample runs test on the data in string s |

RUNSTEST(R1, lab, tails) |
returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approximation), p-value (exact test) for the one-sample runs test on the numeric data in range R1 |

RUNS2TEST(R1, R2, lab, iter) |
returns the column range: n1, n2, mean, std dev, runs, tails, z-stat, p-value (normal approximation), p-value (exact test) for the two-sample runs test on the data in ranges R1 and R2 when iter = 0 (default); if iter > 0 the test is run iter times to deal with ties, one column of output for each unique value for the # of runs |

QRATIO(R1, raw) |
returns a row range with the percentages of each independent variables in Cochran’s Q test |

If* lab* = TRUE then the output includes a column of labels, while if *lab* = FALSE (the default) only the data is outputted, *tails* = the # of tails: 1 or 2 (default), *alpha* is the significance level (default = .05), if *ties* = TRUE (default) then a correction for ties is applied. If *cont* = TRUE (default) then a continuity correction is applied. For MANN_TEST if COUNT(R1)+COUNT(R2) ≤ 26 and for SRANK if COUNT(R1) ≤ 26 then an exact test is also performed, in which case the critical value and p-value for this test is also output.

For small samples, the p-value of Wilcoxon’s Signed Ranks one-tail test for test statistic *T* and sample size *n *is given by PERMDIST(*T, n*, TRUE). The critical value is PERMINV(*α, n*). The two tail test is given by 2* PERMDIST(*T, n*, TRUE) and PERMINV(*α*/2*, n*). See below for PERMDIST and PERMINV.

**Kolmogorov-Smirnov and Lilliefors functions**

KDIST(x, m) |
value of the Kolmogorov distribution function at x |

KINV(p, m) |
inverse of KDIST; i.e. KINV(p, m) = x where 1−KDIST(x, m) = p |

KSDIST(x, n) |
p-value of the one-sample Kolmogorov-Smirnov test at x for samples of size n |

KSINV(p, n) |
critical value of the one-sample Kolmogorov-Smirnov test at p for samples of size n |

LDIST(x, n) |
p-value of the Lilliefors test for normality at x for samples of size n |

LINV(p, n) |
critical value of the Lilliefors test at p for samples of size n |

LTEST(R1) |
D-max for the data in range R1 based on KS test or Lilliefors test |

KSDIST(x, n1, n2, b, m) |
p-value of the two-sample Kolmogorov-Smirnov test at x (i.e. D-stat) for samples of size n1 and n2 |

KSINV(p, n1, n2, b, iter, m) |
critical value of the two-sample Kolmogorov-Smirnov test at p for samples of size n1 and n2 |

KS2TEST(R1, R2, lab, α, b, iter, m) |
array function which outputs a column vector with the values D-stat, p-value, D-crit, n1, n2 from the two sample KS test for the samples in ranges R1 and R2, where α is the significance level (default = .05) and b, iter and m are as in KSINV. |

If R2 is omitted (the default) then R1 is treated as a frequency table. If *lab* = TRUE then an extra column of labels is included in the output; thus the output is a 5 × 2 range instead of a 1 × 5 range if *lab* = FALSE (default). *m* = the # of iterations used in calculating an infinite sum (default = 10).

**Reliability functions**

SPLIT_HALF(R1, R2) |
split half coefficient for the data in ranges R1 and R2 |

SPLITHALF(R1, type) |
split-half coefficient for the scores in the first half of the items in R1 vs. the second half of the items if type = 0 and the odd items in R1 vs. the even items if type = 1. |

GUTTMAN_SPLIT(R1, s) |
Guttman’s lambda for the data in range R1 based on the split described by string s consisting of 0’s and 1’s |

GUTTMAN(R1) |
the Guttman’s reliability measure for the data in range R1, i.e. the maximum Guttman’s lambda based on all possible splits; when the number of splits is too large, a second argument iter can be used to find an approximate maximum Guttman’s lambda based on a randomly generated iter number of splits. |

SB_SPLIT(R1, s) |
split half coefficient (after Spearman-Brown correction) for data in R1 based on the split described by string s consisting of 0’s and 1’s |

SB_CORRECTION(r, n, m) |
Spearman-Brown correction when the split-half correlation based on an m vs. n–m split is r. If n is omitted, then it is assumed that there is a 50-50 split. If n is present but m is omitted, then it is assumed that m = n/2. |

SB_PRED(m, rho, n) |
Spearman-Brown predicted reliability based on m items when Spearman-Brown for n items is rho. |

SB_SIZE(rho1, rho, n) |
the number of items necessary to bring the Spearman-Brown predicted reliability up (or down) to rho1 from n items with Spearman-Brown of rho. |

KUDER(R1) |
Kuder-Richarderson Formula 20 coefficient for the data in range R1 |

CRONALPHA(R1, k) |
Cronbach’s alpha for the data in range R1 if k = 0 and Cronbach’s alpha with kth item (i.e. column) removed if k > 0 |

CALPHA(R1) |
array function which returns a row of Cronbach’s alpha for R1 with each item removed |

KAPPA(R1, k, lab, α, orig) |
array function which returns Cohen’s (Fleiss’) kappa for the data in R1 when k = 0 (default) and kappa for category k when k > 0, plus standard error, z-stat, p-value and lower and upper bound of 1−α confidence interval (α defaults to .05) |

WKAPPA(R1, R2, lab, α) |
array function which returns weighted kappa for the data in R1 using the weights in R2, plus standard error and lower and upper bound of 1−α confidence interval (α defaults to .05). |

ICC(R1, class, type, lab, α) |
array function which outputs the intraclass correlation coefficient ICC(class, type) plus the lower and upper bound of the 1−α confidence interval for the data in R1; default values are class = 2, type = 1, α = .05. |

KENDALLW(R1, lab, ties) |
array function which returns a column range consisting of Kendall’s W, r, χ^{2}, df and p-value; if ties = TRUE then a ties correction is applied |

ITEMDIFF(R1, mx) |
item difficulty index for the scores in R1 where mx is the maximum score for the item |

ITEMDISC(R1, R2, p, mx) |
item discrimination index where R1 contains the scores for each subject for a single item and R2 contains the corresponding scores for all items based on the top/bottom p% of total scores and mx is the maximum score for the item whose scores are contained in R1 |

Where *lab* is one of the arguments, if *lab* = TRUE, then an extra column of labels is appended to the output (default = FALSE).

If range R2 is omitted in WKAPPA it defaults to the unweighted measure where the weights on the main diagonal are all zeros and the other weights are ones. Range R2 can also be replaced by a number *r*. A value of *r* = 1 means the weights are linear (as in Figure 1), a value of 2 means the weights are quadratic. In general this means that the equivalent weights range would contain zeros on the main diagonal and values (|*i−j*|)^{r} in the *i*th row and* j*th column when *i ≠ j*.

**Partitioning**

INIT_SPLIT(n, m) |
a string of length n consisting of m 0’s followed by n–m 1’s. If omitted m defaults to n/2. |

NEXT_SPLIT(s) |
the string representing the next split after the split represented by s. |

RAND_SPLIT(n, m) |
a random string of length n consisting of m 0’s and n–m 1’s. If omitted m defaults to n/2. |

COV_SPLIT(R1, s) |
sample covariance for the data in range R1 based on the split described by string s. |

CORR_SPLIT(R1, s) |
correlation for the data in range R1 based on the split described by string s. |

INIT_PARTITION(n) |
a string consisting of n 0’s |

NEXT_PARTITION(n) |
the string representing the next partition after the partition represented by s. |

RAND_PARTITION(n) |
a random string of length n |

These strings consist only of 0’s and 1’s.

**Frequency table**

FREQTABLE(R1) |
array function which returns an n × 3 array which contains the frequency table for the data in range R1, where n = the number of unique values in R1 (i.e. the number of data elements in R1 without duplicates) |

FREQTABLE(R1, bsize, bmax) |
an array function which produces the frequency table for the data in range R1, assuming equally sized bins of size bsize; bmax = largest bin value, if omitted maximum value in R1 is used |

FREQ2RAW(R1, R2) |
array function which takes a frequency table described by R1 (data items) and R2 (frequencies) and outputs the equivalent raw data (in the form of a column vector) |

FREQ2RAW(R1) |
array function which is equivalent to FREQ2RAW(R2, R3) where R2 is the first column in R1 and R3 is the second column in R1 |

**Noncentral distribution functions**

NT_DIST (t, df, δ, cum, m) |
value of the noncentral t distribution T(df,δ) at t if cum = TRUE and pdf at t if cum = FALSE |

NCHISQ_DIST (x, df, λ, cum, m, prec) |
value of the noncentral chi-square distribution χ^{2}(df,λ) at x if cum = TRUE and pdf at x if cum = FALSE |

NF_DIST(x, df1, df2, λ, cum, m, prec) |
value of the noncentral F distribution F(df1,df2,λ) at x if cum = TRUE and pdf at x if cum = FALSE |

NT_INV (p, df, δ, m, iter) |
inverse of the noncentral t distribution T(df,δ) at p |

NCHISQ_INV (p, df, λ, m, iter, prec) |
inverse of the noncentral chi-square distribution χ^{2}(df,λ) at p |

NF_INV(p, df1, df2, λ, m, iter, prec) |
inverse of the noncentral F distribution F(df1,df2,λ) at p |

NT_NCP (p, df, x, m, iter) |
ncp value δ such that NT_DIST(x, df, δ, TRUE, m) = p |

NCHISQ_NCP (p, df, x, m, iter, prec) |
ncp value λ such that NCHISQ_DIST(x, df, λ, TRUE, m) = p |

NF_NCP(p, df1, df2, x, m, iter, prec) |
ncp value λ such that NF_DIST(x, df1, df2, λ, TRUE, m) = p |

*m* = maximum number of terms in the infinite sum that will be calculated: 1 – 170 (default 120) for the t distribution functions and default 1,000 for χ^{2} and F distribution functions. *prec* = desired level of accuracy for the power calculation of the χ^{2} and F distribution functions (default 0.000000001). *iter* = the number of iterations used to calculate the inverse or noncentrality parameter (default 40).

**Statistical power functions**

NORM1_POWER(d, n, tails, α) |
power of a one sample normal test when d = Cohen’s effect size |

NORM2_POWER(mm, s1, s2, n1, n2, tails, α) |
power of a two sample normal test when mm = |μ_{2}−μ_{1}| and s1 and s2 are the corresponding population standard deviations |

BINOM_POWER(p0, p1, n, tails, α) |
power of a one sample binomial test when p0 = probability of success on a single trial based on the null hypothesis, p1 = expected probability of success on a single trial |

CORREL1_POWER(r0, r1, n, tails, α) |
power of a one sample correlation test using the Fisher transformation when r0 = population correlation (based on the null-hypothesis), r1 = effect size (observed correlation) |

T1_POWER(d, n, tails, α, m) |
power of a one sample t test when d = Cohen’s effect size |

T2_POWER(d, n1, n2, tails, α, m) |
power of a two sample t test when d = Cohen’s effect size; if n2 is omitted or set to 0 then n2 is considered to be equal to n1 |

VAR1_POWER(ratio, n, tails, α) |
power of a one sample variance test where ratio = (effect size) |

VAR2_POWER(ratio, n1, n2, tails, α) |
power of a two sample variance test where ratio = (effect size) |

CHISQ_POWER(w, n, df, α, m, prec) |
power of a chi-square goodness of fit or independence test where w = Cohen’s effect size and df = degrees of freedom |

ANOVA1_POWER(f, n, k, type, α, m, prec) |
power of a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default), f = RMSSE effect size if type = 2 and f = noncentrality parameter if type = 0 |

REG_POWER(e, n, k, type, α, m, prec) |
power of multiple regression where k = # of predictors, e = Cohen’s effect size f if ^{2}type = 1 (default), e = R^{2} effect size if type = 2 and e = noncentrality parameter if type = 0 |

CALPHA_POWER(ca0, ca1, n, k, tails, α) |
power of Cronbach’s alpha test where ca0 = Cronbach’s alpha under the null hypothesis, ca1 = Cronbach’s alpha under the alternative hypothesis and k = # of items |

ICC_POWER(ρ0, ρ1, n, k, α) |
power of ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items |

*n*, *n*1, *n*2 = the sample size, *tails* = # of tails: 1 or 2 (default), *α* = alpha (default = .05) and *m *and* prec *as for the noncentral distribution functions.

**Sample size functions**

NORM1_SIZE(d, 1−)β, tails, α |
minimum sample size required to obtain power of at least 1−β in a one sample normal test when d = Cohen’s effect size |

NORM2_SIZE(mm, s1, s2, 1−β, tails, α, nratio) |
minimum sample size required to obtain power of at least 1−β in a two sample normal test when mm = |μ_{2}−μ_{1}| and s1 and s2 are the corresponding population standard deviations |

BINOM_SIZE(p0, p1, 1−)β, tails, α |
minimum sample size required to obtain power of at least 1−β in a one sample binomial test when p0 = probability of success on a single trial based on the null hypothesis, p1 = expected probability of success on a single trial |

CORREL1_SIZE(r0, r1, 1−β, tails, α) |
minimum sample size required to obtain power of at least 1−β in a a one sample correlation test using the Fisher transformation when r0 = population correlation (based on the null-hypothesis), r1 = effect size (observed correlation) |

T1_SIZE(d, 1−β, tails, α, m) |
minimum sample size required to obtain power of at least 1−β in a one sample t test when d = Cohen’s effect size |

T2_SIZE(d, 1−β, tails, α, nratio, m) |
minimum sample size required to obtain power of at least 1−β in a two sample t test when d = Cohen’s effect size |

VAR1_SIZE(ratio, 1−β, tails, α) |
minimum sample size required to obtain power of at least 1−β in a one sample variance test where ratio = (effect size) |

VAR2_SIZE(ratio, 1−β, tails, α, nratio) |
minimum sample size required to obtain power of at least 1−β in a two sample variance test where ratio = (effect size) |

CHISQ_SIZE(w, df, 1−β, α, m, prec) |
minimum sample size required to obtain power of at least 1−β in a chi-square goodness of fit or independence test where w = Cohen’s effect size and df = degrees of freedom |

ANOVA1_SIZE(f, k, 1−β, type, α, m, prec) |
minimum sample size required to obtain power of at least 1−β for a one-way ANOVA where k = # of groups, f = Cohen’s effect size if type = 1 (default) and f = RMSSE effect size if type = 2 |

REG_SIZE(e, k, 1−β, type, α, m, prec) |
minimum sample size required to obtain power of at least 1−β for multiple regression where k = # of predictors, e = Cohen’s effect size f if ^{2}type = 1 (default) and e = R^{2} effect size if type = 2 |

CALPHA_SIZE(ca0, ca1, k, 1−β, tail, α) |
minimum sample size required to obtain power of at least 1−β for Cronbach’s alpha test where ca0 = Cronbach’s alpha under the null hypothesis, ca1 = Cronbach’s alpha under the alternative hypothesis and k = # of items |

ICC_SIZE(ρ0, ρ1, k, 1−β, α) |
minimum sample size required to obtain power of at least 1−β for ICC(1,1) test where ρ0 = ICC(1,1) under the null hypothesis, ρ1 = ICC(1,1) under the alternative hypothesis and k = # of items |

*tails* = # of tails: 1 or 2 (default), *α* = alpha (default = .05), *m *and* prec *as for the noncentral distribution functions, *nratio* = the size of the second sample divided by the size of the first sample (default = 1) or negative of a specific value for the size of the second sample; default for 1−*β* = .80.

**Other statistical functions**

MULTINOMDIST(R1, R2) |
value of multinomial distribution where R1 contains the number of successes and R2 contains the corresponding probabilities of success |

UNIFORM_DIST(x, α, β, cum) |
pdf value at x for uniform distribution with end points α and β when cum = FALSE and corresponding cdf value when cum = TRUE |

UNIFORM_INV(p, α, β) |
inverse at p of the uniform distribution with end points α and β |

POISSON_INV(p, λ) |
inverse at p of the Poisson distribution with mean λ |

WEIBULL_INV(p, α, β) |
inverse at p of the Weibull distribution with parameters α and β |

NEGBINOM_INV(p, k, pp) |
inverse at p of the negative binomial distribution with parameters k and pp |

HYPGEOM_INV(p, n, k, m) |
inverse at p of the hypergeometric distribution with parameters n, k and m |

BETA(x, y) |
value of the beta function at x and y |

QDIST(q, k, df) |
studentized q cumulative distribution value for q with k independent variables and df degrees of freedom |

QINV(p, k, df, tails) |
inverse of the studentized q distribution, i.e. the critical value for the studentized q range; tails = 1 or 2 (default) |

PERMDIST(x,n, cum) |
pdf of permutation distribution with n elements at x if cum = FALSE and the cdf value if cum = TRUE |

PERMINV(p, n) |
inverse of the permutation distribution at p; i.e. the least value of x such that PERMDIST(x, n, TRUE) ≥ p |

PERM2DIST(x,n1, n2, cum) |
pdf of two sample permutation distribution with n1 and n2 elements at x if cum = FALSE and the cdf value if cum = TRUE |

PERM2INV(p, n1, 2)n |
inverse of the two sample permutation distribution at p; i.e. the least value of x such that PERMDIST(x, n12, TRUE) ≥ , np |

MANNDIST(x,n12, n, tails) |
p-value of the Mann-Whitney exact distribution with n1, 2 elements at nx, where tails = 1 (default) or 2 |

MANNINV(p, n12, n, tails) |
inverse of the Mann-Whitney exact distribution at p; i.e. the least value of x such that MANNDIST(x, n12, TRUE) ≥ , np |

RUNSDIST(r, n1, n2, cum) |
probability of getting r runs from a string with n1 T’s and n2 F’s if cum = FALSE (i.e. the pdf at r) and the probability of getting at most r runs from a string of n1 T’s and n2 F’s if cum = TRUE (i.e. the cdf at r) |

RUNSINV(p, n12), n |
inverse at p of the runs distribution with n1 T’s and n2 F’s |

F_DIST(x, df12, , dfcum) |
equivalent to F.DIST, except that df1 and2 can be non-integer df |

F_INV(p, df12), df |
inverse at p of the F distribution based on F_DIST |

IGAMMA_DIST(x, alpha, beta, cum) |
inverse gamma distribution at x |

IGAMMA_INV(p, alpha, beta) |
inverse of the inverse gamma distribution at p |

ICHISQ_DIST(x, df, cum) |
inverse chi-square distribution at x |

ICHISQ_INV(p, df) |
inverse at p of the inverse chi-square cdf |

DIRICHLET_DIST(pvector, avector) |
Dirichlet distribution at pvector (whose values add up to 1) based on alpha parameters in avector |

DIRICHLET_RAND(avector) |
an array pvector consisting of random values for the Dirichlet distribution with alpha values in avector |

**Other functions**

FText(R1) |
formula in cell R1 as text |

VER() |
current version of the Real Statistics Resource Pack (e.g. 4.13 Excel 2010/2013/2016) |

ExcelVer() |
Excel version (e.g. Excel 2010 is 14.0) |

ColLabel(n) |
label of the nth worksheet column (e.g. ColLabel(50) = “AX”) |

FirstBlank(R1) |
address (e.g. “B7″) of the first empty cell in range R1. If R1 contains no empty cells then the value “None” is returned |

FirstNonNum(R1) |
address of the first empty or non-numeric cell in range R1. If R1 contains no non-numeric cells then the value “None” is returned. |

RANDOM(a, b, type) |
non-volatile random number between a and b; if type = TRUE (default) then a decimal number is returned, if type =FALSE an integer is returned |

**Matrix functions**

DET(R1) |
same as MDETERM(R1). |

DIAG(R1) |
array function that returns a column vector with the values on the diagonal of the matrix in range R1 (esp. useful when R1 is a square matrix) |

DIAGONAL(R1) |
array function that returns a square matrix whose main diagonal consists of the elements in range R1 and has zeros elsewhere |

IDENTITY(k) |
array function that returns a k × k identity matrix; if k is omitted then the size of the highlighted range is used instead |

ISCELL(R1) |
TRUE if R1 is a single cell and FALSE otherwise |

ISSQUARE(R1) |
TRUE if R1 is a square range and FALSE otherwise |

LENGTH(R1) |
length of matrix in range R1 = the square root of the sum of the squares of all the elements in R1 (esp. useful for column or row vectors) |

NORM(R1) |
array function that returns the normalized version of the matrix in range R1 |

MERGE(R1, R2) |
array function which merges ranges R1 and R2, i.e. it returns a range containing the values in R2 to the right of those in R1 |

TRACE(R1) |
trace of matrix in range R1 |

SSCP(R1) |
equivalent to MMULT(TRANSPOSE(R1),R1) |

MPOWER(R1, k) |
array function that returns the square matrix in R1 multiplied by itself k times |

**Simultaneous linear equations**

ELIM(R1) |
array function which outputs the results of Gaussian Elimination on the augmented matrix found in the array R1. The shape of the output is the same as the shape of R1 |

LINEQU(R1) |
array function which returns a column vector with solution to linear equations defined by R1; returns an error if no solution or the solution is not unique |

By default, each of these functions assumes that an entry with absolute value less than 0.0001 is equivalent to zero. This value can be changed by using the optional *prec* argument in ELIM(R1, *prec*) and LINEQU(R1, *prec*).

**Table lookup**

Here R1 defines a table, including both the data and row/column headings.

INTERPOLATE(r, r1, r2, v1, v2, h) |
the value between v1 and v2 that are proportional to the distance that r is between r1 and r2, where v1 corresponds to r1 and v2 corresponds to r2 based on linear interpolation where h = FALSE, while if h = TRUE (default) a harmonic interpolation is used. |

MLookup(R1, r, c) |
the value in the table defined by range R1 in the row headed by r and the column headed by c. |

ILookup(R1, r, c, h) |
the value in the table defined by range R1 corresponding to row r and column c. If r or c can refer to some value that must be interpolated between row or column headings (using the INTERPOLATE function), provided those headings are numbers. If the first row (or column) heading is preceded by “>” it refers to values smaller than the next row (or column heading). If the last row (or column) heading is preceded by “>” it refers to values bigger than the previous row (or column heading). |

There are also the following lookup functions for specific tables. In the following *n, n*1*, n*2*, k* and *df* are generally positive integers, α is a decimal between 0 and 1 (default .05) and *tails* takes the value 1 or 2 (default). If *α* exceeds the largest value for alpha in the table or is smaller than the smallest value for alpha in the table then #N/A is returned. For values between those in the associated table, if *h* = TRUE (default) a harmonic interpolation is used, while if *h* = FALSE a linear interpolation is used.

TauCRIT(n, α, tails, h) |
critical value in the Kendall’s Tau table |

RhoCRIT(n, α, tails, h) |
critical value in the Spearman’s Rho table |

SRankCRIT(n, α, tails, h) |
critical value in the Wilcoxon Signed Rank table |

KSCRIT(n, α, tails, h) |
critical value in the Kolmogorov-Smirnov table |

LCRIT(n, α, tails, h) |
critical value in the Lilliefors table |

WCRIT(n1, n2, α, tails) |
critical value in the Wilcoxon Rank-Sum table |

MCRIT(n1, n2, α, tails, h) |
critical value in the Mann-Whitney table |

QCRIT(k, df, α, tails, h) |
critical value in the Studentized Range Q table |

DCRIT(k, df, α, tails, h) |
critical value in the Dunnett’s test table |

DLowerCRIT(n, k, α, h) |
lower critical value in the Durbin-Watson Table |

DUpperCRIT(n, k, α, h) |
upper critical value in the Durbin-Watson Table |

Related to the above critical values determined by one of the statistics tables are the following estimates of p-values based on a linear interpolation, if necessary, of the values in these statistics tables. The parameter *iter* = the # of iterations used to arrive at the p-value. If the p-value exceeds the largest value for alpha in the table then a value of 1 is returned; if the p-value is less than the smallest value for alpha in the table then 0 is returned.

WPROB(W, n1, n2, tails, iter) |
p-value for the Wilcoxon Rank-Sum test |

MPROB(U, n1, n2, tails, iter) |
p-value for the Mann-Whitney test |

SRankPROB(T, n, tails, iter) |
p-value for the Wilcoxon Signed Rank test |

KSPROB(x, n, tails, iter) |
p-value for the Kolmogorov-Smirnov test |

LPROB(x, n, tails, iter) |
p-value for the Lilliefors test |

SWPROB(n, W, b, h) |
p-value for the Shapiro-Wilk test for a sample of size n and statistic W; if b = TRUE then use the Royston algorithm is used, while if b = FALSE then a table look-up (table 2) is used (employing harmonic interpolation unless h = FALSE when linear interpolation is used). |

**Sorting and eliminating duplicates**

QSORT(R1, b) |
array function which fills highlighted range with data from R1 in sorted order (by columns); b is an optional parameter (default = TRUE); if b is TRUE then sort is in ascending order and if b is FALSE (or 0) sort is in descending order. |

QSORTRows(R1, col, ascend, head) |
array function which fills highlighted range with rows from R1 in sorted order based on the elements in the col^{th} column of R1; if ascend is TRUE (default) then sort is in ascending order and if ascend is FALSE (or 0) sort is in descending order; if head = TRUE (default is FALSE) then first row is presumed to contain column headings and is not sorted |

(R1, QSORT2Rowscol1, col2, ascend, head) |
array function which fills highlighted range with rows from R1 in sorted order using two sort keys col1 and col2; ascend and head are as for QSORTRows. |

(R1, QSORT2RowsMixedcol1, col2, ascend) |
array function which sorts R1 (containing only numeric data) as for QSORT2Rows, except that if ascend is TRUE (default) then primary sort on col1 is in ascending order and secondary sort on col2 is in descending order, while if ascend is FALSE then primary sort is descending and secondary sort is ascending |

SortUnique(R1, s) |
array function which fills highlighted range with unique elements from R1 in ascending sorted order; output is a single column |

ExtractUnique(R1, s) |
array function which fills highlighted range with unique elements from R1 in the order they appear first in R1; output is a single column |

NODUPES(R1, s, b) |
array function which fills highlighted range with data from R1 eliminating any duplicates (by columns); if b = TRUE then the data in R1 is sorted first (by column) and if b = FALSE then it is assumed that range R1 is in sorted order. |

SortRowsUnique(R1, s) |
array function which fills highlighted range with unique rows from R1 in ascending sorted order; output has same number of columns as R1. |

SortRowsCount(R1, s) |
array function which fills highlighted range with unique rows from R1 in ascending sorted order plus a count of the number of occurrence of that row in R1; output has one more column than R1. |

SortRowsSum(R1, s) |
array function which fills highlighted range with unique rows from R1 excluding last column in ascending sorted order plus a sum of the values in the last column of R1 for matching occurrences; output has same number of columns as R1. |

The string *s* is used as a filler in case the output range has more rows/cells than needed. This argument is optional and defaults to the error value #N/A.

**Sampling and reshaping**

REVERSE(R1, s) |
array function which fills highlighted range with data from R1 in reverse order (by columns) |

REV(R1) |
array function which reverses the elements in R1; resulting range should have the same size and shape as R1. |

RESHAPE(R1, s) |
array function which fills highlighted range with data from R1 (by columns) |

RESHAPE(R1, s, nrows, ncols) |
array function which fills an nrows × ncols range with data from R1 (by columns) |

SHUFFLE(R1, s) |
array function which fills highlighted range with a permutation of the data from R1 (sampling without replacement) |

RANDOMIZE(R1, s) |
array function which fills highlighted range with a random selection of data from R1 (sampling with replacement) |

DELBLANK(R1, s) |
array function which fills highlighted range with data from R1 (by columns) omitting any empty cells |

DELNonNum(R1, s) |
array function which fills highlighted range with data from R1 (by columns) omitting any non-numeric cells |

DELROWBLANK(R1, b, s) |
array function which fills highlighted range with data from R1 (by columns) omitting any row with one or more empty cells; if b is TRUE then the first row of R1 (presumably containing column headings) is always copied (even if it contains a blank cell); the second argument is optional defaults to b = FALSE. |

DELROWNonNum(R1, b, s) |
array function which fills highlighted range with data from R1 (by columns) omitting any row with one or more non-numeric cells; if b is TRUE then the first row of R1 (presumably containing column headings) is always copied (even if it contains a non-numeric cell); the second argument is optional and defaults to b = FALSE. |

DELErr(R1) |
array function which fill highlighted range with data from R1 except that any cells with an error value are replaced by the empty value “” |

SUBMATRIX(R1) |
array function that outputs an nrows × ncols subrange of R1 (nrows defaults to m – row1 + 1 and ncols defaults to n – col1 + 1) starting at the cell in the row1^{th} row (default 1) and col1^{th} column (default 1) of the m × n range R1. |

SelectCols(R1, ss, c) |
array function which fill highlighted range with the columns from R1 defined by the string ss, where ss contains a list of column numbers of the columns to be selected from R1; if c > 0 then also sort the rows in the output based on the elements of the cth column in the output |

The string *s* is used as a filler in case the output range has more cells/rows than needed. This argument is optional and defaults to the error value #N/A.

**Categorical Coding**

CATCODE(R1) |
array function which fills highlighted array with simple coding of values in range R1; if R1 is an m × n range, highlight an m × n range |

TAGCODE (R1, b) |
array function which fills highlighted array with dummy coding of values in column range R1; if R1 has m rows and k unique values then highlight an m × (k–1) range; if b = TRUE (default) normal dummy coding is used, if b = FALSE alternative dummy coding is used. |

**Iterative Proportional Fitting Procedure**

IPFP2(R1) |
array function which fills highlighted array with the output from the IPF procedure on the two-contingency table with targeted marginal totals in R1; if R1 is an m × n range, highlight an m-1 × n-1 range |

IPFP3 (R1, R2) |
array function which fills highlighted array with the output from the IPF procedure on the three-contingency table in R1 with targets in R2. If R1 is an m × n range, highlight an m × n range. |

Hi Charles, how do I do normality test for residuals in Excel, when I use Shapiro-Wilk test in SPSS? Thanks so much!

Des,

I don’t use SPSS, but the following webpage shows how to do this in Excel:

http://www.real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-test/

Charles

Hi Charles,

Thanks for lots of learning materials. I have requirement to find a positive or negative trend exists or not. So I wanted to do a Mann Kendal Trend test . I request your help how to do this in excel. I have some information in nist.gov I have done that but the table has values more than 30 which is not in the site. I read some articles where it can be done without Mann’s table. Kindly share the method how it can be done in excel or R.

Thanking you

Best regards,

Habeeb

Habeeb,

Yes, with this big a sample, you can use the normal distribution approximation which doesn’t require the tables. See the following:

Mann-Whitney Test

Charles

Hi!

I tried to use the Functions. I can use some like LogiCoeff but I can’t use DRegResiduals and all that about Deming. It don’t recognise the Functions. Can you help me?

Thanks

Tury,

What do you see when you enter the formula =VER()

Charles

4.9 Excel 2010/2013/2016

I tried with Ecxel-Translator in Italian language but there are not traslation for DRegResiduals.

Tury

Tury,

This is a Real Statistics function and is not available in standard Excel. You can use this function when you download the Real Statistics software, which is available for free.

Charles

Hi,

I have installed Real Statistics many time ago, I can open the window “Real Statistics” with ctr+m.

Tury,

If you are not finding DRegResiduals in the version of Real Statistics, then perhaps you need to download a newer version.

Charles

Dear sir,

Please first allow me to congratulate you on this REALLY NEEDED add-in.

I think that I have a simple question, but after 1 week on your website (and having downloaded the add-in), I still cannot find the solution, so I’m obliged to write to you:

I have 40 men and 60 women, to whom I asked “do you like Internet?”. The possible replies were “not at all”, “a bit”, “somehow”, “a lot” and “I can’t live without!” (I consider this Likert scales as equal distant and, of course, it is just an example with random numbers). I got the following results: Men [2, 6, 15, 10, 7], meaning that men tend to feel “medium” and Women [1, 3, 15, 25, 16], meaning that women seem more positive.

I want to prove (or not) that Men react differently than Women (at a=0.05). My Statistics professor insists that the result should be p=0.014 (by “linear-by-linear” chi-square), but I think that Pearson (p=0.108) is more correct (at least based on what you mention in your website).

May I ask:

a) What is your opinion,

b) If my professor is correct, how do I do it with your add on and

c) Which group likes internet more?

Thank you very much in advance, people like you tend to be rare nowdays…

Chris,

The chi-square test for independence is used with categorical data. The data that you have is ordered and so chi-square isn’t really the right test. Also you have too many cells with fewer than 5 elements.

Better to use a Likert scale with “not at all”, “a bit”, “somehow”, “a lot” and “I can’t live without!” becoming 1,2,3,4,5 (or something similar). You can then use the Real Statistics Frequency data analysis tool to create the raw data for each sample (frequency table to raw data conversion).

Next you need to conduct a t test or better yet a Mann-Whitney test since the data may not be normally distributed. These are provided by the Real Statistics software. Once you do this, you should get a p-value similar to that given by your professor.

Charles

It works!!! Thank you from the bottom of my heart! If you are ever in Greece for vacations, you’ll always have a place to stay (you can find my e-mail, if need be) !

Chris,

Good to hear.

Charles

I installed your pack into Excel. I found the Sample Size location, but all I want to find the sample size of, let’s say, 1,000 items for an attribute, such as “does documentation support the amount of an expense?” I am not sure what to put in the dialog box. I want 90% confidence, ±5% precision, expected rate 3% (A guess up front). What would I enter in the parameters to get a sample size?

John,

I can’t tell from “does documentation support the amount of an expense?”, which statistical test you are using. I need to know this to answer your question.

Charles

I just want to get the sample size needed for any given “universe” and I am looking for the easiest way to do that for a simple attribute sample. The attribute is a simple yes or no. As you can see, I am no expert at statistics, so I have to keep it as simple as possible – plug in the parameters and get a number.

John,

I understand that you want to keep it simple, but I still need to know what test you are conducting.

Charles

I don’t understand what you mean by what test I am doing. I am just looking at a sample and determining if the staff processed it right – yes or no.

John,

In this case, I don’t understand your question.

Charles

When I try to use the logitcoeff functions on my table of data, I am just getting a blank answer, any idea what I might be doing wrong? Or could I send you a copy of my sheet for you to take a look? Thanks!

Yes, you can send me a copy.

Charles

Thanks, but it turns out I was just being bad with Excel and array formulas. Your sample Excel file with all of the formulas showed me what I was doing wrong (just took several tries to figure it out).

As a side note, just wanted to say thanks for putting all of this together and being very responsive, it’s all extremely helpful!

Hi,

I downloaded the Real Statistics pack for Excel 2007, and it does not contain Logistics Regression. May I know if there are any changes made to the pack?

Casper

Casper,

Logistic Regression is now an option of the Regression data analysis tool.

Charles

Charles,

Would like to ask if the Real Statistics volume has an initial count on a stress tester, please forgive me as i am still a newbie to the Excel programme 🙂

Would be really helpful if it was installed as well

Thank You 🙂

No, if you are referring to stress testing of banks or investments.

Charles

Great work done Charles. this is the first time i’m having something useful for my Mac excel. your package is going to help me a lot.

will be counting on you for more

Great to hear.

Charles

The site does not accept the characters for different : what is intended is

shw different from zero where you find shw 0

shw <> 0

Hi,

Congratulations for your superb job that made me feel both surprised and gratified for having my name in the references section.

I found that, like me, you document some sheets with the formulas used in adjacent cells. I don’t know which method you use. I consider that using the following function is very useful. In models you can toggle between showing and hiding the result of the function.

Function GetFormula(cell As Range, Optional shw As Integer) As String

If IsMissing(shw) Then

shw = 0

End If

If shw 0 Then

GetFormula = cell.Formula

Else

GetFormula = cell.Formula

End If

End Function

Regards

António Teixeira

Hi António,

Thank you for your kind words. I also appreciated your paper, which helped me when I was researching how to calculate statistical power.

Thanks also for your suggestion regarding documenting formulas. I am using a more primitive version of the GetFormula formula, which is called FTEXT (i.e. formula text), but I may try to modify it along the lines you are suggesting.

Charles

Thanks for this, it is really amazing and helpful. Cheers!

Hi Charles,

Thank you for sharing this great resource. I have installed Real Statistics and using the tool successfully. Some reason, I couldn’t use Real Statistics Supplemental Functions. For example, I am trying to calculate AIC using”LogitRSquare(R1, lab, raw, iter)”, but it gives nothing or sometimes “LL”.

When I click on the cell where you calculated AIC in the “Sheet: Logit 6a” in the example work book that I download, it shows “=’http://www.real-statistics.com/Users/C/AppData/Roaming/Microsoft/AddIns/RealStats.xlam’!LogitRSquare(A4:D15,TRUE) “. I thought, it should show my local directory where I installed. Is this a problem?

If you have any information to use Real Statistics Supplemental Functions, that would be great.

Thanks,Rben

Rben,

The reason why you are only getting a strange result when you use LogitRSquare(R1, lab, raw, iter) is that this is an array function. To use this function, you need to highlight a range of cells (a 7 x 2 range for this function if lab is set to TRUE) and then press Ctrl-Shft-Enter (instead of just Enter). See the webpage http://www.real-statistics.com/excel-environment/array-formulas-functions/ for more details about array functions in Excel.

When using a spreadsheet that someone else has prepared which refers to an Excel add-in, you need to tell Excel where that add-in is on your computer. Fortunately you only need to do this once. This is the reason why you see the string =’http://www.real-statistics.com/Users/C/AppData/Roaming/Microsoft/AddIns/RealStats.xlam’!LogitRSquare(A4:D15,TRUE).

Please refer to webpage http://www.real-statistics.com/free-download/examples-installation/ for how to do this.

Charles

Charles,

Thank you very much for making this tool available. After weeks of search, I found yours to get eigenvectors of square symmetric matrix.

My matrix is quiet large ( 200 by 200). I am using Excel 2010 with 2.60Ghz processor and 12GB Ram. When I run the eVectors command, it is taking over 20 minutes to execute and display results.

Is there limitations in the code for size or otherwise? Is there anything you are aware of that might cause slowness in the calculations?

Thank you

Sam,

The code that I use is not optimized for performance. There are various techniques for doing this that I have not yet implemented. I will be re-evaluating this procedure shortly. I would like to modify the code so that it works with non-symmetric matrices. I will also see if I can improve the performance.

Charles

Hi Charles,

I tried the Wilcoxon rank-sum test with Excel (the function WTEST), but the parameter “t” doesn’t work. If I just write “=WTEST(R1;R2)” it’s ok, but if I try to add the parameter “t” (1 or 2) at the end it doesn’t work. It says “#VALUE!”. Do you know why?

Thanks,

Julien

Julien,

I just retested the WTEST function and it works fine when the t parameter is added. What version of Excel are you using? Is it for English language or some other language?

Charles

Hi Charles,

I use Microsoft Excel for Mac 2011 in English

Here is my command: =WTEST(L2:L35;M2:M35;2)

Any idea?

thanks for the wonderful resource!

putting the list of functions in as images make it

impossible to search. i was searching for find something

about spearman’s rank correlation test and was

coming up empty. i was going to give up

until i noticed the images.

it would be very helpful to change them to text so

they can be searched.

Kirk,

Originally I had intended these lists to be in the form of tables, but they didn’t display very well, and so I replaced them by images, but I agree with you that this makes it hard to search. Over the next few days I will be modifying the webpages to support a new release of the software, which should be out later today. I will also modify the list of functions as part of this process.

Charles

COV() function returns just a value, how do I get the whole matrix as the result ?

Charan,

COV() is an

arrayfunction, and so you need to highlight the appropriate range and then press Ctrl-Shift-Enter (i.e. while holding down the Control and Shift keys, press the Enter key). E.g. suppose your data in in the range A1:C10. Since this represents 3 variables (3 columns), you need a 3 x 3 matrix to hold the output. You could therefore highlight the range E1:G3, enter the formula =COV(A1:C10) and press Ctrl-Shift-Enter. The whole matrix will now be visible.Charles

Thanks Charles,

I have figured it out from other posts, selecting the Matrix Operations after clicking Ctlr+M. Have just started using other functionalities too, especially PCA. They look promising, keep up the good work…

This is great work. Many thanks, I look forward to sharing this with my students.

The second question is whether you have any separate tutorial/explanatory material that you have prepared for the Examples worksheets? It is difficult to follow without some explanation of what the origin of the data sets are. Perhaps I am just not seeing this on the website.

Frank,

Everything in the Examples worksheets is on the website somewhere. The first two worksheets (TOC0 and TOC) try to link the Examples worksheets to the website. I will try to see if I can improve on this in the future.

Charles

This is a wonderful software resource! The 2 questions I have are (1) If you could specify what the specific formula is for the non-parametric p values that are generated using the supplemental MTEST or WTEST commands? The p values do not match what I obtain using other software – appearing about half as great (i.e., more significant). I’m trying to understand why.

Frank,

The MTEST and WTEST functions return the one-tail version of the p-value. You need to double these values to get the two-tail version of the p-value. I have just revised these functions today so that you can specify whether you want the one-tail or two-tail test. To use this capability you need to download the latest release of the software (Release 2.14.1).

Charles

Your package looks great.

I managed to make the vba Tools (like Regression) work, however I cannot use the supplemental functions like slopestest.

=RealStats.xlam!SlopesTest(A5:A19;B5:B19;D5:D20;E5:E20;FALSCH)

(German Excel 2013, Win 8.1)

RealStats.xlam is shown in the VBA code

=ver() Shows 2.8.1

but slopestest does not show up.

can you please help

best regards

Manfred

Manfred,

SlopesTest was introduced in Rel 2.10. If VER() works then probably all the other functions work too. You just need to download the latest release to get SlopesTest. I have tested that release with the Italian version of Excel and it works; let’s hope that the German version works as well.

Charles

Hi,

I’ve already downloaded RealStats.xlam, I can also activate the statistical package. However I do know how to operate the Supplemental Functions. It seems unavailable in the Function list neither it cannot work if I type the function in any cells. Someone please give me a clue.

Thanks

Hi Hilman,

Once you have downloaded and installed RealStats.xlam you should be able to type any of the functions in the statistical package in a cell (or range) in an Excel worksheet. You will know whether you have successfully installed the software by typing the formula =VER() into any cell. You should see the release value 2.8.1 (or something similar). If you don’t see this you need to review the instructions on how to install the software at http://www.real-statistics.com/free-download/real-statistics-resource-pack/.

The supplemental functions operate exactly as the standard Excel functions, except that when you enter a supplemental function in a cell you won’t be prompted for its arguments. To get an idea of what the arguments are you can type = followed by the name of the function and then Ctrl-Shift-A. You can also see a list of the supplemental functions in Excel’s Function List under the user defined functions category.

A clearer list of the supplemental functions is available on the website — e.g. http://www.real-statistics.com/excel-capabilities/supplemental-functions/ and http://www.real-statistics.com/excel-capabilities/real-statistics-multivariate-functions/.

Note that to use the data analysis tools in RealStats.xlam you need to press Ctrl-m.

Charles

Hi,

I use a lot of your functions in excel.

However, I find that while your two factor ANOVA supplemental function in excel gives the correct F value, the p-value, calculated using the ATestInt function is inconsistent with results from other statistical packages. Can you please check into this?

regards

Ram

Hi Ramkumar,

Thanks for bringing this issue to my attention. I have checked the software and found that the ATESTInt function just returns FDIST(F, dfInt, dfW). If the value of F is correct then the only places where the formula can give the wrong answer are if dfInt or dfW is incorrect or Excel’s FDIST function gives the wrong answer. These don’t seem so likely.

Can you check whether the value of the supplemental function dfInt(R1, r) or dfWF(R1, r) gives an answer different from the one obtained from other statistical packages? Alternatively can you send me a specific example where you obtained a value from ATESTInt(R1,r) which was different from other statistical packages? In this way I can check where the problem is.

Charles