Real Statistics Release 4.8

I am pleased to announce Release 4.8 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

The spreadsheets for all the examples used on the Real Statistics website are available for free download (Download Examples Workbooks) are being updated for compatibility with the new release. These spreadsheets are contained in four Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2, Multivariate Examples and Time Series Workbook. See Workbook Examples for a description of which examples are contained in which files.

Release 4.8 contains the following new features:

Shapely-Owen Decomposition

Adds the following new array function:

SHAPELY(Rx, Ry) = a column range which gives the Shapely-Owen decomposition of R-square for the linear regression of Rx (which contains the x data values) on Ry (which contains the y data values).

This decomposition shows the relative contribution of each of the independent variables to the regression model.

Box-Cox Transformation

Adds the following new functions which carry out Box-Cox transformations

BOXCOX(R1, λ): array function which returns a range containing the Box-Cox transformation of the data in range R1 using the given lambda value. If the lambda argument is omitted, then the transformation which best normalizes the data in R1 is used, based on maximizing the log-likelihood function.

BOXCOXLL(R1, λ) = log likelihood function of the Box-Cox transformation of the data in R1 using the given lambda value

BOXCOXLambda(R1) = the value of lambda which maximizes the log likelihood function of the Box-Cox transformation of the data in R1

New Support for the Multivariate Normal Distribution

Adds the following new functions

BNORMSDIST(x, y, r, cum) = the cdf of the standard bivariate normal distribution at x and y with correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

BNORMDIST(x1, x2, m1, m2, s1, s2, r, cum) = the cdf of the standard bivariate normal distribution at at x1 and x2 with means m1 and m2, standard deviations s1 and s2 and correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

MNORMRAND(Rm, Rc): generates an array containing a random k × 1 vector from a k-dimensional multivariate normal distribution where Rm is the range that contains the k × 1 means vector and Rc is the range that contains the k × k covariance matrix

BNORMRAND(m1, m2, s1, s2, r): generates a random 2 × 1 vector from a bivariate normal distribution with means m1, m2, standard deviations s1, s2 and
correlation coefficient r

These last two functions can be used to generate a random vector which comes from a multivariate normal distribution with the specified parameters.

Mardia’s Test for Multivariate Normality

Adds the following functions that test whether the data in range R1 is drawn from a multivariate normal distribution

MSKEWTEST(R1, lab): Mardia’s skewness test for multivariate normality; returns a column range with the values skewness, chi-square statistic, df and p-value, plus corrected statistic and p-value for small samples

MKURTTEST(R1, lab): Mardia’s kurtosis test for multivariate normality; returns a column range with the values kurtosis, z-statistic and p-value

If lab = TRUE then an extra column of labels is appended to the results (defaults to FALSE).

Changes to the Noncentral Chi-square and F  Distribution Functions

The NCHISQ_DIST, NCHISQ_INV, NCHISQ_NCP, NF_DIST, NF_INV and NF_NCP functions have been revised to improve their accuracy especially for large values of the noncentral parameter. Note that the number of terms in the infinite series approximations has been greatly expanded (default is now 1,000 instead of 80). In addition, you can now specify the level of accuracy via a precision argument (default is .000000001).

The same changes have been made to the power and sample size functions REG_POWER, REG_SIZE, CHISQ_POWERCHISQ_SIZE, ANOVA1_POWER, and ANOVA1_SIZE.

The various functions that support the noncentral t distribution and the corresponding power and sample size  functions have not changed, except that the default number of terms in the infinite series has been increased from 80 to 120.

These changes are also reflected in the corresponding Power and Sample Size data analysis tools.

Minor Enhancements and Bug Fixes

  • Fixes a bug in the KCORREL function
  • Added some new arguments to the Mahalanobis distance squared function MDISTSq
Posted in Announcement, New Release | 9 Comments

Release 4.7

I am pleased to announce Release 4.7 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2010, 2013 and 2016 (Windows version) environments. The Excel 2007 version will be available later today.

The spreadsheets for all the examples used on the Real Statistics website, including those related to the new Release 4.7 features) are available for free download (Download Examples Workbooks). These are contained in three Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2 and Multivariate Examples. See Workbook Examples for a description of which examples are contained in which files.

In addition, there is now a new Time Series Examples file which contains the time series examples. This file is still being revised as I add new webpages to the website related to time series analysis. This process is still ongoing and I hope to complete the addition of webpages related to Release 4.6 and 4.7 shortly.

Release 4.7 contains the following new features:

Enhancement to ARIMA data analysis tool

A new Reformat for Linear Regression option has been added to the ARIMA data analysis tool which can be used to generate the standard least squares model for AR(p) processes.

In addition, the following array function has been added where R1 is an n × 1 column range containing time series data:

ARMap(R1, p): outputs an np × p+1 range which contains X and Y data equivalent to the data in R1 in order to perform multiple linear regression

Minor changes and enhancements

The WEIBULL_INV function has been revised for consistency with the standard Excel functions WEIBULL and WEIBULL.DIST. In particular, the inverse of WEIBULL(x,b,a,TRUE) or WEIBULL.DIST(x,b,a,TRUE) is now WEIBULL_INV(p,b,a). Previously, this was expressed as WEIBULL_INV(p,a,b).

Previously the VIF value for the first variable in the output of the Linear Regression data analysis tool was always one. This has now been changed to its actual VIF value. The same change has been made to the formula VIF(R1,1).

Changes in handling array arguments

I am in the process of changing most of the Real Statistics functions which take a range as an argument so that they can also take an array as an argument. This is already true of most standard Excel functions. E.g. if R1 is a range then =SUM(R1) is the sum of all the elements in R1. You can also use expressions such as =SUM(R1^2) or SUM(ABS(R1)) to get the sum of the squared values in R1 or the sum of the absolute values in R1. Note that these are array formulas and so you need to press Ctrl-Shft-Enter instead of just Enter.

A notable exception are the functions RANK and RANK.AVG. A formula of form RANK(x,R1) calculates the rank of x among the data elements in R1, but expressions like RANK(ABS(R1)) yield an error.

Many Real Statistics functions already allow for array arguments, but over time I will be expanding the number that can take array values instead of just ranges.

Bonferroni and Dunn-Sidak corrections

When using contrasts as part of a post hoc analysis, you need to revise the alpha value by dividing by the number of contrasts that you will make. This is called the Bonferroni correction. With k groups there are C(k,2) possible comparisons, and so if you expect to make all these comparisons then you need to divide alpha by C(k,2). It turns out that for theoretical reasons, if you limit yourself to orthogonal contrasts, you will need to make at most k – 1 contrasts, and so you can use a corrected alpha value of α/(k-1).

In the ANOVA data analysis tools which includes a Bonferroni correction option, the corrected version of alpha will always be based on division by k-1. If you use fewer or more than this number of contrasts you will need to change the corrected value of alpha manually.

The Dunn-Sidak correction works similarly, except that this time the corrected value of alpha is alpha will be set by default to 1–(1–α)1/m. where m = the number of contrasts used. In the ANOVA data analysis tools which includes a Dunn-Sidak correction option, the corrected version of alpha will be set by default to 1–(1–α)1/m. where m = k -1 If you use fewer or more than this number of contrasts you will need to change the corrected value of alpha manually.

Note that this approach has now been applied to the Contrasts option of the One Factor Repeated Measures ANOVA data analysis tool.

Bug Fixes

  • Corrects a bug in the SCORREL and FKTEST function for data which contains a zero value.
  • Corrects some corrupted values in the DCRIT, LCRIT and RhoCRIT lookup functions.
  • Corrects an error in the calculation of the standard error for Dunn‘s post hoc test to the Kruskal-Wallis test.
  • Corrects an overflow error in calculating the sample size for Cronbach’s alpha (in CALPHA_SIZE and in the Cronbach’s Alpha data analysis tool).

Complex Numbers

Real Statistics has added new support for complex numbers. A complex number is represented as a range with one row and two columns, where the first cell contains the real part of the complex number and the second cell contains the imaginary part. The following functions are supported, where z1 and z2 are complex numbers:

  • CReal(z1) = real part of z1
  • CImag(z1) = imaginary part of z1
  • CAbs(z1) = absolute value of z1
  • CConj(z1) = conjugate of z1
  • CAdd(z1, z2) = z1 + z2
  • CSub(z1, z2) = z1 – z2
  • CMult(z1, z2) = z1 * z2
  • CDiv(z1, z2) = z1 / z2
  • CPower(z1, n) = z1n

 Complex Matrix Operations

We represent an m × n matrix of complex numbers by a m × 2n range in Excel. This range consists of an m × n range on the left and another m × n range on the right. The first of these ranges represents the real parts of the values of the complex numbers in the complex number matrix and the second of these ranges represents the imaginary parts of the complex numbers in the complex number matrix.

This is equivalent to representing a complex matrix as A + Bi where A and B both contain only real values.

The following array functions are supported, where Z1 and Z2 are complex matrices, z1 is a complex number and n is a positive integer:

  • ZAdd(Z1, Z2) = Z1 + Z2
  • ZSub(Z1, Z2) = Z1 – Z2
  • ZMult(Z1, Z2) = Z1 × Z2
  • ZMultScalar(Z1, z1) = z1 * Z1
  • ZInverse(Z1) = Z1-1
  • ZTranspose(Z1) = Z1T
  • ZIdentity(n) = n × n complex identity matrix (represented as an n × 2n range)
Posted in Announcement, New Release | Comments Off on Release 4.7

Rel 4.6 Installation

The installation instructions for the new release, Rel 4.6, of the Real Statistics Resource Pack is as follows:

1. Make sure that Excel’s Solver capability is installed on your computer. To check this proceed as follows:
(a) open up a blank Excel spreadsheet
(b) Press Alt-TI (i.e. hold down the Alt key and press the T and I keys).
(c) On the dialog box that appears make sure that the Solver addin is checked. If not, check it and click on the OK button.
(d) If the Solver addin was not checked you need to close the Excel file before proceeding to the next step.

2. Open a blank Excel spreadsheet
3. Press Alt-TI
4. Click on the Browse button on the dialog box that appears to locate the file containing Real Statistics that you downloaded from the website
5. Make sure that the Real Statistics addin is checked and click on the OK button

Charles

Posted in Announcement, Hint | Comments Off on Rel 4.6 Installation

Important Information about Rel 4.6

Unlike previous releases of the Real Statistics Resource Pack, the latest release, Release 4.6, uses Excel’s Solver capability. This means that you must make sure that the Solver addin is loaded on your computer

 You do this by pressing Alt-TI (i.e. hold the Alt key down and press T and I) and making sure that there is a check mark next to Solver on the dialog box that appears.

Charles

Posted in Hint | Comments Off on Important Information about Rel 4.6

Release 4.6 Announcement

I am pleased to announce Release 4.6 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2010, 2013 and 2016 (Windows version) environments. It will be available tomorrow for Excel 2007 users.

The spreadsheets for all the examples used on the Real Statistics website, including those related to the new Release 4.6 features) are available for free download (Download Examples Workbooks). These are contained in three Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2 and Multivariate Examples. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website is in the process of being updated to reflect the new features. These changes will be made over the next several days. In addition, I will be adding a new examples workbook, Time Series Examples, which contains the examples for the time series capabilities.

Release 4.6 contains the following new features:

Time Series Data Analysis Tools and Functions

Three new data analysis tools have been added:

Basic Forecasting: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Exponential Trend, Holt’s Linear Trend, Holt-Winter’s model

ARIMA Model: Builds autoregressive integrated moving average (aka Box-Jenkins) time series models and produces forecasts based on these models

Time Series Tests: Calculates ACF, ACVF and PACF, as well as performing the following tests: Bartlett’s, Box-Pierce, Ljung-Box and Augmented Dickey-Fuller (ADF) test.

The following new functions have also been added:

ADFCRIT(n, alpha, type) = critical value for the Augmented Dickey-Fuller test, where n = size of the time series and type = 0 (no trend, no drift), 1 (no trend, drift), 2 (trend and drift)

ADFTEST(R1,  lab, lag, criteria, type, alpha): an array function which returns a column range for the ADF test consisting of tau-stat, tau-crit, stationary (yes/no), AIC, BIC and lags, where R1 is the column range which contains the time series, criteria = “aic”, “bic” or “none” and type is as in ADFCRIT. If criteria = “none” then lag = the lag being tested; otherwise it is the largest lag being tested. If lab = True then a column of descriptive labels is appended to the output.

ARROOTS(R1): array function which outputs one row for each of the p phi coefficients in the autoregressive process (AR) listed in range R1: real part of the root, imaginary part of the root, absolute value of the root

MAROOTS(R1): array function which outputs one row for each of the q theta coefficients in an moving average process (MA) listed in range R1: real part of the root, imaginary part of the root, absolute value of the root

ADIFF(R1, d): array function which outputs the differenced version of the time series contained in range R1 based on differencing d times.

PSICoeff(R1,R2,k): array function which returns the first k coefficients of the psi representation of the ARMA process whose phi coefficients are contained in range R1 and whose theta coefficients are contained in range R2.

ARMA_SSE(R1, R2, avg, p, q) = the SSE (residual sum of squares) value for the ARMA process with mean avg whose phi coefficients are contained in range R1 and whose theta coefficients are contained in range R2, where pth phi coefficient is set to zero and/or the qth theta coefficient is set to zero.

A big thanks to Milos Cipovic, who programmed the algorithms for the Augmented Dickey-Fuller (ADF) test.

Fisher Exact Test

The Fisher Exact Test for 2 × 2 contingency tables has been expanded to support the two-tailed test for 2 × 3, 2 × 4, 2 × 5, 2 × 6, 3 × 3 and 3 × 4 contingency tables.

The FISHERTEST function now supports these new table sizes and a Fisher Exact Test option  has been added to the Chi-square Test for Independence data analysis tool.

Because these new Fisher Exact Tests are resource intensive, limits have been placed on the sum of all the cells in the contingency tables that are supported. These limits are currently set at 2,000 for a 2 × 3 table, 1,200 for a 2 × 4 table, 440 for a 2 × 5 table , 200 for a 2 × 6 table, 280 for a 3 × 3 table and 100 for a 3 × 4 contingency table. Even close to these limits the processing can be quite slow, taking about 45 seconds on my computer.

A big thanks to Paolo Cadringher, who programmed the algorithms for these new Fisher Exact Tests.

Data Analysis Tools

A small change has been made to the dialog boxes of the Real Statistics data analysis tools. If you click on any cell in the current worksheet prior to using any of the data analysis tools, that cell is used as the default location of the output. This remains as before.

Henceforth, if you highlight more than one cell, then the resulting range is used as the default Input Range value (or if there is more than one input range, then the first input range defaults to the highlighted range). In this case, the Output Range defaults to blank, which means that the output will be written to a new worksheet.

You can override any of these defaults.

Biserial Correlation Coefficient

The new BCORREL(Rx, Ry) function can be used to compute the biserial correlation function between the data in ranges Rx and Ry where Rx s a column range consisting of 0’s and 1’s.

Roots of a Polynomial

The new ROOTS array function is used to calculate the roots of any polynomial (with real coefficients). E.g. the roots of x2 – 1 are x = 1 and x = -1, while the roots of  x2 + 1 are x = i and x = –i where i = the square root of -1 (an imaginary number).

Weighted Random Values

A new weighted random value function WRAND has been added. E.g. if the range A1:A4 contains the four weights 5,1,1,3, then WRAND(A1:A4) generates a random integer value between 1 and 4 where the probability of generating a 1 is 5/(5+1+1+3) = 50%, the probability of generating a 2 is 1/(5+1+1+3) = 10%, etc.

WRANDOM is a new array function. If, for example, you want to generate two integer random values between 1 and 4 using the above weights, then you would highlight a range consisting of two cells, enter the formula =WRANDOM(A1:A4, TRUE) and press Ctrl-Shft-Enter. If you want to make sure that the two values generated are not equal (i.e. sampling without repetition), then you would enter the formula =WRANDOM(A1:A4, FALSE).

Bug Fixes

Fixes a bug that occurs when choosing the Interaction Contrast option of the Two Factor Anova Follow Up data analysis tool

Fixes a bug which occurs when choosing the Two Independent Samples option of the T Test and Nonparametric Equivalents data analysis. This bug only occurs for the non-equal variances version of the test when a non-zero Hypothetical Mean difference value is entered.

Fixes a bug when using the Royston version of the Shapiro-Wilk test for samples of size 4.

Posted in Announcement, New Release | 2 Comments

Release 4.5 Announcement

I am pleased to announce Release 4.5 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

The spreadsheets for all the examples used on the Real Statistics website, including those related to the new Release 4.5 features) are available for free download (Download Examples Workbooks). These are contained in three Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2 and Multivariate Examples. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website is in the process of being updated to reflect the new features. These changes will be made over the next several days.

Release 4.5 contains the following new features:

Time series analysis

In this release we initiate support for time series analysis. This work will continue in the next release as well. New functions that are available in this release are:

ACF(R1, k) – autocorrelation function of order k for the data in the column range R1

ACVF(R1, k) – autocovariance function of order k for the data in the column range R1

PACF(R1, k) – partial autocorrelation function of order k for the data in column range R1

ACOV(R1, k) – autocovariance matrix for the data in column range R1

ACORR(R1, k) – autocorrelation matrix for the data in column range R1

Bland-Altman

Bland-Altman is a method for comparing two measurements of the same variable. A Bland-Altman data analysis tool is now available, which can be accessed via the Reliability menu option.

Correlation Exact Test

The Correlation Exact Test can be used in place of the Fisher transformation approach to testing the value of the correlation coefficient. The following functions have been added:

CORRDIST(r, n, rho) – correlation distribution function (cdf) for a sample of size n with sample correlation r and population correlation rho

CORRINV(alpha, n, rho) – inverse of the correlation distribution function

CORRETEST(R1, R2, rho, tails) – p-value for the correlation exact test for the data in ranges R1 and R2, with population correlation rho and tails = 1 or 2

CORRELETEST(R1, R2, rho, lab, alpha, tails) – array function similar to CORRETEST, except that a 1-alpha confidence interval is also output

Thanks to António Teixeira who suggested this topic and contributed greatly to its development.

Skewness and Kurtosis Testing

The following array functions test whether sample data comes from a distribution which has zero skewness and/or kurtosis.

SKEWTEST(R1, lab, alpha) – array function which tests whether the skewness of the sample data in range R1 is zero (consistent with a normal distribution). The output consists of sample skewness, standard error, test statistic, p-value and 1-alpha confidence interval

KURTTEST(R1, lab, alpha) – array function which tests whether the kurtosis of the sample data in range R1 is zero (consistent with a normal distribution). The output consists of sample kurtosis, standard error, test statistic, p-value and 1-alpha confidence interval

D’Agostino-Pearson Test for Normality

The  D’Agostino-Pearson test is yet another test to determine whether data comes from a population which is normally distributed. Generally, Shapiro-Wilk will give more accurate results, but this test can be useful where the sample data has a number of repetitions.

DAGOSTINO(R1) – the test statistic s2 + k2 where s is the test statistic for skewness testing and k is the test statistic for kurtosis testing (described above).

DPTEST(R1) – p-value for the D’Agostino-Pearson test.  When the data comes from a population with a normal distribution, the D’Agostino-Pearson test statistic has a chi-square distribution with 2 degrees of freedom.

Population skewness and kurtosis functions

The follow functions have been added:

SKEWP(R1) – skewness of the distribution for the population in range R1. This is equivalent to the Excel 2013/2016 function SKEW.P, and can be useful to users with prior versions of Excel.

KURTP(R1, excess) – kurtosis of the distribution for the population in range R1. If excess = TRUE then 3 is subtracted from the result (the usual approach so that a normal distribution has kurtosis of zero).

Adjusted Correlation Coefficient and Coefficient of Determination

The following function output the usual adjusted R2 value:

RSQ_ADJ(r, n) – adjusted R2 value = 1-(1-r2)(n-1)/(n-2)  where r = the sample correlation and n = the size of the sample

RSQ_ADJ(R1, R2) – adjusted R2 value for the data in ranges R1 and R2

The following function outputs a relatively unbiased value for the population correlation coefficient. Note that this value is not the square root of the adjusted R2 value described above, and provides a better estimate of the population correlation coefficient.

CORREL_ADJ(r, n) – estimated population correlation coefficient = r[1 +(1 –r2) /2(n -3)]  where r = the sample correlation and n = the size of the sample

CORREL_ADJ(R1, R2) – estimated population correlation coefficient for the data in ranges R1 and R2

Thanks to António Teixeira who suggested this topic.

Matrices raised to a Power

For any square matrix A, A0 = the identity matrix and An+1 = AnA. The following function calculates An where A is the matrix contained in the k × k range R1.

MPOWER(R1, n) = An where A consists of the data in range R1.

We use this function with Markov chains.

Bug Fixes

The results for the Scheirer Ray Hare Test data analysis tool had an error, which produced incorrect results. This has now been corrected. Thanks for Kevin Bluxome for identifying this error.

The values for the adjusted means in the ANCOVA data analysis tool were in error. This has now been corrected. Thanks to Bill G. for identifying this error.

Posted in Announcement, New Release | 2 Comments

Suggestion for Excel 2007 Users

A number of Excel 2007 users have run into various problems when installing or running Real Statistics. Here is a suggestion from Cyberpreneur which may be helpful to you

“to correct the password problem requires installing the Real Stat addin in proper manner. Copy the downloaded file to /Microsoft Office/Office12/Library/Analysis , then restart excel. Then install in normal way.”

Charles

Posted in Hint | 1 Comment

Bug Fix Release 4.4.3

There is a new release of the Real Statistics Resource Pack, Rel 4.4.3 for Excel 2007, 2010, 2013 and 2016 environments, which corrects the following bugs:

Three Factor ANOVA data analysis tool: Fixes an incorrect p-value. Thanbks to Aaron for identifying this problem

FISHERTEST function: Fixes a bug which gave an incorrect p-value in certain situations. Thanks to Hoang for identifying this problem.

I am sorry for any inconveniences that these bugs have caused.

Charles

Posted in Uncategorized | Comments Off on Bug Fix Release 4.4.3

Real Statistics Release 4.4.2

I have just issued a new bug fix release for the Real Statistics Resource Pack, Release 4.4.2.

This release fixes a bug in the Multiple Linear Regression data analysis tool when the without intercept option is chosen. The revised version changes the standard errors of the coefficients and the value of R-square (along with any of the other values that depend on these).

In addition, the AdjRSquare(R1, R2, con) function has been corrected in the case where con = FALSE (i.e. in the no intercept case).

Finally, I have also changed the Jenks Natural Breaks data analysis tool to use the term “Class” instead of “Break” or “Category”.

Only the version for Excel 2010 and 2013 has been released. I expect to issue the version for Excel 2007 shortly. The Real Statistics Examples Part 2 and Real Statistics Multivariate Examples files will also be updated shortly.

I expect that the Real Statistics software also functions with Excel 2016 (which was released a week ago), but it has not been tested yet. If you have used the Real Statistics Resource Pack with Excel 2016, I would appreciate your feedback.

Charles

Posted in Announcement, New Release | 2 Comments

Real Statistics Release 4.4

I am pleased to announce Release 4.4 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010 and 2013 environments.

The spreadsheets for all the examples used on the Real Statistics website, including those related to the new Release 4.4 features) are available for free download (Download Examples Workbooks). These are contained in three Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2 and Multivariate Examples. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website is in the process of being updated to reflect the new features. These changes will be made over the next several days.

Release 4.4 contains the following new features:

Multiple Linear Regression without Intercept

There is a new option for the Linear Regression data analysis tool which performs Multiple Linear Regression where the intercept is assumed to be zero.

To access this data analysis tool, press Ctrl-m (or choose Real Statistics Data Analysis Tools from the Add-Ins ribbon), double-click on the Regression option and then select Multiple Linear Regression. On the dialog box that appears, uncheck the Include constant (intercept) option.

There is a new array function Reg0Coeff, which is similar to the existing RegCoeff function, except that it outputs the regression coefficients and their standard errors for regression without an intercept coefficient.

The following existing functions now take a new argument con: HAT, DIAGHAT, LEVERAGEAdjRSquare, RegAIC, RegAICc, SSReg, SSRes, SSRegTot, dfRes, dfRegTot, MSReg, MSRes, MSTot, RegF, RegTest, RSquare, AdjRSquare, MultipleR, RegE, RegY. When the con argument is TRUE (default) then the regression is assumed to have an intercept, while when this argument is FALSE the regression is done assuming that the intercept is zero.

Weighted Multiple Linear Regression

There is a new data analysis tool which performs Weighted Multiple Linear Regression. This is especially useful when the homogeneous variance assumption for least squares method is not met.

To access this data analysis tool, press Ctrl-m (or choose Real Statistics Data Analysis Tools from the Add-Ins ribbon), double-click on the Regression option and then choose the Weighted Linear Regression option.

There is a new array function WRegCoeff, which is similar to the existing RegCoeff function, except that it outputs the regression coefficients and their standard errors where there are weights.

Huber-White Robust Standard Errors

The Linear Regression data analysis tool has been modified to allow you to choose a robust standard errors option. This is especially useful when the homogeneous variance assumption for least squares method is not met, and there is not enough information to use weighted linear regression.

To use this capability, press Ctrl-m (or choose Real Statistics Data Analysis Tools from the Add-Ins ribbon), double-click on the Regression option and then choose the Multiple Linear Regression option. You are presented with a choice of the following options No, HC0, HC1, HC2, HC3 and HC4.

No is the ordinary least squares option (default), which assumes that the variances are equal (called homoscedasticity). This is how we calculated the standard errors of the regression coefficients in all previous releases of the software. HC0 modifies the OLS approach in large samples to provide better estimates of the standard errors of the regression coefficients when the variances are not equal (called heteroscedasticity). The other options are used in place of HC0 with smaller samples.

There is a new array function RRegCoeff, which is similar to the existing RegCoeff function, except that it outputs the regression coefficients and their standard errors when robust standard errors are employed.

Both the Multiple Linear Regression data analysis tool and RRegCoeff function also support regression without an intercept.

Schwarz Baysean Criterion (BSC)

There is a new RegSBC function which computes the SBC for multiple linear regression. SBC is also called Schwarz Information Criterion (SIC). This function takes the form:

RegBSC(Rx, Ry, con, aug)

where Rx is a range containing the X data and Ry is a range containing the Y data. If con = TRUE (default) regression with an intercept is used and if con = FALSE regression without an intercept is used. If aug = TRUE an extra constant term n(1+LN(2π)), where n is the sample size, is added to the output (default for aug is FALSE).

The RegAIC and RegAICc functions have now been revised so that they too take con and aug arguments.

Gage R&R

There is a new Gage R&R data analysis tool which generates a Gage Repeatability and Reproducibility report that can be used to assess a measurement system using ANOVA.

To access this data analysis tool, press Ctrl-m (or choose Real Statistics Data Analysis Tools from the Add-Ins ribbon), double-click on the Analysis of Variance option and then choose Two Factor Anova which contains the Gage R&R option.

Jenks Natural Breaks

There is a new data analysis tool which performs the Jenks Natural Breaks algorithm. This is a cluster analysis method which breaks a range of values into natural categories, typically used to color maps.

To access this data analysis tool, press Ctrl-m (or choose Real Statistics Data Analysis Tools from the Add-Ins ribbon), double-click on the Multivariate Analyses option and then choose the Jenks Natural Breaks option.

Option to Disable Ctrl-m

For those of you who use the keyboard shortcut Ctrl-m for some other purpose, you can disable Crtl-m from being used as a way to display the dialog box for Real Statistics data analysis tools. In this case you will need to use choose Real Statistics Data Analysis Tools from the Add-Ins ribbon to display this dialog box.

To disable Ctrl-m, press Alt-F8 (or select View > Macros|Macros). Next insert the macro name DisableToolsShortcut in the Macro dialog box that appears and press the Run button. To enable Ctrl-m, repeat the same sequence of steps except that you need to insert EnableToolsShortcut as the macro name.

Bug Fixes

The ATEST(R1, b) function computes the p-value for one-way ANOVA where the groups are arranged in columns when b = TRUE (default) and in rows when b = FALSE. A bug has been fixed which gave the wrong p-value when b = FALSE.

When the input entries for Kaplan-Meier’s Survival Analysis which are latest in time all have a dead status, then the standard error for these entries result in division by zero. This has now been corrected.

Posted in Announcement, New Release | 2 Comments