- Improves the accuracy of Mauchly’s test for sphericity, i.e. the **MauchlyTest** function
- Corrects a bug in the *benard* parameter of the regression approach to fitting data to a Weibull distribution, i.e. **WEIBULL_FITR**. When the *benard* parameter is set to TRUE (default), the Benard approximation is used. Previously *benard* = FALSE meant that the Benard approximation was used. This is now corrected.

Charles

This error only occurred for certain combinations of parameters, but it is advised that you download the new version of the Real Statistics software (Rel 5.4.1) if you plan to use any of the above capabilities.

I apologize again for any inconvenience.

Charles

I am pleased to announce Release 5.4 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for the Excel 2007, 2010, 2011, 2013 and 2016 (Windows and Mac versions) environments.

The various examples workbooks have also been updated for compatibility with the new release. Please note that Workbook Examples 2 has now been divided into two files, Workbook Examples 2A and Workbook Examples 2B. You can download these workbooks at Examples Workbooks.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities.

Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.4.

**Forecast Accuracy**

A new **Forecast Accuracy** data analysis tool has been added which provides the following new capabilities:

- **Error statistics**: Displays the following statistics to measure the size of the error in a forecast: MSE, MAE, RMSE, ME, MPE, MAPE, SMAPE, U1, U2
- **Diebold-Mariano** test to determine whether there is a significant difference in the accuracy of two forecasts.
- **Harvey, Leybourne and Newbold** (**HLN**) test which is a refinement of the DM test for small samples
- **Pesaran-Timmermann** test which determines whether a forecast is an accurate predictor of the sign of a time series (thanks to Ross)

Supporting these tests are the following new worksheet functions: **ForecastError, Forecast_Error, DIEBOLD, DMTEST, HLN, HLNTEST, PESARAN, PTTEST, LossDiff**
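As a rough illustration of the simpler error statistics listed above, the following Python sketch computes ME, MAE, MSE, RMSE, MPE and MAPE for a pair of series. It assumes that the error is defined as actual minus forecast and that the percentage measures are reported as fractions of the actual values. The name `forecast_errors` is hypothetical, and the Real Statistics functions may follow different conventions.

```python
import math

def forecast_errors(actual, forecast):
    """Return a dict of basic forecast error statistics.

    Assumption: error = actual - forecast, and the percentage
    statistics are fractions of the actual values (not multiplied by 100).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,                      # mean error (bias)
        "MAE": sum(abs(e) for e in errors) / n,     # mean absolute error
        "MSE": mse,                                 # mean squared error
        "RMSE": math.sqrt(mse),                     # root mean squared error
        "MPE": sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

stats = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

SMAPE, U1 and U2 are left out because several competing definitions are in use.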

**Enhancements to the Noncentral t Distribution**

Improved the accuracy and robustness of the Real Statistics implementation of the noncentral t distribution (thanks to Antonio). This entails enhancements to the following worksheet functions: **NT_DIST, NT_INV, NT_NCP, T1_POWER, T2_POWER, T1_SIZE, T2_SIZE**

These changes are also implemented in the **Statistical Power and Sample Size** data analysis tool.

**Weibull with Censored Data**

Enhanced **WEIBULL_FIT** to allow fitting a Weibull distribution to data that not only includes data on failures, but also components that have not yet failed at the end of the allotted time (i.e. censored data).

In addition to the changes to WEIBULL_FIT, the following new worksheet functions have been added:

**WEIBULL_CMEAN** – calculates the mean of data which includes censored data that follows a Weibull distribution

**WEIBULL_CVAR** – calculates the variance of data which includes censored data that follows a Weibull distribution

**Other Weibull Enhancements**

The **WEIBULL_FITR** function, which fits a Weibull distribution to some data using regression, now supports **Benard’s approximation**. This is now the default, although the previous approach is also supported.
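To make the regression approach concrete, here is a minimal Python sketch. It assumes Benard's approximation in its usual form, (i − 0.3)/(n + 0.4) for the i-th ordered failure out of n, and an ordinary least-squares fit of ln(−ln(1 − F)) on ln(t). The function names are hypothetical, and WEIBULL_FITR itself may differ in detail (for example, in which variable is regressed on which).

```python
import math

def benard_median_ranks(n):
    """Benard's approximation to the median rank of the i-th ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_fit_regression(data):
    """Estimate (shape, scale) by least squares on the linearized Weibull cdf.

    Regresses y = ln(-ln(1 - F)) on x = ln(t), where F are Benard median
    ranks. The slope is the shape; the intercept equals -shape * ln(scale).
    """
    t = sorted(data)
    n = len(t)
    x = [math.log(v) for v in t]
    y = [math.log(-math.log(1.0 - f)) for f in benard_median_ranks(n)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    shape = sxy / sxx
    intercept = y_bar - shape * x_bar
    scale = math.exp(-intercept / shape)
    return shape, scale
```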

The following new function has also been added:

**WEIBULL_MRL** – calculates the **mean residual life** of a component that follows a Weibull distribution which has not yet failed; i.e. the expected MTTF after the component has been activated for some time *t*_{0}.
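The mean residual life at age *t*_{0} can be written as the integral of S(t)/S(t_{0}) over t ≥ t_{0}, where S is the survival function. The Python sketch below evaluates that integral numerically for a Weibull distribution. It is only an illustration under stated assumptions: the name `weibull_mrl`, the parameter names `shape` and `scale`, the truncation point and the step count are all choices made here, not a description of how WEIBULL_MRL is implemented.

```python
import math

def weibull_mrl(t0, shape, scale, steps=20000):
    """Mean residual life at age t0: integral of S(t)/S(t0) for t >= t0.

    S(t) = exp(-(t/scale)**shape) is the Weibull survival function. The
    integral is truncated where the integrand has fallen to exp(-40) and
    evaluated with the composite Simpson rule (steps must be even).
    """
    h0 = (t0 / scale) ** shape                    # cumulative hazard at t0
    upper = scale * (h0 + 40.0) ** (1.0 / shape)  # integrand equals exp(-40) here
    width = (upper - t0) / steps

    def ratio(t):
        return math.exp(h0 - (t / scale) ** shape)  # S(t) / S(t0)

    total = ratio(t0) + ratio(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * ratio(t0 + i * width)
    return total * width / 3.0
```

Two sanity checks follow from theory: with `shape = 1` the distribution is exponential, so the result equals `scale` for any `t0`, and with `t0 = 0` the result is the ordinary mean, `scale * math.gamma(1 + 1 / shape)`.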

**Prediction/Confidence Intervals**

A new **Prediction/Confidence Interval Plot** data analysis tool has been added. This data analysis tool plots the prediction and confidence intervals related to a regression model.

**Distribution Fitting**

A new **Distribution Fitting** data analysis tool has been added. This tool can be used to estimate distribution parameters for the normal, Weibull, beta, gamma, uniform and exponential distributions.

**Correlogram**

A new **Correlogram** data analysis tool has been added that plots correlograms for the ACF and PACF of time series along with their **confidence intervals** (thanks to Sohrab).

**Interpolation**

Until this release the default interpolation used for statistical table lookup was harmonic interpolation. This turned out to be a mistake since in many cases linear interpolation is more accurate. But log interpolation turns out to be better still, especially when interpolating between alpha values.

As a result, there are now three interpolation options for the **Interpolate** function, namely linear (*h* = 0), log (*h* = 1) and harmonic (*h* = 2), where log is now the default. Note that previously the last argument took the values *h* = TRUE (default) for harmonic and *h* = FALSE for linear.
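The three options can be sketched in Python as follows. This assumes that log interpolation means interpolating linearly in ln(x) and harmonic interpolation means interpolating linearly in 1/x, where x is the lookup value (e.g. alpha or df). The argument order is hypothetical and is not meant to reproduce the worksheet function's signature.

```python
import math

def interpolate(x, x1, x2, y1, y2, h=1):
    """Interpolate y at x between the table points (x1, y1) and (x2, y2).

    h = 0: linear in x; h = 1: linear in ln(x); h = 2: linear in 1/x.
    """
    if h == 0:
        g = lambda v: v
    elif h == 1:
        g = math.log
    elif h == 2:
        g = lambda v: 1.0 / v
    else:
        raise ValueError("h must be 0, 1 or 2")
    # Fraction of the way from x1 to x2 on the transformed scale
    weight = (g(x) - g(x1)) / (g(x2) - g(x1))
    return y1 + weight * (y2 - y1)
```

For a table with y = 2 at x = 10 and y = 3 at x = 20, looking up x = 15 gives 2.5 with linear, about 2.585 with log and about 2.667 with harmonic interpolation, which shows how much the choice can matter between widely spaced table entries.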

The **ILookup** function has also been revised. Previously its final argument took a Boolean value, as for Interpolate. This final argument has now been replaced by two final arguments, *hc* and *hr*, both of which take the same new values as for Interpolate: *hc* specifies the interpolation used for columns and *hr* specifies the interpolation used for rows of a statistical table. This allows, for example, *alpha* values (in the columns) to use log interpolation while *df* values (in the rows) use harmonic interpolation.

For each statistical table, we offer the user two options for interpolation: *interp* = FALSE for linear interpolation and *interp* = TRUE for the recommended interpolation, which may include some combination of the three supported types of interpolations. This will be clearly explained on the website.

As a result, the following worksheet functions have been revised to support the improved approach to interpolation: **MCRIT, MPROB, WCRIT, WPROB, DCRIT, DPROB, KSCRIT, KSPROB, KS2CRIT, KS2PROB, SRankCRIT, SRankPROB, ADCRIT, ADPROB, SWTEST, SWPROB, DUpperCRIT, DLowerCRIT, TauCRIT, RhoCRIT, QCRIT, SR_CONF, MANN_CONF**

These changes will also be reflected in all the data analysis tools that rely on table lookup.

**Other New Worksheet Functions**

**EXPON_INV** – calculates the inverse of the cdf for the exponential distribution

**EXPON_FIT** – estimates the lambda parameter for the exponential distribution that best fits a data set.

**XGAMMA** – calculates the gamma function even for negative values (thanks to Antonio)

**UpperGamma** – calculates the value of the incomplete upper gamma function

**LowerGamma** – calculates the value of the incomplete lower gamma function

**ARIMA**

ARIMA support has now been added to the Mac version of the software.

Improved the accuracy of the standard errors of the ARIMA coefficients (thanks to Miloš).

**Other Enhancements**

- Improved the speed of the loading of the main menu (i.e. the dialog box that appears when you press **Ctrl-m**)
- Reformatted many of the **Help** dialog boxes which appear when you press the **Help** button on the various data analysis tools dialog boxes (Windows versions only; the Mac versions will be reformatted in a future release).
- Added some more worksheet functions to the **Insert Function** *f*_{x} capability (Excel 2010 and 2013/2016 Windows version only)
- Additional error checking has been added to some of the data analysis tools

**Bug fixes**

- Fixed an error in the **RidgeRSQ** function that caused a relatively small error in the result
- Fixed an error in the **ADTEST** function (i.e. Anderson-Darling test) for the normal distribution in the case where the AD statistic is between .34 and .60
- Fixed an error in the **Resampling** data analysis tool for the independent, paired samples and correlation options

This error is not present in the version of Real Statistics for Excel 2002 and 2003 users, nor for the old Rel 3.5.3 version for the Mac. I suggest that everyone else upgrade to Release 5.3.2.

I apologize for any inconvenience that I have caused you.

Charles

It contains some bug fixes for the Ridge regression data analysis tool and functions. If you have already upgraded to Rel 5.3 and don’t have any need for Ridge regression, then you don’t need to install this release.

This release is now available for Excel 2011, 2013 and 2016 (Mac and Windows) users. It will shortly be made available to Excel 2007 and 2010 users. The Examples Workbook Part 2 will also shortly be revised to support the changes in this release.

In addition to the bug fixes, the following new functions are available:

**RidgeCoeff**: outputs the unstandardized Ridge regression coefficients

**RidgeLambda**: outputs an estimated lambda value based on making sure that the VIF values are below some user-defined threshold

**RidgeMSE**: outputs the MSE value for a Ridge regression model

**RidgePred**: outputs the predicted y values corresponding to an array of *x* data based on a Ridge regression model

In addition to these Ridge regression functions, the release adds the following function:

**RegPredCC**: outputs the predicted y values for any regression model given the *x* values and regression coefficients. This is similar to the pre-existing RegPredC function, except that now an array of *x* values can be specified.

Charles

The new release is actually called 5.3.1 since it has some bug fixes for Rel 5.3 (plus a couple of new features). This release will also be available to Windows users shortly.

Charles

The various examples workbooks have also been updated for compatibility with the new release. The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities.

Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.3.

**Ridge Regression**

A new **Ridge Regression** data analysis tool has been added that performs Ridge regression, which is especially useful to handle multicollinearity.

Supporting this new tool are the following new functions:

**RidgeRegCoeff**: calculates the Ridge regression coefficients and standard errors.

**RidgeRSQ**: calculates the R-square value for Ridge regression.

**RidgeVIF**: calculates the VIF values for the independent variables.

**RidgeCVError**: calculates the Ridge regression k-fold cross-validation error for a particular value of lambda; used to estimate a desirable lambda value.

**LASSO Regression**

A new **LASSORegCoeff** function has been added to estimate the **LASSO** (least absolute shrinkage and selection operator) regression coefficients using a cyclical coordinate descent algorithm.

**Standardized Regression coefficients**

The following functions have been added:

**STDCOL**: takes an array or cell range and outputs an array that has the same dimensions but with a standardization of the values in each column.

**StdRegCoeff**: outputs the regression coefficients that correspond to the standardization of the *x* and y input data.

**UnstdRegCoeff**: does the reverse of StdRegCoeff by outputting the unstandardized regression coefficients when the standardized regression coefficients are known.

These functions are used in performing Ridge regression.
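The column standardization performed by a function like STDCOL can be sketched in Python as below. This assumes that each value is replaced by (value − column mean) / column standard deviation and that the sample standard deviation (n − 1 divisor) is used; the release notes do not say which divisor applies, and the name `stdcol` is merely a stand-in.

```python
import statistics

def stdcol(rows):
    """Standardize each column of a rectangular list of rows.

    Assumption: z = (value - column mean) / column sample standard
    deviation (n - 1 divisor). The output has the same dimensions as rows.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

z = stdcol([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
```

After standardization every column has mean 0 and standard deviation 1, which puts the regression coefficients on a comparable scale before a Ridge penalty is applied.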

**Multiple Regression Solver option**

The algorithm that performs multiple linear regression calculates (*X ^{T}X*)

Sometimes (*X ^{T}X*)

A new **Use Solver** option has now been added to the Multiple Linear Regression data analysis tool to handle such situations.

**Cochran-Mantel-Haenszel Test**

A **Cochran-Mantel-Haenszel Test** data analysis tool has been added. This test determines whether the odds ratios of a series of 2 × 2 contingency tables are significantly different from one. The data analysis tool also includes **Woolf’s Heterogeneity test** which determines whether the odds ratios are significantly different.

The analysis tool uses the following new array functions: **CMHTest** and **WoolfTest**.

**Sphericity Tests**

The following two tests for sphericity have been added:

**MauchlyTest**(R1) = p-value of **Mauchly’s test** for sphericity on the data in range R1

**JNSTest**(R1) = p-value of the **John-Nagao-Sugiura** test for sphericity on the data in range R1

**Partitions**

The following functions partition the numbers 1 through *n* into *k* approximately equal-sized groups.

**RandPart**(*n, k*): random partition

**OrderedPart**(*n, k*): ordered partition

**SortedPart**(*n, k*, R1): ordered partition based on the sort order in the column range R1 with n

E.g. OrderedPart(10,3) outputs a column array with 10 rows containing the values 1, 2, 3, 1, 2, 3, 1, 2, 3, 1 (in that order). RandPart(10,3) outputs a column array with values such as 2, 1, 1, 3, 2, 1, 2, 3, 1, 3 (the values 2 and 3 are repeated 3 times and 1 is repeated 4 times). If R1 is a column range with the values 1.1, -1.4, 2.5, 3.6, 0.5, then SortedPart(R1,3) outputs a column range with the values 3, 1, 1, 2, 2 (in that order).
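The behaviour in these examples can be mimicked with the following Python sketch. The function names and argument lists are illustrative only; in particular, `sorted_part` takes the values and *k* as in the SortedPart(R1,3) example above, and `rand_part` is assumed to be a random shuffle of the ordered partition, which keeps the group sizes the same.

```python
import random

def ordered_part(n, k):
    """Assign 1..n to groups 1..k cyclically: 1, 2, ..., k, 1, 2, ..."""
    return [i % k + 1 for i in range(n)]

def rand_part(n, k, rng=random):
    """Random partition: a shuffled copy of the ordered partition."""
    groups = ordered_part(n, k)
    rng.shuffle(groups)
    return groups

def sorted_part(values, k):
    """Ordered partition applied to the ranks (sort order) of the values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank % k + 1
    return groups

print(ordered_part(10, 3))                        # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
print(sorted_part([1.1, -1.4, 2.5, 3.6, 0.5], 3))  # [3, 1, 1, 2, 2]
```

Partitions of this kind are what a k-fold cross-validation procedure needs in order to assign observations to folds.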

**Chi-square Independence Test enhancement**

The **Chi-square Independence Test** data analysis tool supports two input formats: Excel format (in the form of a contingency table) and **Standard** format. The Standard format is a two column range specifying pairs of headings for the contingency table. Thus if this range contains say 10 rows then the sum of all the cells in the contingency table would be 10.

A new version of the standard format is now also supported. It consists of three columns: the first two columns are as in the previous version, while the third column contains non-negative integer values specifying how many times the pair in the first two columns is to be repeated. The total cell count in the contingency table now equals the sum of the values in the third column. The two column version of the standard format is therefore equivalent to the three column version where the third column contains all ones.
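The relationship between the two versions of the standard format can be illustrated with a short Python sketch. The helper `contingency_counts` is hypothetical; it simply tallies heading pairs and treats a missing third column as a count of one.

```python
from collections import Counter

def contingency_counts(rows):
    """Count (row heading, column heading) pairs from standard-format input.

    Each input row is (r, c) or (r, c, count); a missing count means 1.
    """
    counts = Counter()
    for row in rows:
        r, c = row[0], row[1]
        counts[(r, c)] += row[2] if len(row) > 2 else 1
    return counts

two_col = [("A", "x"), ("A", "x"), ("B", "y")]
three_col = [("A", "x", 2), ("B", "y", 1)]

# Both inputs describe the same contingency table with a total count of 3.
same_table = contingency_counts(two_col) == contingency_counts(three_col)
```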

**Descriptive Statistics and Normality enhancement**

When the **Shapiro-Wilk** option is chosen from the **Descriptive Statistics and Normality** data analysis tool, in addition to the Shapiro-Wilk test, the results of the **d’Agostino-Pearson test** for normality are also displayed.

**Sort rows enhancements**

- Changed the **QSORTRows** and **QSORT2Rows** functions so that they properly sort rows with an empty cell in the column with the sort key(s).
- These functions as well as **QSORT2RowsMixed** now retain the original order in case of ties.
- All three functions now take an optional last argument that ensures that a header row is not sorted but remains in the first row.

**Other enhancements**

- **BOXCOX**(R1) and **BOXCOXLambda**(R1) now work properly even when range R1 contains non-positive values (solves an issue for Luciano).
- Additional error checking has been added to some of the data analysis tools.
- The **Correlation** and **Multivariate** function categories have been added and many more functions are now supported by the Paste Function (*f*_{x}) button on the Formula bar (in the Excel 2010, 2013 and 2016 Windows versions of the software).
- If the **Input Range Y** field in the dialog box for the **Multiple Linear Regression** data analysis tool is not filled in, then the last column in the **Input Range X** field is used as the y values.
- The **Regression** option of the **Two Factor ANOVA** data analysis tool now supports models without replications.

**Bug Fixes**

- Fixed **TiesCorrection** in the one-sample case (thanks to Uwe for identifying this error)
- Fixed **KSCRIT** and **KSPROB** (thanks to Daniel for identifying this error)
- Fixed **F_DIST** in the pdf case (thanks to Antonio for identifying this error)
- Fixed an error in the **Basic Forecasting** data analysis tool when output is displayed on a new worksheet
- Fixed a bug in the **CorrTest** function
- The **Reformat for Linear Regression** option of the **ARIMA Model and Forecast** data analysis tool now uses the correct input data

The example files Examples Workbook Part 2 and Time Series Examples have been updated for compatibility with Release 5.2.

The Real Statistics website will be updated over the course of the next few days to reflect the new capabilities in Release 5.2.

The following is a summary of the new features in Release 5.2.

**ARIMA Enhancements**

A new **ARIMA_Coeff** array function has been added which calculates the coefficients of an ARIMA(p, q, d) model, along with the standard errors of these coefficients and confidence intervals.

In addition, the **ARIMA_Stats** array function has been added which calculates various statistics (*LL, SSE, MSE, AIC*, etc.) for an ARIMA model.

Please join me in thanking **Miloš Cipovic**, who did a beautiful job of programming the algorithms for these new functions using the Levenberg-Marquardt method.

**Minor changes**

Used **F_DIST_RT** function in the **Repeated Measures ANOVA** data analysis tool to increase accuracy.

Corrected some tooltips on some dialog boxes

Added **Nonparametric** and **TimeSeries** function categories to make it easier to get information about more Real Statistics functions via the **Insert Function** (*f*_{x}) capability.

The **COCHRAN** and **QTEST** formulas have been enhanced. A third argument has been added which allows you to specify that a continuity correction will be used in the case where there are only two variables (this is also the default), i.e. the case which is equal to McNemar’s test.

**Follow-up Analyses after Three Factor ANOVA**

A new **ANOVA Follow-up** data analysis tool has been added. This tool allows the user to perform Contrast and Tukey HSD (or Tukey-Kramer) analyses after Three Factor ANOVA.

This tool can be used with both balanced and unbalanced models and can actually be employed after any type of ANOVA (not just Three Factor ANOVA) provided you have the appropriate descriptive data and the values of *MS _{E}* and

**Quick Update**

Release 5.1 is now available for users of Excel 2007, 2010, 2013 and 2016 (Windows). All the examples files have been updated and good progress has been made updating the website for compatibility with Rel 5.1. In particular, the website now includes webpages on the new **ANOVA Follow-up** data analysis tool as well as **2^k Factorial Design** and **Correspondence Analysis**.

I had originally planned to improve the ARIMA support in Rel 5.1, but was unable to get it tested in time. This feature will be included in the next release.

The example files Examples Workbook Part 1A, Examples Workbook Part 1B, Examples Workbook Part 2 and Multivariate Examples have been updated for compatibility with Release 5.1.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 5.1.

Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.1.

**Poisson Regression**

A new **Poisson Regression** data analysis tool has been added that performs regression where the dependent variable contains count data.

Supporting this new tool are the following new functions: **PoissonCoeff** to calculate the regression coefficients and standard errors, **PoissonCov** to output the coefficient covariance matrix, and **PoissonPred**, **PoissonPredC** and **PoissonPredCC** to make predictions based on a Poisson regression model.

**Correspondence Analysis**

A new **Correspondence Analysis** multivariate data analysis tool has been added. Correspondence analysis plays a role similar to factor analysis or principal component analysis for categorical data expressed as a contingency table. The new tool will carry out the analysis and produce **correspondence analysis plots**.

Supporting this new tool are the following new functions: **CARowFactors** and **CAColFactors**, which return factor vectors (for the original data as well as for supplementary profiles) and **CAEigen**, which returns the eigenvalues for the correspondence analysis.

**2^k Factorial Design**

A new **2^k Factorial Design** data analysis tool has been added to support ANOVA consisting of any number of factors, each of which has two levels.

Supporting this new tool are the following new functions: **Design2k** and **ExpandDesign2k**, which automatically create the coding for such designs, **Effect2k**, which calculates the effect sizes for 2^*k* factorial designs, and **SS2k**, which calculates the *SS* (sum of squares) values for these designs.

**Tukey HSD and Tukey-Kramer Tests**

The existing **Tukey HSD** and **Tukey-Kramer** options of the **ANOVA: Single Factor** data analysis tool have been revised. Instead of having to manually perform separate comparison tests, all possible pairwise comparisons are now performed automatically. This approach will be adopted for other ANOVA follow-up tests in future releases.

**One Factor ANOVA data analysis tool**

The layout of the **ANOVA: Single Factor** dialog box has been revised to make the various options clearer and consistent with other data analysis dialog boxes. In addition, the **Dunnett-KW** test option (a Kruskal-Wallis follow-up test) has been renamed the **Steel** test. A new Kruskal-Wallis follow-up test has also been added called the **Schaich-Hamerle** test.

**New functions for t, F and chi-square distributions**

Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to the next lower integer. This can be a problem in some situations, and so we previously introduced the **F_DIST** and **CHISQ_DIST** functions which work exactly like F.DIST and CHISQ.DIST except that they don’t round off non-integer degrees of freedom, thereby improving the accuracy of some calculations.

We have now added the following functions which provide similar advantages: **T_DIST_RT**, **T_DIST_2T**, **T_INV**, **T_INV_2T**, **F_DIST_RT**, **F_INV**, **F_INV_RT**, **CHISQ_DIST_RT**, **CHISQ_INV** and **CHISQ_INV_RT**. In addition, we have enhanced the existing **T_DIST** function so that it too doesn’t round off the degrees of freedom.

**Two sample correlation tests with dependent samples**

The Real Statistics Resource Pack already provides the Correl2Test function to test whether two sample pairs drawn independently have significantly different correlations. We now add similar support in the case where the two sample pairs are not independent. In particular, we support two such cases.

In the first case, the two sample pairs have one variable in common. The new array functions **Correl2OverlapTTest**, **Corr2OverlapTTest**, **Correl2OverlapTest** and **Corr2OverlapTest** support this case, using two different approaches.

In the second case, there is no variable in common. This case might be employed when one pair represents one moment in time and the second pair represents the same subjects at another moment in time. The new array functions **Correl2NonOverlapTest** and **Corr2NonOverlapTest** support this case.

**Accuracy Improvements**

As mentioned above, Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to the next lower integer. This is not a problem for most tests, but can give inaccurate results for some tests, and is especially a problem when the degrees of freedom is less than one.

In order to address this issue, we have replaced T.DIST.2T, F.DIST.RT, CHISQ.DIST.RT, etc. by their Real Statistics equivalents, T_DIST_2T, F_DIST_RT, CHISQ_DIST_RT, etc. for a number of Real Statistics tests (e.g. **two sample t test with unequal variance**, **Hotelling’s T-square test with unequal variance** and the **Wilks’ version of MANOVA**).

If we have not done this for some other test, please send me a comment so that we can correct this in a future release.

**Fisher Exact Test**

By default, there are limits to the size of the contingency tables supported by the **FISHERTEST** and **FISHER_TEST** functions. These limits were set since these functions can take a very long time to run with larger tables and so you may inadvertently block Excel. The limits for these functions have now been revised as follows.

Contingency tables with degrees of freedom less than 9 are supported; tables with 9 or higher degrees of freedom are currently not supported. For each supported table, there is a limit to the total cell count, i.e. the sum of all values in the table, as follows.

- 2 × 2 – no limit, 2 × 3 – 2,000, 2 × 4 – 1,250, 2 × 5 – 360
- 2 × 6 – 175, 2 × 7 – 110, 2 × 8 – 75, 2 × 9 – 40
- 3 × 3 – 320, 3 × 4 – 95, 3 × 5 – 30

If you want to exceed these limits, you can add a third argument to the FISHERTEST function which describes how much you want to increase the limit. E.g. if you want to use the Fisher exact test for a 3 × 3 contingency table in range A1:C3 the sum of whose cells is 350, then you can use the array formula =FISHERTEST(A1:C3,,1.1). The 1.1 specifies that you have increased the limit for a 3 × 3 contingency table from 320 to 320 × 1.1 = 352. Since 350 < 352, the function will run, although it will take longer.
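The arithmetic behind the limit factor can be sketched in Python. Only a few of the default limits from the list above are included, the helper name `within_limit` is hypothetical, and whether a total exactly equal to the scaled limit is accepted is an assumption here.

```python
# Default total-count limits for a few table shapes, taken from the list above.
DEFAULT_LIMITS = {(2, 3): 2000, (2, 4): 1250, (3, 3): 320, (3, 4): 95}

def within_limit(shape, total_count, factor=1.0):
    """Return True if total_count does not exceed the scaled limit.

    shape is (rows, columns); factor plays the role of the third argument
    of FISHERTEST described above (1.0 leaves the default limit unchanged).
    """
    return total_count <= DEFAULT_LIMITS[shape] * factor

blocked = within_limit((3, 3), 350)        # False: 350 exceeds 320
allowed = within_limit((3, 3), 350, 1.1)   # True: 350 is below 320 * 1.1 = 352
```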

**Enhancement for other resource intensive functions**

In addition to the Fisher exact test functions listed above, the following functions are resource intensive and are limited in terms of the size of the samples supported.

- A default limit of *n*1 + *n*2 = 28 (sum of the two sample sizes) has been set for **MANN_EXACT**, **Perm2Dist** and **Perm2Inv**, **MannDist** and **MannInv**
- A default limit of sample size *n* = 25 has been set for **SRANK_EXACT**, **SRANKPair_EXACT**, **PermDist** and **PermInv**

In the same manner as described above for FISHERTEST, you can add an argument (i.e. the final argument) to any of the above functions to explicitly change these limits.

**Bug Fixes**

- Fixed bug in the **GG_Epsilon** function which caused this function and the **HF_Epsilon** function to produce an error value
- Fixed bug in **F_DIST**(*x*, *df*1, *df*2, *cum*) when *cum* = FALSE
- Fixed the formatting for the **Mixed Repeated Measures** data analysis tool when the **Standard** formatting and **Regression** options were chosen. When more than a few independent variables were used, the analysis portion of the output tried to overwrite the descriptive statistics portion of the output. This has now been fixed.
- Moved the heading of the output from the **Three Factor ANOVA** data analysis tool one cell to the right