I am proud to announce Release 5.1 of the Real Statistics Resource Pack, which is loaded with a lot of new features. The new release is now available for free download at Download Resource Pack for Excel 2013 and 2016 (Windows version) environments. The resource pack for Excel 2007 and 2010 environments will be available within the next 24 hours.

The example files Examples Workbook Part 1A, Examples Workbook Part 1B, Examples Workbook Part 2 and Multivariate Examples been updated for compatibility with Release 5.1.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 5.1.

Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.1.

**Poisson Regression**

A new** Poisson Regression** data analysis tool has been added that performs regression where the dependent variable contains count data.

Supporting this new tool are the following new functions: **PoissonCoeff** to calculate the regression coefficients and standard errors, **PoissonCov** to output the coefficient covariance matrix, and **PoissonPred**, **PoissonPredC** and **PoissonPredCC** to make predictions based on a Poisson regression model.

**Correspondence Analysis**

A new **Correspondence Analysis** multivariate data analysis tool has been added. Correspondence analysis plays a role similar to factor analysis or principal component analysis for categorical data expressed as a contingency table. The new tool will carry out the analysis and produce **correspondence analysis plots**.

Supporting this new tool are the following new functions: **CARowFactors **and **CAColFactors**, which return factor vectors (for the original data as well as for supplementary profiles) and **CAEigen**, which returns the eigenvalues for the correspondence analysis.

**2^k Factorial Design**

A new** 2^k Factorial Design** data analysis tool has been added to support ANOVA consisting of any number of* *factors, each of which has two levels.

Supporting this new tool are the following new functions: **Design2k **and** ExpandDesign2k**, which automatically create the coding for such designs, **Effect2k**, which calculates the effect sizes for 2^*k* factorial designs, and** SS2k**, which calculates the *SS* (sum of squares) values for these designs.

**Tukey HSD and Tukey-Kramer Tests**

The existing **Tukey HSD** and **Tukey-Kramer** options to the **ANOVA: Single Factor** data analysis have been revised. Instead of having to manually perform separate comparison tests, all possible pairwise comparisons are performed automatically. This approach will be be adopted for other ANOVA follow up tests in future releases.

**One Factor ANOVA data analysis tool**

The layout of the **ANOVA: Single Factor** dialog box has been revised to make the various options clearer and consistent with other data analysis dialog boxes. In addition, the **Dunnett-KW** test option (a Kruskal-Wallis follow-up test) has been renamed the **Steel** test. A new Kruskal-Wallis follow-up test has also been added called the **Schaich-Hamerle** test.

**New functions for t, F and chi-square distributions**

Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to the next lower integer. This can be a problem in some situations, and so we previously introduced the **F_DIST** and **CHISQ_DIST** functions which work exactly like F.DIST and CHISQ.DIST except that they don’t round off non-integer degrees of freedom, thereby improving the accuracy of some calculations.

We have now added the following functions which provide similar advantages: **T_DIST_RT**,** T_DIST_2T**,** T_INV**,** T_INV_2T**,** F_DIST_RT**,** F_INV**,** F_INV_RT**,** CHISQ_DIST_RT**,** CHISQ_INV** and **CHISQ_INV_RT**. In addition, we have enhanced the existing **T_DIST** function so that it too doesn’t round off the degrees of freedom.

**Two sample correlation tests with dependent samples**

The Real Statistics already provides the Correl2Test function to test whether two sample pairs drawn independently have significantly different correlations. We now add similar support in the case where the two sample pairs are not independent. In particular, we support two such cases.

In the first case, the two sample pairs have one variable in common. The new array functions **Correl2OverlapTTest**,** Corr2OverlapTTest**,** Correl2OverlapTest** and **Corr2OverlapTest** support this case, using two different approaches.

In the second case, there is no variable in common. This case might be employed when one pair represents one moment in time and the second pair represents the same subjects at another moment in time. The new array functions **Correl2NonOverlapTest** and **Correl2NonOverlapTest** support this case.

**Accuracy Improvements**

As mentioned above, Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to next lower integer. This is not a problem for most tests, but can give inaccurate results for some tests, and is especially a problem when the degrees of freedom is less than one.

In order to address this issue, we have replaced T.DIST.2T, F.DIST.RT, CHISQ.DIST.RT, etc. by their Real Statistics equivalents, T_DIST_2T, F_DIST_RT, CHISQ_DIST_RT, etc. for a number of Real Statistic tests (e.g. **two sample t test with unequal variance**, **Hotelling’s T-square test with unequal variance** and the **Wilk’s version of MANOVA**).

If we have not done this for some other test, please send me a comment so that we can correct this in a future release.

**Fisher Exact Test**

By default, there are limits to the size of the contingency tables supported by the **FISHERTEST** and **FISHER_TEST** functions. These limits were set since these functions can take a very long time to run with larger tables and so you may inadvertently block Excel. The limits for these functions have now been revised as follows.

Contingency tables with degrees of freedom less than 9 are supported; tables with 9 or higher degrees of freedom are currently not supported. For each supported table, there is a limit to the total cell count, i.e. the sum of all values in the table, as follows.

- 2 × 2 – no limit, 2 × 3 – 2,000, 2 × 4 – 1,250, 2 × 5 – 360
- 2 × 6 – 175, 2 × 7 – 110, 2 × 8 – 75, 2 × 9 – 40
- 3 × 3 – 320, 3 × 4 – 95, 3 × 5 – 30

If you want to exceed these limits, you can add a third argument to the FISHERTEST function which describes how much you want to increase the limit. E.g. if you want to use the Fisher exact test for a 3 × 3 contingency table in range A1:C3 the sum of whose cells is 350, then you can use the array formula =FISHERTEST(A1:C3,,1.1). The 1.1 specifies that you have increased the limit for a 3 × 3 contingency table from 320 to 320 × 1.1 = 352. Since 350 < 352, the function will run, although it will take longer.

**Enhancement for other resource intensive functions**

In addition to the Fisher exact test functions listed above, the following functions are resource intensive and are limited in terms of the size of the samples supported.

- A default limit of
*n*1*+ n*2 = 28 (sum of the two sample sizes) has been set for**MANN_EXACT**,**Perm2Dist**and**Perm2Inv**,**MannDist**and**MannInv** - A default limit of sample size
*n*= 25 has been set for**SRANK_EXACT**,**SRANKPair_EXACT**,**PermDist**and**PermInv**

In the same manner as described above for FISHERTEST, you can add an argument (i.e. the final argument) to any of the above functions to explicitly change these limits.

**Bug Fixes**

- Fixed bug in the
**GG_Epsilon**function which caused this function and the**HF_Epsilon**function to produce an error value - Fixed bug in
**F_DIST**(*x, df*1*, df*2*, cum*) when*cum*= FALSE - Fixed the formatting for the
**Mixed Repeated Measures**data analysis tool when the**Standard**formatting and**Regression**options were chosen. When more than a few independent variables were used, the analysis portion of the output tried to overwrite the descriptive statistics portion of the output. This has now been fixed. - Moved the heading of the output from the
**Three Factor ANOVA**data analysis tool one cell to the right