I am proud to announce Release 5.1 of the Real Statistics Resource Pack, which is loaded with a lot of new features. The new release is now available for free download at Download Resource Pack for Excel 2013 and 2016 (Windows version) environments. The resource pack for Excel 2007 and 2010 environments will be available within the next 24 hours.
The example files Examples Workbook Part 1A, Examples Workbook Part 1B, Examples Workbook Part 2 and Multivariate Examples been updated for compatibility with Release 5.1.
The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 5.1.
Also thanks to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated as are the countless number of people who have identified errors and who have made suggestions to improve the software and website.
The following is a summary of the new features in Release 5.1.
A new Poisson Regression data analysis tool has been added that performs regression where the dependent variable contains count data.
Supporting this new tool are the following new functions: PoissonCoeff to calculate the regression coefficients and standard errors, PoissonCov to output the coefficient covariance matrix, and PoissonPred, PoissonPredC and PoissonPredCC to make predictions based on a Poisson regression model.
A new Correspondence Analysis multivariate data analysis tool has been added. Correspondence analysis plays a role similar to factor analysis or principal component analysis for categorical data expressed as a contingency table. The new tool will carry out the analysis and produce correspondence analysis plots.
Supporting this new tool are the following new functions: CARowFactors and CAColFactors, which return factor vectors (for the original data as well as for supplementary profiles) and CAEigen, which returns the eigenvalues for the correspondence analysis.
2^k Factorial Design
A new 2^k Factorial Design data analysis tool has been added to support ANOVA consisting of any number of factors, each of which has two levels.
Supporting this new tool are the following new functions: Design2k and ExpandDesign2k, which automatically create the coding for such designs, Effect2k, which calculates the effect sizes for 2^k factorial designs, and SS2k, which calculates the SS (sum of squares) values for these designs.
Tukey HSD and Tukey-Kramer Tests
The existing Tukey HSD and Tukey-Kramer options to the ANOVA: Single Factor data analysis have been revised. Instead of having to manually perform separate comparison tests, all possible pairwise comparisons are performed automatically. This approach will be be adopted for other ANOVA follow up tests in future releases.
One Factor ANOVA data analysis tool
The layout of the ANOVA: Single Factor dialog box has been revised to make the various options clearer and consistent with other data analysis dialog boxes. In addition, the Dunnett-KW test option (a Kruskal-Wallis follow-up test) has been renamed the Steel test. A new Kruskal-Wallis follow-up test has also been added called the Schaich-Hamerle test.
New functions for t, F and chi-square distributions
Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to the next lower integer. This can be a problem in some situations, and so we previously introduced the F_DIST and CHISQ_DIST functions which work exactly like F.DIST and CHISQ.DIST except that they don’t round off non-integer degrees of freedom, thereby improving the accuracy of some calculations.
We have now added the following functions which provide similar advantages: T_DIST_RT, T_DIST_2T, T_INV, T_INV_2T, F_DIST_RT, F_INV, F_INV_RT, CHISQ_DIST_RT, CHISQ_INV and CHISQ_INV_RT. In addition, we have enhanced the existing T_DIST function so that it too doesn’t round off the degrees of freedom.
Two sample correlation tests with dependent samples
The Real Statistics already provides the Correl2Test function to test whether two sample pairs drawn independently have significantly different correlations. We now add similar support in the case where the two sample pairs are not independent. In particular, we support two such cases.
In the first case, the two sample pairs have one variable in common. The new array functions Correl2OverlapTTest, Corr2OverlapTTest, Correl2OverlapTest and Corr2OverlapTest support this case, using two different approaches.
In the second case, there is no variable in common. This case might be employed when one pair represents one moment in time and the second pair represents the same subjects at another moment in time. The new array functions Correl2NonOverlapTest and Correl2NonOverlapTest support this case.
As mentioned above, Excel’s T.DIST, F.DIST and CHISQ.DIST functions (as well as the related functions and their Excel 2007 equivalents) round down the degrees of freedom to next lower integer. This is not a problem for most tests, but can give inaccurate results for some tests, and is especially a problem when the degrees of freedom is less than one.
In order to address this issue, we have replaced T.DIST.2T, F.DIST.RT, CHISQ.DIST.RT, etc. by their Real Statistics equivalents, T_DIST_2T, F_DIST_RT, CHISQ_DIST_RT, etc. for a number of Real Statistic tests (e.g. two sample t test with unequal variance, Hotelling’s T-square test with unequal variance and the Wilk’s version of MANOVA).
If we have not done this for some other test, please send me a comment so that we can correct this in a future release.
Fisher Exact Test
By default, there are limits to the size of the contingency tables supported by the FISHERTEST and FISHER_TEST functions. These limits were set since these functions can take a very long time to run with larger tables and so you may inadvertently block Excel. The limits for these functions have now been revised as follows.
Contingency tables with degrees of freedom less than 9 are supported; tables with 9 or higher degrees of freedom are currently not supported. For each supported table, there is a limit to the total cell count, i.e. the sum of all values in the table, as follows.
- 2 × 2 – no limit, 2 × 3 – 2,000, 2 × 4 – 1,250, 2 × 5 – 360
- 2 × 6 – 175, 2 × 7 – 110, 2 × 8 – 75, 2 × 9 – 40
- 3 × 3 – 320, 3 × 4 – 95, 3 × 5 – 30
If you want to exceed these limits, you can add a third argument to the FISHERTEST function which describes how much you want to increase the limit. E.g. if you want to use the Fisher exact test for a 3 × 3 contingency table in range A1:C3 the sum of whose cells is 350, then you can use the array formula =FISHERTEST(A1:C3,,1.1). The 1.1 specifies that you have increased the limit for a 3 × 3 contingency table from 320 to 320 × 1.1 = 352. Since 350 < 352, the function will run, although it will take longer.
Enhancement for other resource intensive functions
In addition to the Fisher exact test functions listed above, the following functions are resource intensive and are limited in terms of the size of the samples supported.
- A default limit of n1 + n2 = 28 (sum of the two sample sizes) has been set for MANN_EXACT, Perm2Dist and Perm2Inv, MannDist and MannInv
- A default limit of sample size n = 25 has been set for SRANK_EXACT, SRANKPair_EXACT, PermDist and PermInv
In the same manner as described above for FISHERTEST, you can add an argument (i.e. the final argument) to any of the above functions to explicitly change these limits.
- Fixed bug in the GG_Epsilon function which caused this function and the HF_Epsilon function to produce an error value
- Fixed bug in F_DIST(x, df1, df2, cum) when cum = FALSE
- Fixed the formatting for the Mixed Repeated Measures data analysis tool when the Standard formatting and Regression options were chosen. When more than a few independent variables were used, the analysis portion of the output tried to overwrite the descriptive statistics portion of the output. This has now been fixed.
- Moved the heading of the output from the Three Factor ANOVA data analysis tool one cell to the right