Release 5.0 for Mac

New Mac Release

Good news for Mac users. The latest release of the Real Statistics Resource Pack (Release 5.0) is now available for use with Excel 2016 for the Mac. You can now download this release at Real Statistics Resource Pack for Mac.

There is also a new release of the Real Statistics Resource Pack that is compatible with Excel 2011 (Mac), but this has not been tested. You can download this version from Real Statistics Resource Pack for Mac as well. I would appreciate it if you let me know whether this release works on your Mac computer.

I have put in considerable energy and investment to create this new release and test it on a Mac computer. Any donation from you would be appreciated to help offset my costs.

User Interface

The user interface for the Mac version of the Real Statistics data analysis tools is identical to the Windows version with one important difference. When inserting a range into an input field, in Windows you can simply highlight the range of cells and the field will automatically contain the address of the highlighted range (see Figure 1).

Figure 1 – Inserting a Range in Windows

To accomplish the same thing in the Mac version of Real Statistics, you need to click the + button, as shown in Figure 2.

Figure 2 – Inserting a Range on the Mac (step 1)

This will display the dialog box shown on the right side of Figure 3. You can now highlight the desired data range (M3:M7 in this example) and when you press the OK button, the Input Range field will automatically be filled with the appropriate cell range address.

Figure 3 – Inserting a Range on the Mac (step 2)

The situation is the same for the Output Range, except that now you should only highlight one cell.

Posted in Announcement, New Release

Release 5.0

I am pleased to announce Release 5.0 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

I am still working on Release 5.0 for the Mac, and I expect this to be available in June.

The Examples Workbook Part 1 has now been split into two files: Examples Workbook Part 1A and Examples Workbook Part 1B. These files as well as Examples Workbook Part 2 have been updated for compatibility with Release 5.0. The reliability examples, except for the ICC examples, can now be found in Workbook Part 1B and not Workbook Part 2.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 5.0.

My apologies to all of you who have been waiting for the Real Statistics book. The revised timeframe for Real Statistics using Excel – Fundamentals is now September 2017.

Thanks also to all of you who have given donations to help sustain the Real Statistics project. This is most appreciated, as are the contributions of the many people who have identified errors and made suggestions to improve the software and website.

The following is a summary of the new features in Release 5.0.

Krippendorff’s Alpha

Support for Krippendorff’s Alpha, another approach to inter-rater reliability, has been added. This approach has the advantage that it supports categorical, ordinal, interval and ratio type data and also handles missing data.

New functions have been added (KALPHA, KTRANS, KRIP_SES, KRIP_SER, KRIP) to support Krippendorff’s Alpha as well as a new data analysis tool.

Gwet’s AC2

Support for Gwet’s AC2 has also been added. Gwet’s AC2 is yet another approach to inter-rater reliability, similar to Krippendorff’s Alpha.

New functions have been added (GWET_AC2, GWET_SES, GWET_SER, GTRANS, GWET) to support Gwet’s AC2, as well as a new data analysis tool.

Reliability data analysis tools

The Reliability data analysis tool has been replaced by the following three data analysis tools:

  • Internal Consistency Reliability: Cronbach’s Alpha and Split Half / Guttman’s
  • Interrater Reliability: Cohen’s Kappa, Weighted Kappa, Kendall’s W, Bland-Altman, Intraclass Correlation, Krippendorff’s Alpha and Gwet’s AC2
  • Item Analysis: Discrimination Index, Difficulty Index, Point Biserial Correlation

Distribution Fitting Capabilities

The goal of these new capabilities is to determine how to fit various distributions to sample data. In particular, new functions have been added to estimate the parameters of these distributions using the method of moments (WEIBULL_FITM, GAMMA_FITM, BETA_FITM, UNIFORM_FITM), maximum likelihood (WEIBULL_FIT, GAMMA_FIT, BETA_FIT, UNIFORM_FIT) and regression (WEIBULL_FITR).

Anderson-Darling Test

The Anderson-Darling Test is a way of determining whether a specified distribution is a fit for a given sample. This test is now provided for the following distributions: normal, exponential, Weibull, gamma and generic (i.e. any distribution with no unknown parameters).

New functions have been added (ANDERSON, ADTEST, ADCRIT, ADPROB) to support the Anderson-Darling Test, as well as a new data analysis tool.

Chi-square Goodness of Fit Test

New distribution-specific capabilities have been added to complement the existing FIT_TEST function. The following distributions are initially supported: normal, exponential, Weibull, gamma, beta and uniform. The new GOFTESTExact function can be used when the distribution parameters are known and the new GOFTEST function can be used when the distribution parameters are not known. In addition these tests can be performed via a new data analysis tool.

Non-parametric data analysis tools

The Non-parametric data analysis tool has been split into the following two data analysis tools:

  • Non-parametric Tests: Friedman’s Test, Runs Tests, Cochran’s Q Test, Mood’s Test
  • Goodness of Fit Tests: Two Sample Kolmogorov-Smirnov Test, One Sample Anderson-Darling Test, Chi-square Goodness of Fit Test

Changes to the User Interface

Upon pressing Ctrl-m (or an equivalent) you have access to the various data analysis tools via the original interface or the newer MultiPage interface. A new Corr tab has been added to the MultiPage interface that provides access to the following data analysis tools: Correlation Tests, Polychoric Correlation as well as the three reliability data analysis tools described above.

A new Reliability option has been added to the original interface, which gives access to the three reliability data analysis tools described above. Also a Goodness of Fit option has been added.

Improved Box Plots

The existing Box Plot and Box Plot with Outliers data analysis capabilities have been revised to better handle negative data elements. In such cases, you should refer to the labels for the y axis shown on the right side of the chart. Big thanks to Bob who explained how to make this improvement!

In addition, the Box Plot now shows the mean for each group (via an × on the chart).

Statistical Tables

A Two Sample Kolmogorov-Smirnov table of critical values has been added as well as One Sample Anderson-Darling tables of critical values.

The One Sample Kolmogorov-Smirnov table of critical values has also been revised. This also improves the accuracy of the KSCRIT and KSPROB functions. Errors in the KSCRIT, KSPROB and KINV functions have also been fixed.

Two Sample Kolmogorov-Smirnov Test

A new KS2CRIT function has been added which automatically performs the lookup of values in the Two Sample Kolmogorov-Smirnov table of critical values. In addition, the new KS2PROB function estimates the p-value for the Two Sample KS test based on interpolation between values in the Two Sample Kolmogorov-Smirnov table of critical values.

Polygamma Function

The POLYGAMMA worksheet function has been added to calculate the digamma and trigamma functions.

Bug Fixes

  • Fixed the LCRIT and LPROB functions for n > 50
  • Fixed the LogitSelect function, which did not work properly
  • Fixed the Three Factor ANOVA using Regression (totals were not calculated)
  • Fixed the VAR_POWER function (roles of the two parameters were reversed)
  • Fixed Chi-square Independence Test data analysis tool when the standard format was used without headings

Posted in Announcement, New Release

Release 4.14

I am pleased to announce Release 4.14 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

Note that now there are three versions of the software: one for Excel 2013/2016, another for Excel 2010 and a third for Excel 2007.

A new version for the Mac will be available within the next few weeks.

The Examples Workbook Parts 1 and 2 and the Multivariate Examples files have been updated for compatibility with the new release.

The Real Statistics website will be updated over the course of the next few days to reflect the new capabilities in Release 4.14.

Discriminant Analysis

A new Discriminant Analysis data analysis tool has been added to the multivariate analysis part of the Real Statistics Resource Pack. The tool will perform both linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA).

Classification tables and predictions for training and non-training data can be made.

Polychoric Correlation

A new Polychoric Correlation data analysis tool has been added which calculates the polychoric correlation for two discrete, finite, ordinal variables (using Solver). Data for such variables can be organized into a two dimensional contingency table (as for the chi-square test of independence). When the two variables are dichotomous, the polychoric correlation is called a tetrachoric correlation.

In addition, the TCORREL function has been added, which estimates the tetrachoric correlation coefficient, along with a p-value and confidence interval.

Percentile and Quartile Functions

New QUARTILE_EXC(R1, k) and PERCENTILE_EXC(R1, p) functions have been added which have the same functionality as QUARTILE.EXC and PERCENTILE.EXC. This is useful for Excel 2007 users since this version of Excel doesn’t support the .EXC functions. These functions differ from their Excel counterparts only in the extreme cases: where the Excel functions return #NUM!, the Real Statistics functions return MIN(R1) or MAX(R1).

In addition, these functions take a final argument which provides other options for defining percentile and quartile based on Hyndman-Fan methods 4, 5, 6, 7, 8 and 9 (note that Excel’s .EXC approach is method 6 and .INC is method 7). QUARTILE_EXC(R1, k, m) and PERCENTILE_EXC(R1, p, m) default to method 6 when m is omitted.
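
To make the difference between the Hyndman-Fan methods concrete, here is a minimal Python sketch. It is not the Real Statistics code: the function name hf_percentile is hypothetical, and the clamping to the smallest and largest data values is only meant to mirror the extreme-case behaviour described above.

```python
import math

# Offset m(p) used by each Hyndman-Fan method: the quantile is read off at
# position h = n*p + m(p) in the sorted data (positions start at 1).
HF_OFFSETS = {
    4: lambda p: 0.0,
    5: lambda p: 0.5,
    6: lambda p: p,              # Excel's .EXC approach
    7: lambda p: 1.0 - p,        # Excel's .INC approach
    8: lambda p: (p + 1.0) / 3.0,
    9: lambda p: p / 4.0 + 3.0 / 8.0,
}


def hf_percentile(data, p, method=6):
    """Hypothetical helper: p-th quantile (0 <= p <= 1) by Hyndman-Fan methods 4-9."""
    xs = sorted(data)
    n = len(xs)
    h = n * p + HF_OFFSETS[method](p)
    j = math.floor(h)            # index of the lower neighbour (1-based)
    g = h - j                    # interpolation weight
    if j < 1:                    # below the first data point: return the minimum
        return xs[0]
    if j >= n:                   # at or beyond the last data point: return the maximum
        return xs[-1]
    return xs[j - 1] + g * (xs[j] - xs[j - 1])


print(hf_percentile([1, 2, 3, 4, 5], 0.25, method=6))  # 1.5
print(hf_percentile([1, 2, 3, 4, 5], 0.25, method=7))  # 2.0
```

For the data 1, 2, 3, 4, 5 and p = 0.25, method 6 interpolates at position (n+1)p = 1.5, while method 7 interpolates at position 1 + (n-1)p = 2.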

Fisher test effect size

The FISHER_TEST function has been added, which not only reports the p-value for the Fisher exact test (as is done by FISHERTEST), but also estimates equivalent phi and Cramer V effect sizes.

Bug Fixes

  • Fixes bug in Cluster Analysis data analysis tool when writing to another webpage
  • Fixes bug in SVD functions

Posted in Announcement, New Release

Update for Mac Users

Good news for Mac users.

I know that many Mac users have been eager to get the latest Real Statistics features for Excel on their Mac. Until now, I have been unable to do this since I don’t own a Mac and have been unable to borrow one for long enough to create and test a new version for the Mac.

I have now decided that the only solution is to buy a Mac. Until about 20 years ago I only used a Mac, but in the past 20 years I have been using Windows. After the new Mac arrives and I get acclimated to it, I will create the new Real Statistics release for Mac.

To help offset the cost of the Mac, I ask any of you Mac users to consider making a small donation. This is completely optional, but will be appreciated. You can make the donation by clicking here on the word Donation.

Thanks for your continued support.

Charles

Posted in Announcement

Release 4.13

I am pleased to announce Release 4.13 of the Real Statistics Resource Pack. The new release is now available for free download at Download Resource Pack for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 and the Multivariate Examples files have been updated for compatibility with the new release.

The Real Statistics website will be updated over the course of the next few days to reflect the new capabilities in Release 4.13.

Probit Logistic Regression support

The logistic regression data analysis tool has been expanded to add an option for probit regression. This data analysis tool is now called the Logistic and Probit Regression data analysis tool.

In addition, the following probit regression functions have been added, which operate like their logistic regression counterparts:

ProbitCoeff, ProbitCoeff2, ProbitCoeffs, ProbitRSquare, ProbitTest, ProbitPred, ProbitPredC

Mann-Whitney and Signed-Ranks Confidence Intervals

The following two new functions are supported which output a confidence interval for the Hodges-Lehmann median based on the Mann-Whitney and Signed-ranks tests:

MANN_CONF(R1, R2, lab, ttype, alpha): returns a column array with the following values based on the Mann-Whitney test conducted on the samples in ranges R1 and R2: lower and upper limits of the 1-alpha confidence interval using U-crit (defined by ttype), followed by the median and the lower and upper limits of the 1-alpha confidence interval based on U-crit + 1.

SRANK_CONF(R1, R2, lab, ttype, alpha): returns a column array with the following values based on the Paired Wilcoxon Signed-Ranks test conducted on the samples in ranges R1 and R2: lower and upper limits of the 1-alpha confidence interval using T-crit (defined by ttype), followed by the median and the lower and upper limits of the 1-alpha confidence interval based on T-crit + 1.

If lab = True, then an extra column containing labels is returned (default is False). The alpha argument is the significance level and defaults to .05.

If ttype = 0 (default), then the normal approximation is used. If ttype = 1 then a table lookup is used (via MCRIT and MPROB, or SRankCRIT and SRankPROB) with harmonic interpolation. If ttype = 2 then a table lookup is used with linear interpolation, and if ttype = 3 then an exact test is used (via MANNINV and MANNDIST, or PERMINV and PERMDIST).

There is also a one sample version of the SRANK_CONF function, namely

SRANK_CONF(R1, v, lab, ttype, alpha)

where v is the hypothetical median value.

Multinomial Logistic Regression Accuracy

A new function has been added that calculates the accuracy of a Multinomial Logistic Regression model.

MLogit_Accuracy(R1, r, lab, head, iter): returns a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. Thus, if R1 contains k independent variables, then the output is a k+1 × 1 column array (or a k+1 × 2 array if lab = True).

The arguments R1, r, lab, head and iter are as for the MLogitCoeff function.

New Distribution Functions

The following new functions support distributions that are useful in Bayesian statistics.

IGAMMA_DIST(x, alpha, beta, cum): inverse gamma distribution

IGAMMA_INV(p, alpha, beta): inverse of the inverse gamma cdf

ICHISQ_DIST(x, df, cum): inverse chi-square distribution

ICHISQ_INV(p, df): inverse of the inverse chi-square cdf

DIRICHLET_DIST(pvector, avector): Dirichlet distribution at pvector (whose values add up to 1) based on alpha parameters in avector.

DIRICHLET_RAND(avector): an array pvector consisting of random values for the Dirichlet distribution with alpha values in avector
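
One standard way to draw a random vector from a Dirichlet distribution is to draw independent gamma variates, one per alpha value, and divide each by their sum. The Python sketch below illustrates that technique; it is not necessarily how DIRICHLET_RAND is implemented, and the function name dirichlet_rand and its rng parameter are hypothetical.

```python
import random


def dirichlet_rand(alphas, rng=None):
    """Hypothetical helper: one Dirichlet draw via normalized gamma variates."""
    rng = rng or random.Random()
    # Gamma(alpha_i, 1) draws; each alpha must be positive
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    # Normalizing makes the components add up to 1
    return [d / total for d in draws]


pvector = dirichlet_rand([2.0, 3.0, 5.0], random.Random(42))
print(pvector, sum(pvector))
```

The resulting values are all positive and add up to 1 (apart from floating-point rounding), so they can be used directly as the pvector argument of a Dirichlet density.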

Minor Enhancements

The CountRowsUnique function has been enhanced by adding two new optional arguments. CountRowsUnique(R1, head, ncols): returns a count of the unique rows in R1; if head = True (default False) then the first row of R1 (presumably a heading) is not counted; the last ncols columns of R1 are not considered when determining uniqueness (default ncols = 0).

The SUBRANGE function has been eliminated since its role is subsumed in the SUBMATRIX function.

The LogitCoeff, LogitCoeff2, LogitPred, LogitRSquare and LogitTest functions now accept non-numeric data. Rows with non-numeric data are ignored. In addition, a new optional guess argument has been added to these functions (as the last argument) which allows the user to specify initial coefficient values (instead of the default of zeros).

A new LinAlg category has been added to Excel’s standard Insert Function facility. This enables Windows users of versions of Excel 2010/2013/2016 to get additional information about the various Real Statistics matrix functions (DIAGONAL, eVECTORS, SCHUR, etc.). In addition, a new StatMath category has been added with some mathematical functions (especially those for complex number and matrix operations).

Various additional functions now accept arrays and not just ranges as their arguments.

Bug Fixes

  • Fixes an error in the Kaplan-Meier confidence intervals
  • Fixes an error in the VAR2_POWER and VAR2_SIZE functions
  • Fixes an overflow error in the eVECTORS function
  • Fixes a bug in the various logistic regression capabilities whereby LN(0) was not trapped and so an error was issued. Also fixes an error in the Multinomial Logistic Regression tool when checking whether the output will overwrite existing cells.
  • Fixes a bug in LogitCoeff2 which prevented it from displaying its values.

Posted in Announcement, New Release

Real Statistics with 64 bit Excel

While most people use the 32 bit version of Excel, there is an important community of users of the 64 bit version of Excel. A number of people have asked whether the Real Statistics Resource Pack works with the 64 bit version of Excel.

The following is an answer to this question based on the experiences of Paolo Piva, who is using the Real Statistics Resource Pack in Windows 10 (64 bit) using the 64 bit version of Excel 2016.

“I confirm that I use it under Windows 10 Pro 64 bits and that I installed it on Excel 64 bits, as part of Office 365 (which is Excel 2016).”

Paolo noted further that “when I closed Excel and then launched it again, the add-in menu with Real Statistics did not show up in the ribbon… I had therefore to implement the suggestion Unblock the Add-in File as described in: https://www.excelcampus.com/vba/add-in-ribbon-disappears/”

Paolo added: “just to make sure that things work properly and the Addins menu is displayed Excel must be run as an Administrator. If it is not run as an Administrator the Addins menu will not be displayed but still the Real Statistics package will work and can be accessed with the usual ‘Command M’.”

I assume here that Paolo means the Ctrl-m key sequence.

Posted in Announcement, Hint

Real Statistics Rel 4.12

I am pleased to announce Release 4.12 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 and the Multivariate Examples files have been updated for compatibility with the new release.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.12.

The new release contains the following new capabilities:

Deming Regression

A new Deming Regression data analysis tool has been added, along with the following functions:

DRegCoeff(R1, R2, λ, lab) = 2 × 2 array consisting of the intercept and slope coefficients and standard errors for Deming regression on the data in R1 and R2 where lambda = λ.

DRegResiduals(R1, R2, λ, lab) = n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 and R2 based on the Deming regression on the data in R1 and R2 where lambda = λ and n = the number of elements in R1 (or R2).

DRegIdentity(R1, R2, λ, lab) = 2 × 1 array consisting of  and  for Deming regression on the data in R1 and R2 where lambda = λ.

DRegPred(x0, R1, R2, λ, lab) = 4 × 1 array consisting of the predicted value of y corresponding to x0, the standard error for this prediction and the lower and upper ends of a confidence interval for this prediction based on the Deming regression on the data in R1 and R2 where lambda = λ.

If lab = TRUE (default FALSE), then a column with labels is appended to the output for readability.

If an explicit lambda argument is omitted, then it is calculated from the data in R1 and R2. If an explicit lambda argument is given, then it is assumed that R1 and R2 are column arrays. We also have the following function, which explicitly calculates lambda from the data in R1 and R2:

DRegLambda(R1, R2) = the lambda value calculated from R1 and R2

The standard errors are calculated using jackknifing as described on the website.

Split-Half and Guttman Reliability

A new Split-Half/Guttman data analysis tool has been added which calculates split-half reliability statistics using Spearman-Brown as well as Guttman’s measurement. In addition, the following functions are now supported.

COV_SPLIT(R1, s) = sample covariance for the data in range R1 based on the split described by string s.

CORR_SPLIT(R1, s) = correlation for the data in range R1 based on the split described by string s.

GUTTMAN_SPLIT(R1, s) = Guttman’s lambda for the data in range R1 based on the split described by string s.

GUTTMAN(R1) = Guttman’s reliability measure for the data in range R1, i.e. the maximum Guttman’s lambda based on all possible splits; when the number of splits is too large, a second argument iter can be used to find an approximate maximum Guttman’s lambda based on a randomly generated iter number of splits.

String s consists of 0’s and 1’s where each character in the string corresponds to one column in R1 (thus the length of s must be equal to the number of columns in R1). E.g. the string s = “101010” represents the split half where the odd numbered questions are in one half and the even numbered questions are in the other half.
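
To illustrate how a split string is applied, here is a minimal Python sketch. It assumes that each row of the data holds one subject's scores and each column one item, and it uses the usual split-half form of Guttman's lambda, 2(1 − (var_a + var_b) / var_total), with sample variances. The function names split_totals and guttman_split and the sample scores are hypothetical, and this is not the Real Statistics code.

```python
from statistics import variance


def split_totals(rows, s):
    """Sum each subject's scores separately over the '0' items and the '1' items."""
    if any(len(row) != len(s) for row in rows):
        raise ValueError("s needs exactly one character per column")
    half0 = [sum(v for v, c in zip(row, s) if c == "0") for row in rows]
    half1 = [sum(v for v, c in zip(row, s) if c == "1") for row in rows]
    return half0, half1


def guttman_split(rows, s):
    """Guttman's split-half lambda for the split described by s (assumed formula)."""
    half0, half1 = split_totals(rows, s)
    total = [a + b for a, b in zip(half0, half1)]
    return 2 * (1 - (variance(half0) + variance(half1)) / variance(total))


# Four subjects, four items; "1010" puts items 1 and 3 in one half, 2 and 4 in the other
scores = [[3, 4, 3, 5], [1, 2, 2, 1], [4, 4, 5, 5], [2, 3, 2, 2]]
print(split_totals(scores, "1010"))
print(guttman_split(scores, "1010"))
```

When the two halves produce identical totals for every subject, this formula gives a lambda of 1.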

The following functions relate to the Spearman-Brown correction:

SB_SPLIT(R1, s) = split half coefficient (after Spearman-Brown correction) for data in R1 based on the split described by the string s.

SB_CORRECTION(r, n, m) = Spearman-Brown correction when the split-half correlation based on an m vs. n–m split is r. If n is omitted, then it is assumed that there is a 50-50 split. If n is present but m is omitted, then it is assumed that m = n/2.

Partitioning Functions

To support the Guttman functions, described above, the following functions have been added:

INIT_SPLIT(n, m) = returns a string of length n consisting of m 0’s followed by n–m 1’s. If omitted, m defaults to n/2.

NEXT_SPLIT(s) = returns the string representing the next split after the split represented by s.

E.g. use of these functions will generate the following list of strings of length four: 0011, 0101, 0110, 1001, 1010, 1100.
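
The list above is in lexicographic order, so one way to reproduce it is with the classic "next permutation" step. The following Python sketch does this; the lowercase function names are hypothetical stand-ins, rounding n/2 down for odd n is an assumption, and returning None after the last split is also an assumption, since the value that NEXT_SPLIT returns at the end of the list is not stated here.

```python
def init_split(n, m=None):
    """First split: m 0's followed by n-m 1's (m defaults to n // 2, an assumption)."""
    if m is None:
        m = n // 2
    return "0" * m + "1" * (n - m)


def next_split(s):
    """Next string in lexicographic order with the same number of 0's and 1's."""
    chars = list(s)
    # Find the rightmost position whose character is smaller than its right neighbour
    i = len(chars) - 2
    while i >= 0 and chars[i] >= chars[i + 1]:
        i -= 1
    if i < 0:
        return None  # s was the last split (assumed end-of-list signal)
    # Find the rightmost character larger than chars[i] and swap the two
    j = len(chars) - 1
    while chars[j] <= chars[i]:
        j -= 1
    chars[i], chars[j] = chars[j], chars[i]
    # Reverse the tail so that it becomes as small as possible
    chars[i + 1:] = reversed(chars[i + 1:])
    return "".join(chars)


s = init_split(4, 2)
while s is not None:
    print(s)
    s = next_split(s)
```

Run as written, the loop prints 0011, 0101, 0110, 1001, 1010 and 1100, one per line.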

The following new functions have also been added:

INIT_PARTITION(n) = returns a string consisting of n 0’s

NEXT_PARTITION(s) = returns the string representing the next partition after the partition represented by s.

E.g. use of these functions will generate the following list of strings of length three: 000, 001, 010, 011, 100, 101, 110, 111.
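
The list above is simply binary counting with leading zeros, which the following Python sketch reproduces. The lowercase function names are hypothetical, and returning None after the last string is an assumption about how the end of the list is signalled.

```python
def init_partition(n):
    """First partition string: n 0's."""
    return "0" * n


def next_partition(s):
    """Add 1 to s read as a binary number, keeping the same length."""
    value = int(s, 2) + 1
    if value >= 2 ** len(s):
        return None  # s was all 1's (assumed end-of-list signal)
    return format(value, "0{}b".format(len(s)))


s = init_partition(3)
while s is not None:
    print(s)
    s = next_partition(s)
```

Run as written, the loop prints the eight strings from 000 to 111.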

Finally, we have also added the following functions:

RAND_SPLIT(n, m) = returns a random string of length n consisting of m 0’s and n–m 1’s. If omitted, m defaults to n/2.

RAND_PARTITION(n) = returns a random string of length n consisting of 0’s and 1’s.

Spearman-Brown Predicted Reliability

The following functions have been added for Spearman-Brown predicted reliability. These functions can also be used with Cronbach’s Alpha.

SB_PRED(m, rho, n) = Spearman-Brown predicted reliability based on m items when Spearman-Brown for n items is rho.

SB_SIZE(rho1, rho, n) = the number of items necessary to bring the Spearman-Brown predicted reliability up (or down) to rho1 from n items with Spearman-Brown of rho.
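
These two calculations follow from the standard Spearman-Brown prophecy formula, rho_m = k·rho / (1 + (k − 1)·rho) with k = m/n. The Python sketch below applies that formula and its inverse. The lowercase function names are hypothetical, and whether SB_SIZE rounds its result to a whole number of items is not stated here, so the sketch returns the unrounded value.

```python
def sb_pred(m, rho, n):
    """Predicted reliability for m items, given reliability rho for n items."""
    k = m / n  # factor by which the test is lengthened (or shortened)
    return k * rho / (1 + (k - 1) * rho)


def sb_size(rho1, rho, n):
    """Number of items needed to move the reliability from rho (n items) to rho1."""
    k = rho1 * (1 - rho) / (rho * (1 - rho1))
    return n * k


print(sb_pred(20, 0.6, 10))
print(sb_size(0.75, 0.6, 10))
```

For example, doubling a 10-item test with reliability 0.6 gives a predicted reliability of 2(0.6)/(1 + 0.6) = 0.75, and conversely about 20 items are needed to reach 0.75. Doubling is also the familiar 50-50 split-half correction 2r/(1 + r).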

LAD Regression Enhancements

The following enhancements have been made to the LAD Regression data analysis tool and the functions that support LAD regression:

  • Support for regression without an intercept has been added. Note the revised forms of the following functions where con = TRUE (default) if regression is with an intercept and con = FALSE if no constant regression term is used.

LADRegCoeff(R1, R2, con, iter)

LADRegWeights(R1, R2, con, iter)

  • An option has been added to the LAD Regression data analysis tool to report the standard error for the regression coefficients, as well as confidence intervals, based on bootstrapping. In addition, there is the following new function which outputs the standard errors based on nboot bootstraps:

LADRegCoeffSE(R1, R2, con, iter, nboot)

Note too that a bug has been fixed where the variable names were not shown correctly when the output from the data analysis tool was displayed on a new page.

Total Least Squares Regression

The following functions have been added to calculate the regression coefficients when total least squares is used instead of ordinary least squares.

If there is only one independent variable, the results will be the same as Deming Regression where lambda = 1. With more than one independent variable, the singular value decomposition (SVD) of the matrix containing the data from both the independent and dependent variables is used.

TRegCoeff0(R1, R2, lab) = column array consisting of the intercept and slope coefficients based on total least squares linear regression using the data in R1 and R2, where both R1 (containing x data) and R2 (containing y data) are column arrays. If lab = TRUE (default FALSE) then a column is appended with the labels “intercept” and “slope”.

TRegCoeff(R1, R2, iter) = column array consisting of the regression coefficients based on total least squares linear regression using the data in R1 and R2. iter (default 100) is the number of iterations used in calculating the SVD. iter is ignored if R1 contains only one column.

Three Factor ANOVA without Replication

The Three Factor ANOVA data analysis tool has been enhanced to support three factor ANOVA without replication.

Enhancement to Average Rank Function

The following function has been enhanced to solve problems with the standard Excel RANK.AVG function:

RANK_AVG(x, R1, order, num_digits)

This function is equivalent to the standard Excel function RANK.AVG, except that the values in range R1 are rounded off to num_digits decimal places (default 8) before the ranking is done.

This function addresses the following shortcomings in the RANK.AVG function:

  • RANK.AVG doesn’t handle decimal precision very well. E.g. you can have cases where 12.1 appears in the range A1:A10, yet =RANK.AVG(12.1, A1:A10) returns the error value #N/A.
  • RANK.AVG only accepts an explicit cell range as its second argument. E.g. even though A1:A10-B1:B10 evaluates to a cell range, =RANK.AVG(A1, A1:A10-B1:B10) will produce an error.
  • RANK.AVG is not available in versions of Excel prior to Excel 2010.
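
The idea of rounding before ranking can be sketched in a few lines of Python. This is an illustration only, not the Real Statistics code: the function name rank_avg is hypothetical, it accepts any list of numbers, and it returns None (rather than an Excel error value) when x does not occur in the data.

```python
def rank_avg(x, values, order=0, num_digits=8):
    """Average rank of x among values after rounding to num_digits decimals.

    order = 0 ranks in descending order (largest value has rank 1);
    any other value ranks in ascending order, as in Excel's RANK.AVG.
    """
    target = round(x, num_digits)
    rounded = [round(v, num_digits) for v in values]
    if target not in rounded:
        return None  # stand-in for an Excel error value
    if order == 0:
        ahead = sum(1 for v in rounded if v > target)
    else:
        ahead = sum(1 for v in rounded if v < target)
    ties = rounded.count(target)
    # Tied values share the average of the ranks they occupy
    return ahead + (ties + 1) / 2


print(rank_avg(20, [10, 20, 20, 30]))      # 2.5
print(rank_avg(0.3, [0.1 + 0.2, 0.5]))     # 2.0
```

In the second call, 0.1 + 0.2 is stored as 0.30000000000000004, so an exact comparison with 0.3 would fail; rounding both sides to 8 decimal places makes the match succeed.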

Singular Value Decomposition Enhancement

The Singular Value Decomposition (SVD) option on the Matrix Operations data analysis tool has been enhanced to support non-square matrices. This is also the case for the following functions: SVD_U, SVD_D and SVD_V.

Donation Webpage

The website and Real Statistics software are free and will remain free of charge.

In order to offset some of my costs (especially for hosting services, software and materials), I am asking for a small donation if you are happy with the services that you have been receiving. This will be appreciated, but is completely optional. To donate, please click on Please Donate.

Bug Fixes

  • A bug in calculating the confidence interval in the Multiple Anova option of the Manova data analysis tool has been fixed.
  • When clicking on the Mann-Whitney, Wilcoxon Signed Ranks or Kruskal-Wallis options of the Non-parametric Tests data analysis tool, the Descriptive Statistics data analysis tool was incorrectly displayed. This has now been corrected.
  • Fixed an error in the display of Cook’s D output from the Linear Regression data analysis tool when the Column headings included in the data option is unchecked.
  • Minor changes to some of the dialog boxes have been made, especially to correct some errors in the tool tips.
  • Various broken links on the website have been repaired, which should make the user experience better.

Improved Web Hosting Performance

Last month I contracted to upgrade the web hosting capabilities to improve the availability and response time of the Real Statistics website. After a few weeks of problems, I believe we are now seeing the benefits of this upgrade.

Ongoing Activities

I continue to upgrade the Real Statistics Resource Pack to accomplish the following two objectives. This will take time and will happen over the next several software releases:

  • Add documentation for each Real Statistics function to Excel’s standard Insert Function facility
  • Allow any function arguments that call for a cell range to use the output from another function as long as that output is equivalent to a cell range. E.g. you can also use the formula =GUTTMAN(TRANSPOSE(A1:J5)) or =GUTTMAN(A1:E10-F1:J10).

Charles

Posted in Announcement, New Release

Real Statistics Rel 4.11

I am pleased to announce Release 4.11 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 have been updated for compatibility with the new release.

I know that this new release comes only a few days after the previous release, but I wanted to make sure that you got the following new features:

New Box Plots

A new Box Plot with Outliers option has been added to the Descriptive Statistics data analysis tool. This tool displays outliers as small circles and restricts the whiskers on the plot to 1.5 times the IQR above and below the box in the chart.
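
The 1.5 × IQR rule can be sketched as follows in Python. This is an illustration under stated assumptions rather than the tool's own logic: quartiles are computed with the inclusive method (the one used by Excel's QUARTILE.INC), each whisker is assumed to end at the most extreme data value that still lies within the fence, and the function name box_plot_summary is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),    # assumed: last data point within the fence
        "whisker_high": max(inside),
        "outliers": outliers,          # these would be drawn as small circles
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this data the quartiles are 3.25 and 7.75, so the fences lie at −3.5 and 14.5; the whiskers run from 1 to 9 and the value 100 is flagged as an outlier.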

Revised QQ Plots

The existing QQ Plot option of the Descriptive Statistics data analysis tool has been modified so that one QQ Plot is now generated for each column in the Input Range.

Stepwise Regression

This capability was supposed to be included in Rel 4.10, but was inadvertently excluded. It is now part of Rel 4.11 as an option to the Linear Regression data analysis tool.

Charles

Posted in Announcement, New Release

Real Statistics Rel 4.10

I am pleased to announce Release 4.10 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

I apologize to all you Mac users. I expect to turn my attention to creating a new Mac version of the Real Statistics Resource Pack shortly.

I want to thank all of you who have identified bugs or have suggested enhancements. I have tried to include fixes for all bugs that have been identified and support for at least some of the suggested enhancements.

The spreadsheets for all the examples used on the Real Statistics website are now available for free download (Download Examples Workbooks). These spreadsheets are contained in four Excel files (i.e. workbooks): Examples Workbook Parts 1 and 2, Multivariate Examples and Time Series Workbook. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.10.

A focus in this release is on regression enhancements, although other important features have been added as well. Release 4.10 contains the following new features:

Polynomial Regression

A new Polynomial Regression data analysis tool has been added.

In addition, the following new functions are supported, which provide support similar to that provided by the new data analysis tool. Here, Rx and Ry are column arrays containing x and y data values and deg is the degree/order of the polynomial.

PolyDesign(Rx, deg, ones) – returns an array consisting of x, x^2, …, x^deg columns. If ones = TRUE, then the output is 1, x, x^2, …, x^deg.

PolyCoeff(Rx, Ry, deg) – returns a column array consisting of the polynomial regression coefficients and their standard errors

PolyRSquare(Rx, Ry, deg) = R-square value for the polynomial regression

PolyDeg(Rx, Ry, maxdeg) = the highest degree polynomial ≤ maxdeg which produces a significantly different R-square value
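
As an illustration of the layout that PolyDesign returns, the following Python sketch builds the same kind of array from a list of x values. The function poly_design is a hypothetical stand-in written for this example, not the add-in's code.

```python
def poly_design(xs, deg, ones=False):
    """Build rows [x, x^2, ..., x^deg], optionally preceded by a column of ones."""
    first_power = 0 if ones else 1
    return [[x ** p for p in range(first_power, deg + 1)] for x in xs]


# Each row corresponds to one x value; each column to one power of x.
for row in poly_design([2, 3, 4], deg=3, ones=True):
    print(row)
```

An array of this form can then be used as the independent-variable input to an ordinary multiple regression, which is how a polynomial regression is usually fitted.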

Least Absolute Deviation (LAD) Regression

A new Least Absolute Deviation Regression data analysis tool has been added.

In addition, the following new functions are supported which provide similar support to that provided by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and iter is the number of iterations used in the iteratively reweighted least squares algorithm (default = 25).

LADRegCoeff(Rx, Ry, iter) = two-column range consisting of the regression coefficient vector followed by the vector of standard errors of these coefficients

LADRegWeights(Rx, Ry, iter) = n × 1 column range consisting of the weights calculated from the iteratively reweighted least squares algorithm

Note that in addition to describing the iteratively reweighted least squares algorithm, the website will also describe the Simplex method for calculating the LAD regression coefficients.
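
For readers who want to see the idea behind iteratively reweighted least squares, here is a rough Python sketch. It is an assumption-laden illustration rather than the add-in's algorithm: the names solve and lad_irls are hypothetical, the weights are taken as 1/|residual| (a common choice for LAD), and a small eps guards against division by zero. Standard errors are not computed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lad_irls(xs, ys, iters=25, eps=1e-8):
    """Approximate LAD coefficients [intercept, b1, ..., bk] and final weights.

    xs is a list of n rows with k values each; ys is a list of n values.
    """
    design = [[1.0] + list(row) for row in xs]
    p = len(design[0])
    weights = [1.0] * len(ys)  # first pass is ordinary least squares
    for _ in range(iters):
        # Weighted normal equations: (X'WX) b = X'Wy
        xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, design))
                 for j in range(p)] for i in range(p)]
        xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, design, ys))
                for i in range(p)]
        coeffs = solve(xtwx, xtwy)
        residuals = [y - sum(c * v for c, v in zip(coeffs, r))
                     for r, y in zip(design, ys)]
        # Assumption: weight = 1 / |residual|, capped via eps.
        weights = [1.0 / max(abs(e), eps) for e in residuals]
    return coeffs, weights


coeffs, weights = lad_irls([[0], [1], [2], [3], [4]], [0, 1, 2, 3, 100])
print(coeffs)
```

In this sample the first four points lie on the line y = x and the last point is an outlier. Ordinary least squares gives a slope of 20.2, whereas the reweighted fit moves toward a slope of 1 as the iterations proceed. Convergence can be slow, so more iterations may be needed in practice.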

New Extracting Columns from a Data Range Data Analysis Tool

The existing Extracting Columns from a Data Range data analysis tool has been completely revised. In addition to more easily selecting which columns you want to retain from a data range, you will now have the option to create (1) tag/dummy or categorical codes for selected columns, (2) interactions between the variables (e.g. xy) representing selected columns and (3) powers of variables in selected columns (x^2, x^3, etc.).

Simplifications to Regression Data Analysis Tools

The Multiple Regression data analysis tool has been simplified by the elimination of the Tag/dummy coding options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

The Logistic Regression data analysis tool has also been simplified by the elimination of the Categorical coding and the Deletion of variables options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

Stepwise Regression Capabilities

A Stepwise Regression option has been added to the Multiple Regression data analysis tool. When this option is selected, a subset of the variables is chosen automatically so that the resulting regression model fits the data, in some sense, about as well as the full regression model containing all the variables.

The output from this data analysis tool shows how the stepwise selection of variables was made along with the regression analysis using these variables.

In addition, the following new functions are supported which are used by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and Rv is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise.

RegRank(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by Rx, Ry and Rv.

RegCoeffP(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by Rx, Ry and Rv.

RegStepwise(Rx, Ry) – returns a 1 × k array Rv where each non-blank element in Rv corresponds to an x variable that should be retained in the stepwise regression model. Actually, the output is a 1 × (k+1) array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model.

Optimize Time Series Forecasting

An Optimize MSE option has been added to the Basic Forecasting data analysis tool. When this option is chosen, values of Alpha, Beta and Gamma are found which minimize the mean squared error (MSE) for the Simple Exponential Smoothing, Holt’s Linear Trend or Holt-Winters Method.
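
The idea of optimizing the MSE can be sketched for the simplest case, Simple Exponential Smoothing, where only Alpha is involved. The Python below is a hedged illustration only: the function names are hypothetical, the first forecast is initialized with the first observation (an assumption), and a plain grid search stands in for whatever optimizer the data analysis tool actually uses.

```python
def ses_mse(series, alpha):
    """Mean squared one-step-ahead forecast error for simple exponential smoothing."""
    forecast = series[0]  # assumption: initialize with the first observation
    squared_errors = []
    for actual in series[1:]:
        squared_errors.append((actual - forecast) ** 2)
        forecast = alpha * actual + (1 - alpha) * forecast
    return sum(squared_errors) / len(squared_errors)


def best_alpha(series, steps=100):
    """Grid search over alpha in [0, 1] for the smallest MSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda a: ses_mse(series, a))


series = [1, 2, 3, 4, 5, 6]
alpha = best_alpha(series)
print(alpha, ses_mse(series, alpha))
```

For a steadily rising series like this one, the search settles on Alpha = 1, since any smaller value makes the forecasts lag further behind the trend. That is one reason a trend-aware method such as Holt’s Linear Trend is usually preferred for such data.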

Changes to the Augmented Dickey-Fuller Test

The output from the Augmented Dickey-Fuller (ADF) unit root test function ADFTEST has been augmented with two additional values, namely the first-order autocorrelation coefficient and an estimated p-value. This is also the case for the ADF Test option of the Time Series Testing data analysis tool.

The ADFCRIT(n, alpha, type) function has been revised to deal with alpha values besides .01, .025, .05 and .1. This is accomplished by linear interpolation. In addition, the following new function has been added:

ADFPROB(x, n, type) = estimated p-value (based on linear interpolation) for the ADF test at x for a time series of length n where type is as for ADFCRIT.

New Unit Root Tests for Time Series Analysis

The PP and KPSS tests for a unit root in time series analysis are now supported via the following new array functions:

PPTEST(R1, lab, lags, type, alpha) – an array function which returns a column range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags, autocorrelation coefficient and p-value.

KPSSTEST(R1, lab, lags, type, alpha) – an array function which returns a column range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags and p-value.

Thanks to Milos Cipovic who wrote the software for these tests.

New Features in the ANCOVA Data Analysis Tool

The ANCOVA data analysis tool has been enhanced with the following new options: ability to use data in stacked format and support for contrasts, Tukey’s HSD test and Tukey-Kramer test.

Diversity Indices Data Analysis Tool

The new Diversity Indices data analysis tool calculates Shannon’s, Simpson’s and Brillouin’s diversity indices for categorical data.
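
For reference, the three indices can be computed from category counts as in the following Python sketch. The function name is hypothetical, and the formulas are the common textbook forms (natural logarithms, and Simpson's index reported as 1 minus the sum of squared proportions); the data analysis tool may use different variants.

```python
import math


def diversity_indices(counts):
    """Return (Shannon, Simpson, Brillouin) indices for a list of category counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]

    # Shannon: H = -sum(p * ln p)
    shannon = -sum(p * math.log(p) for p in props)
    # Simpson (assumed form): 1 - sum(p^2)
    simpson = 1 - sum(p * p for p in props)
    # Brillouin: (ln N! - sum(ln n_i!)) / N, using lgamma(n + 1) = ln n!
    brillouin = (math.lgamma(total + 1)
                 - sum(math.lgamma(c + 1) for c in counts)) / total
    return shannon, simpson, brillouin


print(diversity_indices([10, 10, 10, 10]))
```

With four equally frequent categories, Shannon's index equals ln 4 and the Simpson value is 0.75, which are the maximum values these forms can take for four categories.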

Function Categories

When you click on the Insert Function symbol fx next to the Formula toolbar in Excel, you can access a list of all the standard Excel functions along with a brief description of each function and that function’s arguments. This includes all the Real Statistics functions.

The standard Excel functions are split into different categories (Math & Trig, Logical, Text, Statistical, etc.) which makes it easier to find these functions. The Real Statistics functions, however, have all been placed in the User Defined category and very limited information has been available about the functions or their arguments.

I am in the process of adding some new categories for the Real Statistics functions. I am also adding more descriptive information about these functions and their arguments. So far I have added Regression and Distribution categories and additional descriptive information about the functions in these categories. Additional categories and descriptive information will be added in coming releases.

The new categories and descriptive information are only available after you have accessed the Real Statistics data analysis tools (via Ctrl-m or the equivalent) for the first time.

This new capability is not available for versions of the Real Statistics Resource Pack that run on Excel 2007.

New Regression Function

RegPredC(Rx, Rc) = predicted y value for x values in range Rx based on the regression coefficients in range Rc. Rx and Rc can be either column or row ranges.

New ANOVA Functions

The following functions compute the values SSBet, SSW and SSTot for one-way ANOVA when the data is in stacked format. These are similar to the corresponding pre-existing functions used when data is in Excel format.

In the following, it is assumed that the first column of the input R1 contains the names of the factor levels and column number col contains the data for the one-way ANOVA.

SSWStd(R1, col) = SSW

SSBetStd(R1, col) = SSBet

SSTotStd(R1, col) = SSTot
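
The following Python sketch shows the calculation these functions are described as performing on stacked data. It is an illustration under assumptions: the function name anova_ss_stacked is hypothetical, each row is a tuple whose first item is the factor level, and col is a 1-based column position as it would be on a worksheet.

```python
from collections import defaultdict


def anova_ss_stacked(rows, col):
    """Return (SSBet, SSW, SSTot) for one-way ANOVA on stacked data."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row[col - 1])  # col is 1-based

    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)

    ss_tot = sum((v - grand_mean) ** 2 for v in values)
    ss_w = 0.0
    ss_bet = 0.0
    for g in groups.values():
        mean = sum(g) / len(g)
        ss_w += sum((v - mean) ** 2 for v in g)           # within-group variation
        ss_bet += len(g) * (mean - grand_mean) ** 2       # between-group variation
    return ss_bet, ss_w, ss_tot


data = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6)]
print(anova_ss_stacked(data, col=2))
```

As a check on any implementation, SSBet + SSW should equal SSTot; here 13.5 + 4 = 17.5.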

Bug Fixes and Minor Changes

Corrects an error in the two-tailed Mann-Whitney exact test. Previously, if the p-value for the one-tailed test was greater than .5, the p-value for the two-tailed test was given as 2*p instead of 2*(1–p).

Fixes an error in the Contrasts for Single Factor ANOVA data analysis tool. Previously, the formula used to determine whether the test was significant (i.e. the Sig cell) referenced the wrong cell when calculating the value of Alpha.

Fixes an error when using the Cutoff % option in the Logistic Regression data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and 1 (instead of between 0 and 100).

Fixes an error when using the Cutoff % option in the Reliability data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and .5 (instead of between 0 and 50).

Fixes an error on the constraint for the gamma parameter in the Holt-Winter Method option of the Basic Forecasting data analysis tool.

Fixes an error that sometimes occurs in the calculation of the log-rank metrics in the Hypothesis Testing portion of the output when the Kaplan-Meier option of the Survival Analysis data analysis tool is used.

Corrects errors in some labels and tooltips in various dialog boxes.

Allows the MANOVA functions MANOVA_PillaiTrace, MANOVA_WilksLambda, MANOVA_HotelTrace and MANOVA_RoyRoot to work properly even when there is only one dependent variable.

Posted in Announcement, Hint | Comments Off on Real Statistics Rel 4.10

Real Statistics disappears from Addin ribbon

Some of you have reported that Real Statistics disappears from the Add-ins ribbon. What follows is a potential way to eliminate the problem thanks to some research done by Jeff.

First of all, this problem is likely due to a security enhancement that Microsoft recently made to Excel which impacts add-ins. The following steps should be useful in eliminating this problem.

  1. If Excel is running, close it.
  2. Find the file with the Real Statistics add-in. Right click on the file and click on the Properties option from the menu that appears.
  3. Towards the bottom of the General tab of the Properties window you will see the security message “This file came from another computer and might be blocked to help protect this computer”. Next to this message is the Unblock check box. Make sure this is checked and press the OK button.
  4. Start Excel.

If you don’t see the security message in step 3 above, then the cause of the problem is probably different.

If you are having this problem, please let me know whether or not this approach solves the problem for you.

Again, thanks to Jeff for finding the solution.

Charles

Posted in Hint | 10 Comments