Real Statistics Rel 4.12

I am pleased to announce Release 4.12 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 and the Multivariate Examples files have been updated for compatibility with the new release.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.12.

The new release contains the following new capabilities:

Deming Regression

A new Deming Regression data analysis tool has been added, along with the following functions:

DRegCoeff(R1, R2, λ, lab) = 2 × 2 array consisting of the intercept and slope coefficients and standard errors for Deming regression on the data in R1 and R2 where lambda = λ.

DRegResiduals(R1, R2, λ, lab) = n × 7 array consisting of predicted y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 aand R2 based on the Deming regression on the data in R1 and R2 where lambda = λ and n = the number of elements in R1 (or R2).

DRegIdentity(R1, R2, λ, lab) = 2 × 1 array consisting of  and  for Deming regression on the data in R1 and R2 where lambda = λ.

DRegPred(x0, R1, R2, λ, lab) = 4 × 1 array consisting of the predicted value of y corresponding to x0, the standard error for this prediction and the lower and upper ends of a  confidence interval for this prediction based on the Deming regression on the data in R1 and R2 where lambda = λ.

If lab = TRUE (default FALSE), then a column with labels is appended to the output for readability.

If an explicit lambda argument is omitted, then it is calculated from the data in R1 and R2. If an explicit lambda argument is given, then it is assumed that R1 and R2 are column arrays. We also have the following function, which explicitly calculates lambda from the data in R1 and R2:

DRegLambda(R1, R2) = the lambda value calculated from R1 and R2

The standard errors are calculated using jackknifing as described on the website.

Split-Half and Guttman Reliability

A new Split-Half/Guttman data analysis tool has been added which calculates split-half reliability statistics using Spearman-Brown as well as Guttman’s measurement. In addition, the following functions are now supported.

COV_SPLIT(R1,  s) = sample covariance for the data in range R1 based on the split described by string s.

CORR_SPLIT(R1,  s) = correlation for the data in range R1 based on the split described by string s.

GUTTMAN_SPLIT(R1, s) = Guttman’s lambda for the data in range R1 based on the split described by string s.

GUTTMAN(R1) = the Guttman’s reliability measure for the data in range R1, i.e. the maximum Guttman’s lambda based on all possible splits; when the number of splits is too large, a second argument iter can be used to find an approximate maximum Guttman’s lambda based on a randomly generated iter number of splits.

String s consists of 0’s and 1’s where each character in the string corresponds to one column in R1 (thus the length of s must be equal to the number of columns in R1). E.g. the string s = “101010” represents the split half where the odd numbered questions are in one half and the even numbered questions are in the other half.

The following functions relate to the Spearman-Brown correction:

SB_SPLIT(R1, s) = split half coefficient (after Spearman-Brown correction) for data in R1 based on the split described by the string s.

SB_CORRECTION(r, n, m) = Spearman-Brown correction when the split-half correlation based on an m vs. n–m split is r. If n is omitted, then it is assumed that there is a 50-50 split. If n is present but m is omitted, then it is assumed that m = n/2.

Partitioning Functions

To support the Guttman functions, described above, the following functions have been added:

INIT_SPLIT(n, m) = returns a string of length n consisting of m 0’s followed by n–m 1’s. If omitted m defaults to n/2.

NEXT_SPLIT(s) = returns the string representing the next split after the split represented by s.

E.g. use of these functions will generate the following list of strings of length four: 0011, 0101, 0110, 1001, 1010, 1100.

The following new functions have also been added:

INIT_PARTITION(n) = returns a string consisting of n 0’s

NEXT_PARTITION(s) = returns the string representing the next partition after the partition represented by s.

E.g. use of these functions will generate the following list of strings of length three: 000, 001, 010, 011, 100, 101, 110, 111.

Finally, we also have also added the following functions:

RAND_SPLIT(n, m) = returns a random string of length n consisting of m 0’s and n–m 1’s. If omitted m defaults to n/2.

RAND_PARTITION(n) = returns a random string of length n consisting of 0’s and 1’s.

Spearman-Brown Predicted Reliability

The following functions have been added about Spearman-Brown predicted reliability. These functions can also be used with Cronbach’s Alpha.

SB_PRED(m, rho, n) = Spearman-Brown predicted reliability based on m items when Spearman-Brown for n items is rho.

SB_SIZE(rho1, rho, n) = the number of items necessary to bring the Spearman-Brown predicted reliability up (or down) to rho1 from n items with Spearman-Brown of rho.

LAD Regression Enhancements

The following enhancements have been made to the LAD Regression data analysis tool and the functions that support LAD regression:

  • Support for regression without an intercept has been added. Note the revised forms of the following functions where con = TRUE (default) if regression is with an intercept and con = FALSE if no constant regression term is used.

LADRegCoeff(R1, R2, con, iter)

LADRegWeights(R1, R2, con, iter)

  • An option has been added to the LAD Regression data analysis tool to report the standard error for the regression coefficients, as well as confidence intervals, based on bootstrapping. In addition, there is the following new function which outputs the standard errors based on nboot bootstraps:

LADRegCoeffSE(R1, R2, con, iter, nboot)

Note too that a bug has been fixed where the variable names were not shown correctly when the output from the data analysis tool was displayed on a new page.

Total Least Squares Regression

The following functions have been added to calculate the regression coefficients when total least squares is used instead of ordinary least squares.

If there is only one independent variable, the results will be the same as Deming Regression where lambda = 1. With more than one independent variable, the singular value decomposition (SVD) of the matrix containing the data from both the independent and dependent variables is used.

TRegCoeff0(R1, R2, lab) = column array consisting of the intercept and slope coefficients based on total least squares linear regression using the data in R1 and R2, where both R1 (containing x data) and R2 (containing y data) are column arrays. If lab = TRUE (default FALSE) then a column is appended with the labels “intercept” and “slope”.

TRegCoeff(R1, R2, iter) = column array consisting of the regression coefficients based on total least squares linear regression using the data in R1 and R2. iter (default 100) is the number of iteration used in calculating the SVD. iter is ignored if R1 contains only one column.

Three Factor ANOVA without Replication

The Three Factor ANOVA data analysis tool has been enhanced to support three factor ANOVA without replication.

Enhancement to Average Rank Function

The following function has been enhanced to solve problems with the standard Excel RANK.AVG function:

RANK_AVG(x, R1, order, num_digits)

This function is equivalent to the standard Excel function RANK.AVG, except that the values in range R1 are rounded off to num_digits decimal places (default 8) before the ranking is done.

This function addresses the following shortcomings in the RANK.AVG function:

  • RANK.AVG doesn’t handle decimal precision very well. E.g. you can have cases where 12.1 appears in the range A1:A10, yet =RANK.AVG(12.1, A1:A10) returns the error value #N/A.
  • RANK.AVG only accepts an explicit cell range as its second argument. E.g. even though A1:A10-B1:B10 evaluates to a cell range, =RANK.AVG(A1, A1:A10-B1:B10) will produce an error
  • RANK.AVG is not available in versions of Excel prior to Excel 2010.

Singular Value Decomposition Enhancement

The Singular Value Decomposition (SVD) option on the Matrix Operations data analysis tool has been enhanced to support non-square matrices. This is also the case for the following functions: SVD_U, SVD_D and SVD_V.

Donation Webpage

The website and Real Statistics software are free and will remain free of charge.

In order to offset some of my costs (especially for hosting services, software and materials), I am asking for a small donation if you are happy with the services that you have been receiving. This will be appreciated, but is completely optional. To donate, please click on Please Donate.

Bug Fixes

  • A bug in calculating the confidence interval in the Multiple Anova option of the Manova data analysis tool has been fixed.
  • When clicking on the Mann-Whitney, Wilcoxon Signed Ranks or Kruskal-Wallis options of the Non-parametric Tests data analysis tool, the Descriptive Statistics data analysis tool was incorrectly displayed. This has now been corrected.
  • Fixed an error in the display of Cook’s D output from the Linear Regression data analysis tool when the Column headings included in the data is unchecked.
  • Minor changes to some of the dialog boxes have been made, especially to correct some errors in the tool tips.
  • Various broken links on the website have been repaired, which should make the user experience better.

Improved Web Hosting Performance

Last month I contracted to upgrade the web hosting capabilities to improve the availability and response time of the Real Statistics website. After a few weeks of problems, I believe we are now seeing the benefits of this upgrade.

Ongoing Activities

I continue to upgrade the Real Statistics Resource Pack to accomplish the following two objectives. This will take time and will happen over the next several software releases:

  • Add documentation for each Real Statistics function to Excel’s standard Insert Function facility
  • Allow any function arguments that call for a cell range to use the output from another function as long as that output is equivalent to a cell range. E.g. you can also use the formula =GUTTMAN(TRANSPOSE(A1:J5)) or =GUTTMAN(A1:E10-F1:J10).

Charles

Posted in Announcement, New Release | Leave a comment

Real Statistics Rel 4.11

I am pleased to announce Release 4.11 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments. Also the Examples Workbook Parts 1 and 2 have been updated for compatibility with the new release.

I know that this new release comes only a few days after the previous release, but I wanted to make sure that you got the following new features:

New Box Plots

A new Box Plot with Outliers option has been added to the Descriptive Statistics data analysis tool. This tool displays outliers as small circles and restricts the whiskers on the plot to 1.5 times the IQR above and below the box in the chart.

Revised QQ Plots

The existing QQ Plot option of the Descriptive Statistics data analysis tool has been modified so that one QQ Plot is now generated for each column in the Input Range.

Stepwise Regression

This capability was supposed to be included in Rel 4.10, but was inadvertently excluded. It is now part of Rel 4.11 as an option to the Linear Regression data analysis tool.

Charles

Posted in Announcement, New Release | Comments Off on Real Statistics Rel 4.11

Real Statistics Rel 4.10

I am pleased to announce Release 4.10 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

I apologize to all you Mac users. I expect to turn my attention to creating a new Mac version of the Real Statistics Resource Pack shortly.

I want to thank all of you who have identified bugs or have suggested enhancements. I have tried to include fixes for all bugs that have been identified and support for at least some of the suggested enhancements.

The spreadsheets for all the examples used on the Real Statistics website are now available for free download (Download Examples Workbooks). These spreadsheets are contained in four Excel files (i.e. workbooks): Examples Workbook Parts 1 and 2, Multivariate Examples and Time Series Workbook. See Workbook Examples for a description of which examples are contained in which files.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.10.

A focus in this release is on regression enhancements, although other important features have been added as well. Release 4.10 contains the following new features:

Polynomial Regression

A new Polynomial Regression data analysis tool has been added.

In addition, the following new functions are supported which provide similar support to that is provided by the new data analysis tool. Here, Rx and Ry are column arrays containing x and y data values and deg is the degree/order of the polynomial

PolyDesign(Rx, deg, ones) – returns an array consisting of x, x2, …, xdeg columns. If ones = TRUE, then the output is 1, x, x2, …, xdeg

PolyCoeff(Rx, Ry, deg) – returns a column array consisting of the polynomial regression coefficients and their standard errors

PolyRSquare(Rx, Ry, deg) = R-square value for the polynomial regression

PolyDeg(Rx, Ry, maxdeg) = the highest degree polynomial ≤ maxdeg which produces a significantly different R-square value

Least Absolute Deviation (LAD) Regression

A new Least Absolute Deviation Regression data analysis tool has been added.

In addition, the following new functions are supported which provide similar support to that provided by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and iter is the number of iterations used in the iteratively reweighted least squares algorithm (default = 25).

LADRegCoeff(Rx, Ry, iter) = × 2 range consisting of the regression coefficient vector followed by vector of standard errors of these coefficients

LADRegWeights(Rx, Ry, iter) = × 1 column range consisting of the weights calculated from the iteratively reweighted least squares algorithm

Note, that in addition to describing the iteratively reweighted least squares algorithm, the website will also describe the Simplex method for calculating the LAD regression coefficients.

New Extracting Columns from a Data Range Data Analysis Tool

The existing Extracting Columns from a Data Range data analysis tool has been completely revised. In addition to more easily selecting which columns you want to retain from a data range, you will now have the option to create (1) tag/dummy or categorical codes for selected columns, (2) interactions between the variables (e.g. xy) representing selected columns and (3) powers of variables in selected columns (x2, x3, etc.).

Simplifications to Regression Data Analysis Tools

The Multiple Regression data analysis tool has been simplified by the elimination of the Tag/dummy coding options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

The Logistic Regression data analysis tool has also been simplified by the elimination of the Categorical coding and the Deletion of variables options. These capabilities are now provided, in a simpler-to-use way, by the Extract Columns from a Data Range data analysis tool.

Stepwise Regression Capabilities

A Stepwise Regression option has been added to the Multiple Regression data analysis tool. When this option is selected an automatic selection of a subset of variables is made that produces a regression model that fits the data which is in some sense similar to that of the full regression model containing all the variables.

The output from this data analysis tool shows how the stepwise selection of variable was made along with the regression analysis using these variables.

In addition, the following new functions are supported which are used by the new data analysis tool. Here, Rx is an n × k array containing x data values, Ry is an n × 1 array containing y data values and Rv is a 1 × k array containing a non-blank symbol if the corresponding variable is in the regression model and an empty string otherwise.

RegRank(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient that can be added to the regression model defined by Rx, Ry and Rv.

RegCoeffP(Rx, Ry, Rv) – returns a 1 × k array containing the p-value of each x coefficient in the regression model defined by Rx, Ry and Rv.

RegStepwise(Rx, Ry) – returns a 1 × k array Rv where each non-blank elements in Rv corresponds to an x variable that should be retained in the stepwise regression model. Actually the output is a 1 × k+1 array where the last element is a positive integer equal to the number of steps performed in creating the stepwise regression model.

Optimize Time Series Forecasting

An Optimize MSE option has been added to the Basic Forecasting data analysis tool. When this option is chosen, values of Alpha, Beta and Gamma are found which minimize the squared error for the Simple Exponential Smoothing, Holt’s Linear Trend or Holt-Winter’s Method.

Changes to the Augmented Dickey-Fuller Test

The output from the Augmented Dickey-Fuller (ADF) unit root test function ADFTEST has been augmented with two additional values, namely the first-order autocorrelation coefficient and an estimated p-value. This is also the case for the ADF Test option of the Time Series Testing data analysis tool.

The ADFCRIT(n, alpha, type) function has been revised to deal with alpha values besides .01, .025, .05 and .1. This is accomplished by linear interpolation. In addition, the following new function has been added:

ADFPROB(x, n, type) = estimated p-value (based on linear interpolation) for the ADF test at x for a time series of length n where type is as for ADFCRIT.

New Unit Root Tests for Time Series Analysis

The PP and KPSS tests for a unit root in time series analysis are now supported via the following new array functions:

PPTEST(R1, lab, lags, type, alpha)  – an array function which returns a column range for the PP test consisting of tau-stat, tau-crit, stationary (yes/no), lags, autocorrelation coefficient and p-value.

KPSSTEST(R1, lab, lags, type, alpha)  – an array function which returns a column range for the KPSS test consisting of test-stat, crit-value, stationary (yes/no), lags and p-value.

Thanks to Milos  Cipovic who wrote the software for these tests.

New Features in the ANCOVA Data Analysis Tool

The ANCOVA data analysis tool has been enhanced with the following new options: ability to use data in stacked format and support for contrasts, Tukey’s HSD test and Tukey-Kramer test.

Diversity Indices Data Analysis Tool

The new Diversity Indices data analysis tool calculates Shannon’s, Simpson’s and Brillouin’s diversity indices for categorical data.

Function Categories

When you click on the Insert Function symbol fx next to the Formula toolbar in Excel, you can access a list of all the standard Excel functions along with a brief description of each function and that function’s arguments. This includes all the Real Statistics functions.

The standard Excel functions are split into different categories (Math & Trig, Logical, Text, Statistical, etc.) which makes it easier to find these functions. The Real Statistics functions, however, have all been placed in the User Defined category and very limited information has been available about the functions or their arguments.

I am in the process of adding some new categories for the Real Statistics functions. I am also adding more descriptive information about these functions and their arguments. So far I have added Regression and Distribution categories and additional descriptive information about the functions in these categories. Additional categories and descriptive information will be added in coming releases.

The new categories and descriptive information are only available after you have accessed the Real Statistics data analysis tools (via Ctrl-m or the equivalent) for the first time.

This new capability is not available for versions of the Real Statistics Resource Pack that run on Excel 2007.

New Regression Function

RegPredC(Rx, Rc) = predicted y value for x values in range Rx based on the regression coefficients in range Rc. Rx and Rc can either be column or rows ranges.

New ANOVA Functions

The following functions compute the values SSBet, SSW and SSTot for one-way ANOVA when the data is in stacked format. These are similar to the corresponding pre-existing functions used when data is in Excel format.

In the following, it is assumed that the first column of the input R1 contains the names of the factor levels and column number col contains the data for the one-way ANOVA.

SSWStd(R1, col) = SSW

SSBetStd(R1, col) = SSBet

SSTotStd(R1, col) = SSTot

Bug Fixes and Minor Changes

Corrects an error in the two-tailed Mann-Whitney exact test. Previously, if the p-value for the one-tailed test was greater than .5, the p-value for the two-tailed test was given as 2*p instead of 2*(1–p).

Fixes an error in Contrasts for Single Factor ANOVA data analysis tool. Previously, the formula used to determine whether the test was significant (i.e. the Sig cell) referenced the wrong cell when calculating the value of Alpha.

Fixes an error when using the Cutoff % option in the Logistic Regression data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and 1 (instead of between 0 and 100).

Fixes an error when using the Cutoff % option in the Reliability data analysis tool. Previously, an error message was generated if this value wasn’t between 0 and .5 (instead of between 0 and 50).

Fixes an error on the constraint for the gamma parameter in the Holt-Winter Method option of the Basic Forecasting data analysis tool.

Fixes an error that sometimes occurs in the calculation of the log-rank metrics in the Hypothesis Testing portion of the output when the Kaplan-Meier option of the Survival Analysis data analysis tool is used.

Corrects errors in some labels and tooltips in various dialog boxes.

Allows the MANOVA functions MANOVA_PillaiTrace, MANOVA_WilksLambda, MANOVA_HotelTrace and MANOVA_RoyRoot to work properly even when there is only one dependent variable.

Posted in Announcement, Hint | Comments Off on Real Statistics Rel 4.10

Real Statistics disappears from Addin ribbon

Some of you have reported that Real Statistics disappears from the Add-ins ribbon. What follows is a potential way to eliminate the problem thanks to some research done by Jeff.

First of all, this problem is likely due to a security enhancement that Microsoft recently made to Excel which impacts add-ins. The following steps should be useful in eliminating this problem.

  1. If Excel is running, close it
  2. Find the file with the Real Statistics add-in. Right click on the file and click on the Properties option from the menu that appears.
  3. Towards the bottom of the General tab of the Properties window you will see the security message “This file came from another computer and might be blocked to help protect this computer”. Next to this message is the Unblock check box. Make sure this is checked and press the OK button.
  4. Start Excel

If you don’t see the security message in step 3 above, then probably the cause of the problem is different.

If you are having this problem, please let me know whether or not this approach solves the problem for you.

Again, thanks to Jeff for finding the solution.

Charles

Posted in Hint | 10 Comments

Real Statistics Release 4.9

I am pleased to announce Release 4.9 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2010, 2013 and 2016 (Windows version) environments. It will be available for free download for Excel 2007 environments later today.

The spreadsheets for all the examples used on the Real Statistics website are available for free download (Download Examples Workbooks). These spreadsheets are contained in four Excel files (i.e. workbooks):

  • Examples Workbook Part 1
  • Examples Workbook Part 2
  • Multivariate Examples
  • Time Series Workbook

See Workbook Examples for a description of which examples are contained in which files.

On May 26, the Time Series Workbook was updated with all the examples from the previous release of the Real Statistics software. Today, the Examples Workbook Part 1 and Examples Workbook Part 2 have been updated with examples based on today’s release.

The Real Statistics website will be updated over the course of the next several days to reflect the new capabilities in Release 4.9.

Release 4.9 contains the following new features:

Design of Experiments

A new Design of Experiments section is being added to the website, This section describes the following new data analysis tools:

  • Randomized Complete Block Design
  • Split-Plot Design
  • Latin Squares Design

Contrasts and Tukey’s HSD tests are also provided for each of these designs.

New User-Interface

A new Config command button has been added to the main interface to the Real Statistics data analysis tools (which is accessible via Ctrl-m or the Add-Ins ribbon). If you click on the Config button (located on the right side of the main dialog box) you will be presented with the following dialog box:

Configuration dialog box

If you click on the Use New Menu option and press the OK button, the main menu will change to the following multi-page format:

Multi-page dialog box

The various data analysis tools are now organized into the following six tabs:

  • Descriptive Statistics and Utilities (Desc)
  • Regression (Reg)
  • Anova (Anova)
  • Time Series (Time S)
  • Multivariate Analysis (Multi Var)
  • Miscellaneous (Misc)

You can also access the Configuration dialog box (e.g. to change back to the old menu format) by clicking on the Config button.

Default decimal values in dialog boxes

On many of the dialog boxes the user can enter a value for Alpha by overriding the default value of .05. Because Excel accepts both the 0.05 and 0,05 formats for decimals, this has introduced problems for some users. This new release provides the following two potential solutions to this problem:

  1. Simply enter a zero in the Alpha field. This should not produce an error message. You will need to change the value of Alpha in the output report since a significance level of zero is not really acceptable statistically.
  2. Use the new Use Percentage configuration option (as shown above). If you choose this option on the Configuration dialog box, then the default value for Alpha will be 5 (representing 5%) and so no decimal will be required. You can override the default value by entering some other whole number (e.g. 1 representing 1%).

ANOVA post-hoc tests enhancements

A p-value has now been added to the various data analysis tools that support Tukey’s HSD, Tukey-Kramer, Scheffe, Nemiyi and Games-Howell post-hoc tests.

New F_DIST function

Excel’s FDIST and F.DIST functions only accept positive integer values for the degrees of freedom. The Real Statistics F_DIST function takes the form F_DIST(x, df1, df2, cum) where df1 and df2 can take any positive value, including values less than 1.

Revised Welch’s test

The function WELCH_TEST and the Welch’s Anova option of the One-Factor ANOVA data analysis tool have been revised to use the new F_DIST function instead of FDIST. This yields more accurate results.

Reformatting enhancements

On the Reformatting Data Range data analysis tool, when you fill in either the # of Rows or # of Columns options and leave the other blank, then its value will automatically be calculated to the least value capable of holding all the data in the Input Range.

A new version of the RESHAPE array function has been provided (at the request of Oliver). The previous version (still supported) takes the form

RESHAPE(R1, c) – fills the highlighted range with the data in range R1 (using the character c to fill in any extra cells, with c defaulting to #N/A)

The new version takes the form RESHAPE(R1, c, nrows, ncols), which allows you to explicitly specify the # of rows and columns in the output. This enables you to call this function from within another function. E.g. suppose that you have a column of data in the range A1:A9, you could calculate =eVALUES(RESHAPE(A1:A9,,3,3)).

Jarque-Barre normality test option

The Real Statistics implementation of the Jarque-Barre test for normality has now been expanded. The functions JARQUE and JBTEST now take the forms JARQUE(R1, pop) and JBTEST(R1, pop). When pop = True (default) the population forms of the skewness and kurtosis is used, while when pop = False the sample forms are used.

Fleiss’s Kappa enhancement

An argument has been added to the Real Statistics KAPPA function. When this argument is set to True then the calculation for the standard error for Fleiss’s Kappa in the original paper by Fleiss will be used instead of the calculation previously employed; the default for this argument is False (i.e. the previous calculation)

Bug Fixes and Minor Changes

  • The control tips and tab order on many of the data analysis tool dialog boxes have been corrected or revised
  • The spelling of the SHAPELY function has been changed to SHAPLEY (for the Shapley-Owen Decomposition)
  • The DCRIT function (critical value from the Dunnett’s test table) now returns values for df larger than 240 (previously an error value was generated)
  • The QDIST function (p-value of the Studentized range distribution) now returns values for large values of df (previously an error value was generated)
  • The W value for n = 49 and alpha = .02 in the Shapiro-Wilk Table (Table 2) has been changed to .937 (from .0937). Thus, the formula =SWProb(49,0.937,FALSE) now yields the value .02.
  • The Freeze option on the Reformatting a Data Range by Rows data analysis tool no longer pertains to the Sort  by Rows option.
Posted in Announcement, New Release | Comments Off on Real Statistics Release 4.9

Quick Update 26 May 2016

I just wanted to let you know about the following updates to the Real Statistics website:

  1. The website has now been updated to reflect all the new capabilities introduced in Rel 4.7 and 4.8. I still plan to add a few more pages with proofs of some of the properties and some more technical matters.
  2. A new webpage has been added to explain how to create forecasts with seasonal data using multiple linear regression
  3. The Time Series Examples workbook has been updated to reflect all the new features.

Charles

 

Posted in Announcement, Hint | Comments Off on Quick Update 26 May 2016

Real Statistics Release 4.8

I am pleased to announce Release 4.8 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2007, 2010, 2013 and 2016 (Windows version) environments.

The spreadsheets for all the examples used on the Real Statistics website are available for free download (Download Examples Workbooks) are being updated for compatibility with the new release. These spreadsheets are contained in four Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2, Multivariate Examples and Time Series Workbook. See Workbook Examples for a description of which examples are contained in which files.

Release 4.8 contains the following new features:

Shapely-Owen Decomposition

Adds the following new array function:

SHAPELY(Rx, Ry) = a column range which gives the Shapely-Owen decomposition of R-square for the linear regression of Rx (which contains the x data values) on Ry (which contains the y data values).

This decomposition shows the relative contribution of each of the independent variables to the regression model.

Box-Cox Transformation

Adds the following new functions which carry out Box-Cox transformations

BOXCOX(R1, λ): array function which returns a range containing the Box-Cox transformation of the data in range R1 using the given lambda value. If the lambda argument is omitted, then the transformation which best normalizes the data in R1 is used, based on maximizing the log-likelihood function.

BOXCOXLL(R1, λ) = log likelihood function of the Box-Cox transformation of the data in R1 using the given lambda value

BOXCOXLambda(R1) = the value of lambda which maximizes the log likelihood function of the Box-Cox transformation of the data in R1

New Support for the Multivariate Normal Distribution

Adds the following new functions

BNORMSDIST(x, y, r, cum) = the cdf of the standard bivariate normal distribution at x and y with correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

BNORMDIST(x1, x2, m1, m2, s1, s2, r, cum) = the cdf of the standard bivariate normal distribution at at x1 and x2 with means m1 and m2, standard deviations s1 and s2 and correlation coefficient r if cum = TRUE and the pdf if cum = FALSE

MNORMRAND(Rm, Rc): generates an array containing a random k × 1 vector from a k-dimensional multivariate normal distribution where Rm is the range that contains the k × 1 means vector and Rc is the range that contains the k × k covariance matrix

BNORMRAND(m1, m2, s1, s2, r): generates a random 2 × 1 vector from a bivariate normal distribution with means m1, m2, standard deviations s1, s2 and
correlation coefficient r

These last two functions can be used to generate a random vector which comes from a multivariate normal distribution with the specified parameters.

Mardia’s Test for Multivariate Normality

Adds the following functions that test whether the data in range R1 is drawn from a multivariate normal distribution

MSKEWTEST(R1, lab): Mardia’s skewness test for multivariate normality; returns a column range with the values skewness, chi-square statistic, df and p-value, plus corrected statistic and p-value for small samples

MKURTTEST(R1, lab): Mardia’s kurtosis test for multivariate normality; returns a column range with the values kurtosis, z-statistic and p-value

If lab = TRUE then an extra column of labels is appended to the results (defaults to FALSE).

Changes to the Noncentral Chi-square and F  Distribution Functions

The NCHISQ_DIST, NCHISQ_INV, NCHISQ_NCP, NF_DIST, NF_INV and NF_NCP functions have been revised to improve their accuracy especially for large values of the noncentral parameter. Note that the number of terms in the infinite series approximations has been greatly expanded (default is now 1,000 instead of 80). In addition, you can now specify the level of accuracy via a precision argument (default is .000000001).

The same changes have been made to the power and sample size functions REG_POWER, REG_SIZE, CHISQ_POWERCHISQ_SIZE, ANOVA1_POWER, and ANOVA1_SIZE.

The various functions that support the noncentral t distribution and the corresponding power and sample size  functions have not changed, except that the default number of terms in the infinite series has been increased from 80 to 120.

These changes are also reflected in the corresponding Power and Sample Size data analysis tools.

Minor Enhancements and Bug Fixes

  • Fixes a bug in the KCORREL function
  • Added some new arguments to the Mahalanobis distance squared function MDISTSq
Posted in Announcement, New Release | 9 Comments

Release 4.7

I am pleased to announce Release 4.7 of the Real Statistics Resource Pack. The new release is now available for free download (Download Resource Pack) for Excel 2010, 2013 and 2016 (Windows version) environments. The Excel 2007 version will be available later today.

The spreadsheets for all the examples used on the Real Statistics website, including those related to the new Release 4.7 features) are available for free download (Download Examples Workbooks). These are contained in three Excel files (i.e. workbooks): Examples Workbook Part 1, Examples Workbook Part 2 and Multivariate Examples. See Workbook Examples for a description of which examples are contained in which files.

In addition, there is now a new Time Series Examples file which contains the time series examples. This file is still being revised as I add new webpages to the website related to time series analysis. This process is still ongoing and I hope to complete the addition of webpages related to Release 4.6 and 4.7 shortly.

Release 4.7 contains the following new features:

Enhancement to ARIMA data analysis tool

A new Reformat for Linear Regression option has been added to the ARIMA data analysis tool which can be used to generate the standard least squares model for AR(p) processes.

In addition, the following array function has been added where R1 is an n × 1 column range containing time series data:

ARMap(R1, p): outputs an np × p+1 range which contains X and Y data equivalent to the data in R1 in order to perform multiple linear regression

Minor changes and enhancements

The WEIBULL_INV function has been revised for consistency with the standard Excel functions WEIBULL and WEIBULL.DIST. In particular, the inverse of WEIBULL(x,b,a,TRUE) or WEIBULL.DIST(x,b,a,TRUE) is now WEIBULL_INV(p,b,a). Previously, this was expressed as WEIBULL_INV(p,a,b).

Previously the VIF value for the first variable in the output of the Linear Regression data analysis tool was always one. This has now been changed to its actual VIF value. The same change has been made to the formula VIF(R1,1).

Changes in handling array arguments

I am in the process of changing most of the Real Statistics functions which take a range as an argument so that they can also take an array as an argument. This is already true of most standard Excel functions. E.g. if R1 is a range then =SUM(R1) is the sum of all the elements in R1. You can also use expressions such as =SUM(R1^2) or SUM(ABS(R1)) to get the sum of the squared values in R1 or the sum of the absolute values in R1. Note that these are array formulas and so you need to press Ctrl-Shft-Enter instead of just Enter.

A notable exception are the functions RANK and RANK.AVG. A formula of form RANK(x,R1) calculates the rank of x among the data elements in R1, but expressions like RANK(ABS(R1)) yield an error.

Many Real Statistics functions already allow for array arguments, but over time I will be expanding the number that can take array values instead of just ranges.

Bonferroni and Dunn-Sidak corrections

When using contrasts as part of a post hoc analysis, you need to revise the alpha value by dividing by the number of contrasts that you will make. This is called the Bonferroni correction. With k groups there are C(k,2) possible comparisons, and so if you expect to make all these comparisons then you need to divide alpha by C(k,2). It turns out that for theoretical reasons, if you limit yourself to orthogonal contrasts, you will need to make at most k – 1 contrasts, and so you can use a corrected alpha value of α/(k-1).

In the ANOVA data analysis tools which includes a Bonferroni correction option, the corrected version of alpha will always be based on division by k-1. If you use fewer or more than this number of contrasts you will need to change the corrected value of alpha manually.

The Dunn-Sidak correction works similarly, except that this time the corrected value of alpha is alpha will be set by default to 1–(1–α)1/m. where m = the number of contrasts used. In the ANOVA data analysis tools which includes a Dunn-Sidak correction option, the corrected version of alpha will be set by default to 1–(1–α)1/m. where m = k -1 If you use fewer or more than this number of contrasts you will need to change the corrected value of alpha manually.

Note that this approach has now been applied to the Contrasts option of the One Factor Repeated Measures ANOVA data analysis tool.

Bug Fixes

  • Corrects a bug in the SCORREL and FKTEST function for data which contains a zero value.
  • Corrects some corrupted values in the DCRIT, LCRIT and RhoCRIT lookup functions.
  • Corrects an error in the calculation of the standard error for Dunn‘s post hoc test to the Kruskal-Wallis test.
  • Corrects an overflow error in calculating the sample size for Cronbach’s alpha (in CALPHA_SIZE and in the Cronbach’s Alpha data analysis tool).

Complex Numbers

Real Statistics has added new support for complex numbers. A complex number is represented as a range with one row and two columns, where the first cell contains the real part of the complex number and the second cell contains the imaginary part. The following functions are supported, where z1 and z2 are complex numbers:

  • CReal(z1) = real part of z1
  • CImag(z1) = imaginary part of z1
  • CAbs(z1) = absolute value of z1
  • CConj(z1) = conjugate of z1
  • CAdd(z1, z2) = z1 + z2
  • CSub(z1, z2) = z1 – z2
  • CMult(z1, z2) = z1 * z2
  • CDiv(z1, z2) = z1 / z2
  • CPower(z1, n) = z1n

 Complex Matrix Operations

We represent an m × n matrix of complex numbers by a m × 2n range in Excel. This range consists of an m × n range on the left and another m × n range on the right. The first of these ranges represents the real parts of the values of the complex numbers in the complex number matrix and the second of these ranges represents the imaginary parts of the complex numbers in the complex number matrix.

This is equivalent to representing a complex matrix as A + Bi where A and B both contain only real values.

The following array functions are supported, where Z1 and Z2 are complex matrices, z1 is a complex number and n is a positive integer:

  • ZAdd(Z1, Z2) = Z1 + Z2
  • ZSub(Z1, Z2) = Z1 – Z2
  • ZMult(Z1, Z2) = Z1 × Z2
  • ZMultScalar(Z1, z1) = z1 * Z1
  • ZInverse(Z1) = Z1-1
  • ZTranspose(Z1) = Z1T
  • ZIdentity(n) = n × n complex identity matrix (represented as an n × 2n range)
Posted in Announcement, New Release | Comments Off on Release 4.7

Rel 4.6 Installation

The installation instructions for the new release, Rel 4.6, of the Real Statistics Resource Pack is as follows:

1. Make sure that Excel’s Solver capability is installed on your computer. To check this proceed as follows:
(a) open up a blank Excel spreadsheet
(b) Press Alt-TI (i.e. hold down the Alt key and press the T and I keys).
(c) On the dialog box that appears make sure that the Solver addin is checked. If not, check it and click on the OK button.
(d) If the Solver addin was not checked you need to close the Excel file before proceeding to the next step.

2. Open a blank Excel spreadsheet
3. Press Alt-TI
4. Click on the Browse button on the dialog box that appears to locate the file containing Real Statistics that you downloaded from the website
5. Make sure that the Real Statistics addin is checked and click on the OK button

Charles

Posted in Announcement, Hint | Comments Off on Rel 4.6 Installation

Important Information about Rel 4.6

Unlike previous releases of the Real Statistics Resource Pack, the latest release, Release 4.6, uses Excel’s Solver capability. This means that you must make sure that the Solver addin is loaded on your computer

 You do this by pressing Alt-TI (i.e. hold the Alt key down and press T and I) and making sure that there is a check mark next to Solver on the dialog box that appears.

Charles

Posted in Hint | Comments Off on Important Information about Rel 4.6