Welcome to the *Real Statistics Using Excel *website. As described on the Home page, this website provides the resources (tutorials and statistics software) necessary to perform statistical analysis in the Excel environment.

In order to make this website more valuable to the user community, we welcome your comments, suggestions and feedback. We would especially like to hear about any errors in the website, examples or software. You can contact me as described below or by leaving a comment below or on any of the webpages.

Charles Zaiontz

**My email**: czaiontz@gmail.com, info@real-statistics.com**My profile**: click here

You have done an excellent job in putting together calculation of Statistics in Excel as most business folks especially in Finance uses excel as their main tool.

Muhammad,

Thank you very much for your kind words.

Charles

I’m very much happy to have seen your website on Real Statistics Using Excel. It helped me a lot in my statistical calculations.

Thank you for this wonderful site.

Cognizant to this, may I also inquire, if you have some notes on Sample Size determination (i.e. when and how to use it).

Thank you in advance.

Edward,

I am pleased that you have found the website useful. Throughout the site there is a quite a bit of information about sample size determination. Some examples are:

http://www.real-statistics.com/hypothesis-testing/statistical-power/

http://www.real-statistics.com/sampling-distributions/statistical-power-sample/

http://www.real-statistics.com/students-t-distribution/one-sample-t-test/ (Statistical Power section)

http://www.real-statistics.com/multiple-regression/multiple-regression-analysis/ (Figure 1)

http://www.real-statistics.com/multivariate-statistics/factor-analysis/validity-of-correlation-matrix-and-sample-size/ (Figure 9)

Charles

Congratulations Charles:

Real Statistics Using Excel it’s a really nice, practical and helpful website for learning statistics.

I’ll recommend your website with all my co-workers and classmates

Charles,

Great work!

I was wondering if you have or are considering to add multi-level models?

Michael,

Thank you. I am now considering what to do next. I expect to add multi-level models eventually. My next focus will likely be to add more information and/or software capabilities related to reliability, two factor ANOVA, ordered logistic regression and Probit.

Charles

Michael,

I am starting to look into these now.

Charles

This Excel add-in is so wonderful add-in. It makes my life a lot easier. I wonder whether you plan to add Cholesky Decomposition in the near future.

Cheers,

Charlie

Hi Charlie,

Thank you for your comment. I am glad that the add-in has been useful to you. As for Cholesky Decomposition I will explore the possibility of including it after I finish some other things that I am working on.

Charles

Charlie,

I have just added Cholesky Decomposition support to the Real Statistics Resource Pack. It is function CHOL in Release 2.13.

Charles

Dear Dr. Charles Zaiontz, thanks for providing such an informative website, so useful for statistics studens and professionals. Best wishes from Lisbon, Portugal.

I am attempting to create the Studentized Range q Table in Excel. What are the formulas I need to do this?

Thanks,

Rick

Rick,

I use a table for these values. I haven’t yet found a formula or algorithm which produces these values.

Charles

Charles,

A while back, I also was looking for an algorithm for finding the q critical value in Excel…but only had moderate success. Perhaps you already considered this reference. But, if not, and, you or an associate knows how to translate the c-language, perhaps this reference is useful.

http://svn.r-project.org/R/trunk/src/nmath/qtukey.c

Regards,

Rich

Thanks Rich,

I will take a look at the c-language code.

Charles

Because you indicate that you may look into this code, I see that there are a few other subfunctions used by the algorithm. If you had not found the p_tukey subfunction, here’s a reference…

http://svn.r-project.org/R/trunk/src/nmath/ptukey.c

Rich

Thanks Rich,

I did see that I needed the p_tukey function and found it. Thanks for your help and have a good Christmas/holiday season.

Charles

Have an enjoyable holiday season as well, Charles.

Found an additional algorithm link for studentized ranges with some p-value specificity. The link is one from a larger collection of algorithms at:

http://lib.stat.cmu.edu/apstat/

that you probably are aware of…

in particular, for q-related item,

http://lib.stat.cmu.edu/apstat/190

Regards, Rich

Rich,

I was never able to get any of these algorithms to work properly or with sufficient accuracy. I found yet a different source and this algorithm works very well (except where df = 1 and in a few cases). I will include the new functions QDIST and QINV in the next release of the software). Thanks for getting me going on this issue.

Charles

Nice package. Have you been able to fix the getting the macro to show on the ribbon problem yet?

Regards

Leigh,

I think that I have finally figured it out. I believe that InitStats doesn’t appear in the list of macros until it is run at least once from the Macros command. To solve this problem you need to select View > Macros|Macros and then enter InitStats and press Run. Once you have done this InitStats should appear in the list of macros. If so you can add InitStats to the ribbon.

Details now appear in http://www.real-statistics.com/excel-capabilities/supplemental-data-analysis-tools/accessing-supplemental-data-analysis-tools/

Try it and let me know if it works for you.

Charles

Really helpful website..keep up the good work

Charles

A stunning website, well done.

I have one suggestion for a minor addition: the Yates algorithm. It is so useful when analysing blocks.

regards

Ivor

Ivor,

Glad you like the website. I will look into the Yates algorithm.

Charles

Charles:

I cannot thank you enough for creating this site and sharing your knowledge. This was so very helpful for me during a statistics course which required the use of excel.

Thanks so much,

Kris from the US

Hi Charles,

I have observed that the boxplot option from the resource pack gives slightly different results compared to other tools from the internet, GraphPad or SPSS. More precisely it’s the lenght of the box (i.e., the IQR) which is different. Considering for example the data 45, 46, 49, 54, 55, 53, 47, 48, all other tools give an IQR of 7.5, whereas the resource pack gives 6.5. The reason is that the resource pack uses QUARTILE.INC instead of QUARTILE.EXC (I have verified this with EXCEL). QUARTILE.INC includes the minimum and the maximum values as the 0-th and the 4-th quartile, respectively, and so these values are taken out when calculating Q1 and Q3. Therefore, I consider QUARTILE.EXC more apropriate for IQR calculation.

All the best, Theo

Hi Theo,

It is interesting that you bring this issue up at this time. I am about to make the changes that you suggest. I plan to put these in the next release. Thanks for your suggestion.

Charles

Hi Charles,

Thanks a lot! You are really doing a fantastic job! With your Resource Pack Excel indeed becomes competitive to “professional” statistical packages. Many statistical tasks which can’t be done with EXCEL in a straightforward way (i.e., boxplots, but also many others) become available with your tools. This is extremely benificial, as EXCEL still is the most widely used data analysis tool in many companies and universities.

Thanks again and much success for the future!

Theo

Theo,

Thanks for your very kind remarks.

Charles

Hello Charles,

I have an additional question and a remark which possibly can help improve your tools. I am really very much interested in removing possible bugs, since I extensively use Excel for statistical purposes and when I found the Resource Pack on your website it was a real treasure trove.

The question: the 2-sample t-test for independent samples reports critical t-values, t-crit, for both the equal variance and the unequal variance case. However, only one value for the test statistic, t, namely that calculated for the unequal variance case, is reported. Wouldn’t it be more appropriate to report both t-values, as these are different for unequal sample sizes? For the following data (the numbers measuring some effect of a drug) I compared your tool with the corresponding standard Excel add-ins tool (confidence level 95%):

Drug A: 10.5, 7.8, 9.3, 8.7, 10.2, 8.9, 7.4, 9.3, 8.7, 7.9

Drug B: 11.2, 10.3, 9.2, 9.7, 8.9, 10.7, 9.9, 10.1

Resource Pack: test statistic t: 2.723;

Equal variance: t-crit (one): 1.746

Unequal Variance: t-crit (one): 1.753

Excel Add-in: Unequal variance: test statistic t: 2.723; t-crit (one): 1.746

Equal variance: test statistic t: 2.634; t-crit (one): 1.746

Since the test statistic for the equal variance case is different, I think it should be also reported.

A second slight differences between the standard Excel add-in and the Resource Pack is t-crit for the unequal variance case (1.746 versus 1.753). This difference results from the fact that within the Resource Pack the number of degrees of freedom used to calculate t-crit is obtained by truncating the value of the Welch–Satterthwaite equation [=T.INV(1-0.05;TRUNC(15.96))= 1.75305036], whereas the Excel Add-in does this by rounding [=T.INV(1-0.05;ROUND(15.96;0))= 1.74588368].

The remark, finally, concerns the one sample tests: the one sample t-test works well regardless whether the sample data are in a row or in a column, for the non-parametric alternative the sample data has to be in a column, otherwise the result is wrong (i.e., only one value from the sample data is considered resulting in count=1).

Kind regards & all the best

Theo

Hello Theo,

Thanks for persisting with this. You are right. I was convinced that I had calculated the t-stat correctly, but you are correct that there is a bug in the software. I will be correcting the software shortly. Thanks very much for catching these errors.

Charles

Hello Charles,

First of all, Thank you!! for all your work. It is great that you have decided to help us with statistics and have made available your add-in.

Now, the issues (sorry):

1- The add-in is checked but it does not show in Data.

2- I did a pre-post test and have to run a Revised measures anova (possibly manova), is it possible to do with Excel and your package? I have no local access to SPSS or other package.

Many thanks!

Hello Leslie,

Thanks for bringing up the two issues. I am pleased when people make suggestions or find errors. This only makes the website better. Regarding your two issues:

1. As you mentioned in your next comment, you can access the data analysis tools by pressing Ctrl-m. I would still like to make it available from one of the ribbons or quick access table, but so far I haven’t been able to figure out how to do it.

2. I am not sure what you mean by “Revised measures anova”. In any case The Real Statistics Resource Pack provides a Manova data analysis tool. You can get more information about Manove at http://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/.

Charles

Hi again Charles,

On issues:

1- I found the package raeding this section…ctrl m.

2- On the Manova, I need this repeated measures Manova because I have the same subjects taking pre and post-tests. Have you developped this test?

I may also need a two-factor Manova because I have two factors, age (in months, these are children (5-6 years) and group (contrôle and test).

Thanks.

Leslie,

I have developed Anova repeated measures capabilities, including pre and post tests. Although some of these may be useful for Manova follow-up testing, I have not as yet developed specific Manova repeated measures or two factor Manova capabilities yet. I plan to add these in the future.

Charles

Hello Charles,

Thanks a lot for correcting the above issues, I’m still very happy using your tools!

Nevertheless I would like to report some “nice-to-haves” and two bugs I found while using/testing the software. The bugs apparently affect only the Windows 8 /Excel 2013 implementation. Let me start with the “nice-to-haves”:

• Within the non-parametric methods the critical values for the test statistic (U-crit, T-crti, etc.) are not reported, only the value of the test static itself; although the tesing can be done using the p-values (which are, of course, reported), it would be nice to have also the critical values

• The significance levels for the Shapiro-Wilk test can’t be set, a default of 0.05 is used

And now the bugs:

• With Windows 8 /Excel 2013 there is a bug in the Single factor Anova when using contrasts with corrections: Dunn/Sidak gives a runtime error (A run time error has occurred. The analysis tool will be aborted; Invalid procedure call or argument); while the Bonferroni correction gives a wrong alpha-value (166,666,666,666,667)

• With Windows 8 /Excel 2013 after opening the tools and selecting a topic (Descriptive Statistics, T Tests, etc.) the window with the selected topic jumps behind the initial Resource Pack window and can be accessed only if one changes to another worksheet; an alternative is to move the initial Resource Pack window before selecting a topic from it; I’m not sure whether this is an Excel 2013 problem or it occurs only in combination Windows 8 /Excel 2013

Many thanks and kind regards,

Theo

Theo,

Thanks for your comment. I appreciate your suggestions and especially your finding any bugs. This helps improve the website and is useful for the whole community of users. Regarding your specific points:

– Critical values for non-parametric tests: The functions WCRIT, MCRIT, etc. give the critical values for small samples based on the tables of critical values. I suspect that what you want is different, namely to report the critical value for the specific test even for large samples where the normal approximation is used. I will add this capability to the software, probably in the next release.

– Setting significance level for Shapiro-Wilk: You just need to compare the p-value with whatever significance level you like. You can perform the Shapiro-Wilk test from the Descriptive Statistics data analysis tool. That output from that tool will show the Alpha value with the default of .05. You merely have to change the .05 to any value you want right on the worksheet to obtain the result you are requesting.

– Contrasts using a correction factor: I have just checked and, as you said, these errors don’t occur in Excel 2010. I don’t have a Windows 8 / Excel 2013 configuration currently, but will have access to one shortly. I will try to figure out what is happening. The software is really doing something very simple and so it is strange that this error is occurring.

– Dialog box for the chosen analysis moves behind the initial dialog box: I have done most of the testing in an Excel 2010 environment and so didn’t catch this problem when I did testing in Windows 8 / Excel 2013. I will try to figure out what is happening.

Charles

Dear Charles,

Thank you very much!

Kind regards,

Theo

Theo,

I have now gotten access to a computer that runs Windows 7 and Excel 2013. Both of the bugs that you referenced show up there as well, and so it looks like these are problems related to Excel 2013 (and not Windows 8). Unfortunately, I didn’t have time to resolve the problems yet, although I tried a few of the most obvious approaches for fixing the bugs, which didn’t work.

The computer that I used does not use English, and so numbers interchange commas and periods (e.g. one quarter is represented as 0,25 instead of 0.25). I thought that this may potentially be the source of the problem. How are decimals represented on your computer (using commas or periods)?

Charles

Hello Charles,

I’m using periods (.) for decimal representations.

Also, comming back to reporting critical values (T-crit, U-crit) for the nonparametrical tests: with Excel2013 /Windows 8 these are not reported at all (not even for small sample sizes).

Best regards,

Theo

Thanks Theo for your response. Very strange indeed. I need to figure out what is going on. Charles

Re: Repeated measures Anova-

Hi Charles,

Can you tell me where you deal with the repeated measures Anova, for one intervention and one control group, pre and post-test?

The paired-sample t-test takes care of the pre-post, but how to take simultaneously effect of the control-intervention issue?The research question is whether the mean change in the dependent variable from pre to post is different.

If your package covers this case, can you kindly indicate how to input the data?

Many thanks!

Hi Leslie,

Repeated Measures Anova is explained at the following webpage: http://www.real-statistics.com/anova-repeated-measures/

To access the data analysis tool for repeated measures Anova (assuming the software is installed on your computer), press Ctrl-m and select

Repeated Measures Anovaon the dialog box that appears.Charles

Hello Charles,

The last bugs I have reported (no reporting at all of critical values for non-parametric tests, problems when using either the Dunn/Sidak or Bonferroni correction) occur also with Windows7/Excel 2010.

I have installed the Resource Pack on 3 computers, all running with Windows7 64 bit OS and having Excel 2010 installed:

• One machine with 4 GB RAM runs under Windows 7 Home Premium

• A notebook with 4 GB RAM runs under Windows 7 Professional

• Finally, a workstation with 16 GB RAM running under Windows 7 Professional (my computer at work).

The problems reported occur on the first two machines while the Resource Pack runs properly on the third machine. Also the permissions on the three machines appear to be the same. The only striking difference is the RAM.

Kind regards,

Theo

Theo,

This is very strange. Since your original message I have tested the software on three computers all running Windows 7 with 4 GM RAM but with Excel 2007, 2010 and 2013. Only the last one had the problem.

I am going to make a small change in the next release (Rel 2.15) and see if it makes a difference. I expect to issue this release either tomorrow or Wednesday.

Charles

Theo,

I have tried to correct the Dunn/Sidak or Bonferroni problem in the latest release (Rel 2.15). Do you see any change?

Charles

Dear Charles,

I think I have found the reason for the bugs, but so far I have checked just for Excel 2010: as soon as on the Windows level the decimal separator is set to period (.) and the group separator to comma (,) the tools work well, even for the previous release. The problems occur only if in the OS settings this is the other way round, i.e., decimal separator as ” ,” and group seprator as ” .”. In this case the problems persist in a different way.

As soon as I’ll be back to work I will check it also for Excel 2013. I’ll send you a more detailed report of my test runs next week by email.

Kind regards,

Theo

Hi Charles,

I’ve forgotten a question: are the new functions (WPROB, MPROB, SRankPROB, functions for power & sample size, etc…) documented in more detail (synthax) anywhere on the website?

Theo

Theo,

All the new functions are documented on the webpage http://www.real-statistics.com/excel-capabilities/supplemental-functions/. Additional details can be found in various other webpages. Soon I plan to modify the http://www.real-statistics.com/excel-capabilities/supplemental-functions/ webpage so that you can click on any supplemental function and get additional information about that function.

Charles

Dr. Charles Zaiontz,

Your website and resources have been very helpful. I have taken econometrics in college but I am having some trouble. I have a large data set of customers with when they ordered products as well as the amount they ordered. I used basic excel linear regression to predict the amount of an order the model has a R squared of 92%. I used the same excel linear model to predict time between orders with a R squared of 95%. I want to predict if someone will order in the next X days as well as the amount they will order. What statistical package/ method do I use for this?

Any help would be much appreciated.

John,

You can create two separate regression models using EXCEL or other tools. If you want a single regression model with two dependent variables then you probably want multivariate regression.

R provides this capability.

Charles

Great work! Crisp and clear solutions to the example problems.

Do you have any plans to add support for regression trees or random forests ?

Kumar,

Thank you very much. I will add your ideas to the list of new potential enhancements. It is likely that these will be added at some future time. Stay tuned.

Charles

Dear Charles Zaiontz,

I have followed your instructions for installation but when I try to run your macor (ctrl-m) and choose an analysis, I get an #424 error message (under excel 2010) saying “object required”. When I tried to remove the macro, it requires a password.

Thanks for your help.

Sam

Sam,

It sounds like the Real Statistics Resource Pack was not installed properly. To find out for sure, enter the formula =VER() in any cell in an Excel worksheet. You should see the version number of the Real Statistics Resource Pack that you downloaded (e.g. 2.16.2). If you don’t see this, then the software is not recognized as an Excel add-in (which is what the installation instructions are intended to do). See the webpage http://www.real-statistics.com/free-download/real-statistics-resource-pack/ for more details. The key step is described in Figure 2 on that webpage.

Charles

Dear Charles,

Thanks for your response. I do get 2.16.2 when I check the version. However, when I launch the add-in either from the add-in menu or by ctrl-m I still get the error message.

Apparently, it is not possible to reinstall the add-in, since this requires a password.

Thanks for your advice,

Sam

Sam,

This seems strange to me. The add-in must be operational since you got 2.16.2. For some reason ctrl-m is generating an error message. I am using Excel 2010 on my computer without any problems, as well as many many other users.

You never need the password. This is only needed to modify the software and not to use the software. I suggest that you delete the software from your computer and start all over again. I plan to issue a new release of the software this week (Release 2.17). Hopefully the problem doesn’t reappear.

Another approach is to use the version of the software for Excel 2007, and see whether you get the same problem.

Charles

Charles

Mr. Zaiontz,

You have saved me so much time and energy with your real statistics add on package! Thank you so much for such an excellent tool! I am very grateful!

Dear Dr. Zaiontz,

I’m preparing a manuscript using analyses performed with your software. How should I cite Real Statistics?

Thanks

PS…Also, thank you for making this package available. This has saved me tons of time.

Dear Christopher,

I am very pleased that you found the software valuable.

You can refer to the website as Zaiontz, C. (2014) Real Statistics Using Excel, http://www.real-statistics.com

You can refer to the software as Real Statistics Resource Pack, Release 2.17, http://www.real-statistics.com (substituting the version of the released you used)

Charles

Thanks, Charles! Will do.

Dear Charles,

I most confess that I am a huge fan of your website, the work you have done it’s simply amazing. Thank you so much for making your knowledge available to everyone.

Greeting from México.

Thank you very much Antonio,

Charles

Zaiontz: your tools that you are sharing with the world are so wonderful! thanks again. i feel ungrateful for even asking . . . but are you considering adding some

clusteringalgorithms (k-means, &c) in the future? (that’d be so awesome!)Good to hear that you appreciate the tools. I do plan to add clustering algorithms in the future.

Charles

I will be teaching a statistics course in Spring 2015 and am looking for statistics software that is accessible for visually impaired folks who use text-reading software (such as Jaws). Jaws reads words available on an active window (such as options on menus) out loud for the user, and allows users to advance to and choose the desired option via keyboard shortcuts. I’ve been told that Excel is accessible but did not know if your tools were or not. Do you happen to know the answer to this?

I came across your web site in my search for possible alternatives to SPSS and SAS University Edition. We have found that SPSS is not fully accessible, and are in the process of testing SAS University Edition. It would be great to have an alternative to SAS, since I am teaching social work students who would find the task of learning the programming language an additional burden on top of learning statistics. Another advantage of your software is that students would still be able to use your Excel tools once they are graduated and do not have access to university computers. But of course your tools would have to be fully accessible for all students or I could not use them for the class. Thank you for your help.

Megan,

I don’t know whether my software is accessible using Jaws or other text-reading software. You can download and install my statistics software in a few minutes and test it out yourself. Since all the user-related mechanisms used in my software are standard Excel, there is good chance that it will work if Jaws works with Excel. Let me know whether it works and let me know whether I need to modify anything to get it to work.

Charles

Thanks, Charles.

I’ll have the tester at our school’s accessibility office test it out – it would be great if it would work! I’ll let you know.

Megan Petra

Charles,

Your software is almost (but not quite) accessible with Jaws. My understanding is that Jaws will read through options in a dialogue box, advancing from one thing to another via the tab key and choosing options via the space key.

Try tabbing through the software’s t-test dialogue box. You’ll notice that, when you first call up the t-test dialogue box, the cursor is automatically active in the box where you enter the cell range for the first variable. If you tab, however, it doesn’t take you to the entry box for the second variable. It eventually gets there, but you have to tab through the rest of the dialogue box first. If the tab-order were logical, however, I could tell my student exactly how to move through a dialogue box and ask for a t-test (or whatever statistical procedure I was teaching). Some of the dialogue boxes I checked had logical tab orders and some did not.

I think that is the only barrier to accessibility. I don’t know if changing the tabbing order in dialogue boxes is possible for you – but if it was, I think that might make Real Statistics accessible. Thanks for your consideration.

Megan Petra

Megan,

I would be happy to change the tab order. I have been fixing this of late, but apparently I have missed a few of the dialog boxes.

Can you help me by giving your opinion about the following:

1. If there are two input fields (with two Fill buttons) should the tab order be (a) Input 1, Fill 1, Input 2, Fill 2 or (b) Input 1, Input 2, Fill 1, Fill 2 or (c) Input 1, Input 2 with Fill 1 and Fill 2 much later in the tab order?

2. Should I generally use a tab order that goes from the top to the bottom? If so should I skip items (e.g. Alpha) that generally are not changed?

3. Should the OK, Cancel and Help buttons be the last items in the tab order?

Also please let me know of any dialog boxes that are in the correct order.

Charles

Megan,

I have revised the tab order on the dialog boxes in the latest release, Rel 3.2.2 (issued today). Please let me know if this order is acceptable.

Charles

Dear Charles,

Wouah impressed, yesterday I was looking desperately for some examples of multi reader multi cases Kappa statistics examples but not in SAS and not in SPSS software programing languages. And I found your fantastic website and the webpage:

http://www.real-statistics.com/reliability/fleiss-kappa/

Not a big deal but there are some wrong references in two formulas:

in case H9 for s.e.: it should be =B20*SQRT(SUMPRODUCT(B17:E17;(1-B17:E17))^2-SUMPRODUCT(B17:E17;1-B17:E17;1-2*B17:E17))/SUMPRODUCT(B17:E17;1-B17:E17)

in case B18 (it should be B19) for k1: it should be =1-SUMPRODUCT(B4:B15;$H$4-B4:B15)/($H$4*$H$5*($H$4-1)*B17*(1-B17))

Best regards

Cédric Marchessoux

Dear Cedric,

Thanks for catching this. I had changed the worksheet, but forgot to update the formulas in the table. I have now updated the webpage using the formulas that are actually used.

Charles

Charles,

Thank you! I’ll have my tester re-download the software and try again. In his most recent email to me he mentioned something about labels – maybe not all of the slots are labeled in a way that Jaws can read? I’m not sure. I’ll ask him to contact you to explain, if the problem still exists with the latest version.

Megan

Charles,

Thank you for your work in re-doing the tab order for dialog boxes. I have some new accessibility info from my tester. He looked at the T-test and nonparametric equivalents dialog box. My tester said that, in order for Jaws to be able to read items in the dialog box to him, they have to be labeled. Everything that has a radio button or a check box is labeled (including “column headings included with data” check-box, “one-sample,” “two paired samples,” and “two independent samples” radio buttons, and “non-parametric,” “use ties correction,” and “include exact test” check boxes). Probably the “t-test” check box is also labeled but my tester didn’t mention it at all. These radio buttons & check boxes are grouped into themes that you and I can see (“options,” “non-parametric test options,” and “test type”) but there is no such group labeling that Jaws can see. Finally, all fill boxes (input range 1 box, input range 2 box, alpha box, hyp mean/median box, output range box) are not labeled – that is, he can tell that a fill box is there and can tell what is pre-filled-in, but does not know what the fill boxes are for. He wasn’t able to tell me what you would need to do to create labels for the fill boxes – he doesn’t do that kind of programming – but he said that Real Statistics was very close to being accessible to text readers. I would say that getting labels on fill boxes is most important, whereas getting group labeling is probably less critical. I assume that dialog boxes for other statistical tests have the same pattern of labeling, but I do not know for sure since my tester only looked at the T-test dialog box. Thanks again for your work in making it possible for folks with visual impairments to use Real Statistics!

Megan,

I am a bit confused. Check boxes like “one-sample” in some sense have a label, namely the words “one-sample”. What qualifies as a label? Inside the program all the fields have a label.

Charles

I’m a little confused, too, as I don’t do this kind of programming. I think the check-boxes and radio buttons all have labels that JAWS can read aloud. Somehow the fill boxes where you can type in cell ranges do not have any labels that JAWS can read. There must be something different about how you get Excel to put a fill box into your customized dialog box (as opposed to a check-box or radio button), and how you label them. Alternately, it’s possible JAWS has a glitch… I don’t know. As always, thanks for taking a look at this.

Megan,

The check-boxes, radio buttons and command buttons (e.g. the OK button) all have “captions”. These are what you are calling “labels”. The text boxes and refedit controls are the “fill boxes” and these don’t have captions. All of these “controls” are reachable using the Tab key.

In addition to these controls is a control called a “label”. All the fill boxes have a label control to their left. These label controls do have a caption. While currently these label controls are not reachable via the Tab key, I can change this so that they are reachable via the Tab key. In this case when JAWS reads this caption out loud the next tabbed control will be one of the fill boxes (in fact this fill box can be identified by the caption of the label control).

Another possible alternative approach is for JAWS to read the “control tip” instead of the caption (or at least read the control tip for a control without a caption). The fill boxes all have control tips.

Charles

Dr. Charles,

This is a world-class web site. Even my associates in the Malcolm Baldrige community can’t argue with me.

Dr. Stephen Druley

I am confused. I submitted a question on goodness of fit a few days ago and it appears that the question was deleted with no response.

The question hasn’t been deleted, but I haven’t had time to respond yet.

Charles

By transforming the x values as (LN(x)^2.8) + 2.8

By transforming the y values as (LN(y))

By setting the target cell to the Pearson Product Moment=-.993

By using solver we can improve the Pearson to -.998148

By inventing two additional poker hands we can fill in the gaps

The result is a continuous poker rating equation that can be useful in a decision support process for players

Left out a step…

By setting the target cell to the square of the Pearson moment we eventually get a perfectly straight line with

R^2 = .999_

Charles

I would like to ask a question regarding data analysis. I am working on some data having 5 to 6 independent variables and 3 dependent variables. The data is panel with large n and t=10. The whole data is divided into two groups depending on the presence or absence of one of the independent variable in individuals. This division has made the groups unbalanced. Can you please suggest which data analysis tool would be appropriate in judging out the efficiency of the two groups (which of the two groups is better) and the relationship between dependent variables and independent variables.

Sorry, but I don’t understand your question.

Charles

I was telling that I am working on panel data having large n and t is 10 years. There are 6 independent variables and 3 dependent variables. One of the independent variable is discrete dummy variable. Like for example, whether a committee is present is a company or not is the observations. The observations found out that for some years, it was present and for other years it was not present. This was observed for n number of companies. Number of presence and absence of the committee is not same for all companies. Then I divided the whole data into two groups to find out whether the presence or absence of committee has any effect on my dependent variable or not. This has made both groups unbalanced, having different t for n. Individually I can run regression on both groups. But I wanted to know how can I compare two groups. Is there some better method or technique to find out the results.

Nidhi,

I have read your comment several times, but I don’t understand it. For one thing I don’t know what panel data is.

Charles

Hello

i am trying to make a mathematical model by Linear regression model and at end i calculated R value(regression coefficient).but i don’t know it is meaningful or not. tell me how i can know that it is a good regression coefficient?

Ali,

R is a correlation coefficient, and so takes values between -1 and 1. The closer R is to 1 the better that the linear regression model fits the data.

Charles

I am trying to do K-Means cluster analyses using Real Statistics. I am unclear how to go about formatting entering my data, and I can’t seem to find anything on the website describing how to use the “Cluster Analysis” data analysis tool. In any case, thanks for creating this Add-In.

Mike,

The reason that you haven’t seen anything on the website about cluster analysis is that I haven’t yet issued the cluster analysis capabilities for general release. I plan to do this shortly (Release 3.9). The fact that some of the cluster analysis capabilities were included in the Release 3.7/3.8 software has indeed caused some confusion. Sorry about this; I plan to explain how to use the cluster analysis tools sometime next week.

Charles

I tried posting this question, but it doesn’t seem to be showing up in the feed. My apologies if this ends up being a duplicate question. Anyway, I am having trouble with the Cluster Analysis data analysis tool. Specifically, I am unclear how to format my data, and I can’t seem to find any explanations on the website. Any chances there might be an explanation page added in the near future? Thanks!

Mike,

Sorry that you weren’t able to see your comment. This was because I lost my Internet connection this weekend and so hadn’t the opportunity to approve your comment for publication yet. Please see my response to your original comment.

Charles

Thank you so much for making a palatable statistics website. I am definitely sharing this with all of my coworkers.

Only issue with the software that I have encountered:

Calculating the sample size needed for a given power level in a 2 sample t-test.

All the fields are filled out properly and occasionally I get an “Overflow” error.

I am not professing to be an IT expert, so if this is an issue with my PC, then I will have it addressed by our internal IT folks.

Thank you again!

Elise

Elise,

Can you give me a specific example where you got the overflow message? For the 2 sample t test I need to know the effect size, power desired, # of tails, alpha, sample size ratio and sum count (if you changed it from the default of 40). With this information I can try to figure out what is causing the overflow and possibly fix it.

Charles

Dear Charles,

I can not thank you enough for putting this website and resource pack together. I work for an important international organization, where due to budget cuts, SPSS has unicorn like qualities. Being able to use these statistics in Excel has been a lifesaver for us and I have made it part of the induction kit for our department.

Yours truly,

A very greatly international civil servant!

Charles,

My sincere congratulations for the amazing work you are doing. It has been extremely helpful for me. The tools are amazing and the tutorial very clear and didatic.

Thank you a lot!

Pedro

Dear Charles,

Im back with a bug. In the meantime I’m using the tools for teaching and I continue enjoying them. And so do the students.

In the latest versions (e.g., 4.0, 4.1) the following problem occurs with the German Excel (2013): although, both at the OS (Windows) level and at the Excel level the decimal separator is set to dot (.) and the thousands separator to comma (,) on the GUI of the statistical tests the default alpha appears as 0,05. Accordingly, the test gives the correct results onyl if alpha is manually set to 0.05 (otherwise alpha is taken as 5)

I don’t know exactly when this bug occured first, however, in the 2.14.1 version it does not happen, but it happens in 3.6.1.

A “nice to have” would be for the “Descriptive Statistics and Normality” tool: here you have to select one input range. If you do the normality check, for the box plots and for Shapiro-Wilk the data is analyzed column-wise, i.e., for each column you get a test result and a box-plot, however for the QQ-plot the result is for the complete input range. It would be more appropriate to get the QQ-plots also for each column individually.

I also think that “Use exclusive version of quartile” should be the default setting.

Kind regards and thanks a lot for your great tools!

Theo

Theo,

Thanks for your support and for bringing these issues to my attention.

This problem with 0.05 versus 0,05 has plagued me from the beginning. In reaction to problems that people were having I changed the way I handled the default for alpha many months ago. This probably helped some people and hurt others. I still haven’t been able to come up with the right solution. In the next release I will start testing the alpha value and give an error message when 0,05 is viewed as 5, which will at least alert people to manually insert the correct value.

I added QQ-plots to the Descriptive Statistics and Normality tool a few releases ago, but was well aware that it was a short-term approach. I will eventually change the way QQ-plots are generated as you suggested, but there are always other things that seem to be more important to do.

I keep the inclusive version of quartile as the default simply because Excel 2007 doesn’t support the exclusive version.

Charles

Dear Charles,

Thank you for your prompt response.

Kind regards,

Theo

Dear Charles,

I just came across your website when I was searching for information regarding intra-rater reliability. I find your website very informative and useful.

Do you have any tools or recommendations on how to evaluate intra-rater reliability of One rater across a set of patients with multiple trials (around 6) done per patient?

Thanks a lot,

Praveen

Sorry, but I don’t know how to calculate intra-rater reliability with one rater and multiple trials. Perhaps someone else in the community has some suggestions.

Charles

I am re-learning statistics. My interest is real property valuation and analysis.

My question is, “Will your Excel add-ons work with LibreOffice spreadsheets?” LibreOffice is an open-source product that seems to be very compatible with the Microsoft Office programs (Word, Excel, and the like).

Thanks.

Fred Cull

Fred,

I have never used LibreOffice, but I doubt it will work since I don’t believe that LibreOffice supports VBA, which is the programming language that I used.

Charles

I will have to check this out. If not, then I will go with Excel. I realize that you cannot be all things to all programs.

Thanks.

Fred

Dear Sir

thank you very much for your support through your valuable web page. I have a doubt. i did multiple choice test with individual insect providing 4 choices and recorded the data on amount of time spent in each treatment and also number of entries in to each treatment. Now i wrote the paper, reviewers are asking me to do non parametric analysis. Will you please advice me on this. i did ANOVA, the comment is since it is violating ANOVA assumptions (independency of samples), i have to use non parametric tests etc. Sir, pl advice me which analysis i can do? thanks in advance.

If you performed one-way ANOVA with a fixed factor, then the usual nonparametric test is called Kruskal-Wallis.

If you violated the independence of samples assumption, you should simply use a different form of ANOVA, namely ANOVA with repeated measures. The usual nonparametric version of this test is called Friedman’s test.

Charles

Thank you very much for your reply. In that case should i use the data (amount of time spent in minutes, number of entries) directly or should i convert them in to ranks and then subject to Freidman test or Repeated measures of ANOVA. Pl advice!

Kamala,

If the data meets the assumption for ANOVA then use the data directly. If not Friedman’s test will convert the data to ranks.

Charles

thanks a lot Sir!

Your web page very useful appreciate it.

How should I site/refer your web page (I am using Harvard referencing style).

Thank you,

Based on the rules described on the wbepage http://is-uwa.v1.libguides.com/content.php?pid=43218&sid=328596, I suggest the following regarding Harvard referencing style:

Zaiontz, C. 2015 Real Statistics using Excel.

Statistical Analysis. Available from: < http://www.real-statistics.com>. [1 July 2015].The reference for a particular webpage is

Zaiontz, C. 2015 Title of webpage. Available from: < http://www.real-statistics.com/rest of URL>. [1 July 2015].

The reference for the software is

Real Statistics Resource Pack, computer software 2015. Available from: < http://www.real-statistics.com>. [1 July 2015].

Charles

Charles,

Is there a way to remove the ctrl+m shortcut? I frequently use this shortcut for other macros and don’t mind navigating to the add-ins page to access the Real Statistics menu. Thanks for the great tool!

Annie,

I’ve been looking for a way for you to disable the Ctrl-m shortcut for Real Statistics. Unfortunately, I haven’t found a simple way for you to do this.

Instead, I will provide such a capability in the next release of the software.

Charles

Annie,

This capability has been added to Release 4.4, which was released earlier today. See Blog for details.

Charles

Hi,

I have Microsoft Excel for Mac ver 15.11.2 from 2015 and have installed the Real Statistics Add-On for Mac but am not able to find Frequency Table in my data analysis tools. Where should I be looking??

Karen,

First make sure that the Real Statistics add-in has been installed properly by inserting the formula =VER() in any cell. If you get the version number of the software, then that is a good sign.

You won’t find the Real Statistic data analysis tools combined with the standard Excel data analysis tool. Instead, you can access the Real Statistics data analysis tools by entering Control-m (i.e. control key + m) or clicking on the Real Statistics menu.

Charles

Hi Charles! I’d like to ask. What statistical technique should I use if I would like to check if there is a significant difference on the effect of the treatment and control group if I only have three trials but with ten specimens per trial.

It really depends on what you mean by specimen and trial, and how you conduct the experiment. E.g., this could be a two factor ANOVA with factor A having two levels: Treatment and Control and factor B having three levels, one for each trial. This would mean a sample of 60 specimens (2*3*10).

Charles

I’m no pro in statistics and am looking for a simple way to calculate the sample size when using a cohen’s kappa.

Denise

Denise,

It really depends on how you have stored your data. Generally in Excel you calculate the sample size using the COUNT function.

Charles

Dear Sir,

I am busy with research for my PhD and have been asked by the University to test the objectivity of my research art effects. The idea is to make 6 short films and then have them tested by 4 peer reviewers. I believe Fleiss’ Kappa will be the best to do that, but I’m rather unsure how to set it up. I think that I will have the reviewers watch the films and then respond to a questionnaire with seven closed-ended, scaled check box type questions. I will appreciate your input on this.

Kind regards,

Willie Bouwer

Willie,

The following webpage describes how to perform Fleiss’ Kappa.

Fleiss’ Kappa

Charles

Hi Charles

I wanted to reference this page, could you let me know what year this page was created

Thanks

Hi Salma,

The referenced page was created in 2013. Other pages on the website were created in 2014 and 2015.

Charles

Hi

I installed the package 3 days back, everything was working fine. Suddenly there is some issue appearing MATRIX OPERATION – CORRELATION especialy when i am going to create matrix on new page. it is giving error as run time error and matrix is not created. However it is working if i select any particular cell in the same sheet

Hi,

I have just retested the Correlation option of the Matrix operation data analysis tool and it works fine.

If you send me an Excel file with your data, I will try to figure out what is going wrong.

Charles

Any way to get Odds Ratios from the excel linear regression program?

Jordan,

You can get the odds ratio for logistic regression, but I am not familiar with the use of the odds ratio for linear regression.

Charles

Dear Charles,

First of all, thank you very much for all those helpful calculations in Excel.

There is one statistical test for which I did not find any help. It is the Cochran`s C test for homogeneity of variances. Could you perhaps help me, how to do the test statistics in Excel?

Thanks a lot, Anke

Sorry Anke, but at present I don’t support Cochran’s C test since I find Levene’s test to be more useful. Cochran’s C test tests to see if the variance of one of the samples is significantly different from the variance of the other samples. Thus it really tests that the one variance is an outlier compared to the others.

Charles

I downloaded the EXCEL file Real Stat Using EXCEL for EXCEL 2007.

When running it gives error: Can’t Find Project OR Library.

Pl. advise

Prof S Chandrasekhar

Director Business Analytics

IFIM Business School B”lore INDIA

The most common reasons for getting this message are:

1. The software was not installed properly. To find out if this is the problem, press Alt-TI and see whether RealStats is on the list of add-ins with a check mark next to it. If you don’t see RealStats on the list, you need to press the Browse button and locate where you stored the Real Statistics software on your computer.

2. You are trying to open up the RealStats file (containing the software). You should not do this. Simply open any other Excel file.

3. There is some other problem. In this case, please insert the formula =VER() in any cell on a blank spreadsheet and let me know what value you get.

Charles

Charles,

Is it possible to use your functions with VBA code? Excel maths functions can be used with WorksheetFunction.Correl(x,y) for example. However, when I try it with your functions such as WorksheetFunction.durbin(x,y) it doesn’t work.

Anyway of getting your functions to work in the macro VB code?

Thanks,

Joseph

Joseph,

See the instructions on the webpage

Calling Real Statistics Functions in VBA

Also for Real Statistics functions, you should use durbin(x,y) instead of WorksheetFunction.durbin(x,y).

Charles

Hey Charles,

Just to notify you, I have taken a good look over your site and have sent you an email. If you would like I could post my question up here

Hamed,

Ok. I will take a look at your email.

Charles

Dear Charles,

Thank you so much, I can’t express he grateful i am, I just didn’t want to spam your page in the first place, looking forward to hearing from you soon! Thank you again

Hamed,

I will take a look at your email and get to you.

Charles

Dear Charles,

I wonder if you could help me on the followin:

” I have conducted a survey using likert scale – asked questions to a specific group, and received their feedback on my questions. My goal is to understand how perception of these group of people relates to my depepndent varible (DV). So, through Q1…..Q2 I would like to devise a conclusion/relationship to my DV. I am using EXCEL, and I believe, I should be using logistic regression as my data’s are ordinal.

I wanted to do the linear regression first ( I know, I should not be doing this), just to get familiar with the process. It requires two variables as an input to perform linear regression. As I only have Q1…..Q2, I am not sure how to figure out the second variable. In my head, Q1….Q2 are just questions.

Any help would be apprecited.

Dear Monjurul,

For linear regression you need at least one independent variable (IV) and a dependent variable (DV). This is also true of logistic regression (except that the dependent variable only takes two possible values). Do your questions provide some way of establishing a relationship between the IV and the DV?

Charles

Dear Charles,

I have sent you a mail – I wonder if you could look at it once time permit?

Thanks in advance

Monjurul,

I have looked at the document that you sent me, but it doesn’t really help me very much. You need to do something first.

Before deciding on which tests to use or how to run those tests, you need to decide what is the objective of your study and what hypotheses you want to prove or disprove. You don’t need to state these initially in statistical terms. In fact initially it may be better to state these in terms that are consistent with your domain of study/research (although these statements should be clear and potentially quantifiable).

Only you can do this part.

Charles

Dear Charles

Could I use K_S test on p value 0.01 ?

Thanks a lot

Sani

Sani,

Yes you can. See the following webpages for more information:

Kolmogorov-Smirnov Test for Normality

Kolmogorov-Smirnov Single Sample Test

Kolmogorov-Smirnov Table

Charles

Hello

I’ve just run the post hoc test (Games-Howell) along with a single factor ANOVA test, but I’ve got some odd results the post hoc test. I’m assuming that it’s because it’s not calculated the standard error, which is ultimately being driven by the values in the column labeled “C”, but I don’t know what this relates to, and is completely empty. Can you help, what should this contain? I’m not able to locate the formula for this test.

Hello,

You simply need to place the values +1 and -1 in the c column for two groups that you want to compare. These are the contrast coefficients. You can compare any two groups without increasing experimentwise error. This is explained on the website.

Charles

Thanks – can you point me to the place where this is explained on the website. Also is there a way of comparing all the days against each other (was looking to compare two days)

See Unplanned Comparisons

The following page may also be helpful

Planned comparisons

Charles

Pui-Shan,

See the following two webpages:

Planned Comparisons

Unplanned Comparisons

Charles

Hello Charles,

I am having trouble finding a relatively recent academic reference to support the assertion that ‘MANOVA is not very sensitive to violations of multivariate normality provided there aren’t any (or at least many) outliers’. I have a large sample (n=955) with no outliers, but the assumption of multivariate normality is violated. I can proceed with the analyses if I can cite evidence that the MANOVA will remain robust. Could you recommend a citation?

Kind regards,

Sharon

Sharon,

This is not to say that if the data is really skewed that you couldn’t have problems, but generally normality is not a problem. Some references are:

http://homepages.inf.ed.ac.uk/bwebb/statistics/MANOVA2.pdf

http://documents.software.dell.com/Statistics/Textbook/ANOVA-MANOVA#deviation

http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt

Charles

Charles

Sharon,

Here is one such reference.

http://www-bcf.usc.edu/~mmclaugh/550x/PPTslides/WeekElevenSlides/MANOVA.ppt

See slide 11.

Charles

I want to determine whether two sets of data (n = 12, for both) are statistically significantly different at P <=0.001 (two-tailed test). Should I run Mann-Whitney U test on the raw data, or should I first do a normality check and convert the data to a normal distribution? (say, by log, square-root, or exponential conversion).

Craig,

If the data is normally distributed then generally you should use the t test instead of Mann-Whitney. You should first check to see whether the data in each sample is normally distributed. If the departure from normality is not too severe then you can use the t test. Even relatively symmetric data should be ok. If the data is very skewed then you can use the Mann-Whitney test provided the distributions of the two samples are relatively similar (even if skewed).

Charles

Very helpful — thanks! Follow-up question: If the data ARE skewed (i.e., not normally distributed) and I use Mann-Whitney, would I use it on the raw data, or on the data that are transformed by whichever transformation brings their distribution closest to normality?

Prof,

I have an dependent variable and four independent variables.

To do regression analysis using excel 2013, should I select all the FOUR independent variable together under Input X range, or I should do regression of the dependent variable against each independent variable, thus in all, four times?

Thanks

Tom

Tom,

It all depends on what you want to accomplish, but I suggest that you perform the regression with all four independent variables and see which ones are not significant. These are candidates for elimination from the model.

Charles

Thanks for the free software, it is appreciated, from someone who enjoys statistics, and occasionally needs to do analysis

I recently consulted with a statistician that informed me, I need to use Generalized Estimating Equations (GEE) on the data that I’ve been gathering.

Does this excel add-in allow for GEE calculations?

Sorry Mike, but the Real Statistics website and software doesn’t yet support GEE. This will be supported shortly.

Charles

Hi, Dr. Zaiontz!

Have you given any thought to doing more with the Fast Fourier Transform (FFT) already provided in Excel?

They really have not explained what to do with it after they generate the complex form. I have managed to compute the resulting magnitude (amplitude) and frequency from the complex form given but need help on how to interpret what I have…presumably the frequency spectrum. Ok, so I locate the dominant frequency after I graph the magnitude and frequency….now what? How do I go about combining the various frequencies to get a better fit and how do I come up with the parameters for the various cos and sin terms? Is there a short cut without having to construct it individually for each frequency?

Puzzled?

Sorry Rod, but I haven’t tried to work with the Fast Fourier Transform capabilities in Excel.

Charles

Hi,

Do you know if its possible to do the exponential decay 3rd order in excel (y=yo+a*exp(-bx), fits for yo, a and b? I know I can do this in Origin or Sigmaplot, but it would make it easier if I could also integrate this on my excel sheet for the calculations I need.

Thanks!

Lis,

See the following webpage

Exponential Regression

Charles

Dear sir,

Let’s suppose that someone uses a questionnaire with questions like this “How much do you think that X can influnce your life?” and the participants have to answer by choosing a number from 1 to 10. Could median be a meaningful measure in a case like this? Or the researcher should just stick to the mode?

Thank you very much for your time.

Maria,

This all depends on the details of what you are trying to analyze. Usually the mean is used, sometimes the median, and only rarely is the mode used. But which you should use really depends on your specific research. It also depends on the test that you use. E.g. if you decide you want to analyze the data with the t test (assuming that this makes sense for your analysis and that the assumptions for the t test are met), then you probably want to focus on the mean. If it turns out that the assumptions for the t test are not met, then it would be common to use the median (and some nonparametric test) to do the analysis.

Charles

Let’s say that the researcher wants to explore the ideas of some people about how environmental problems influence their own lives. So, he has a questionnaire with e.g. 10 questions. Each question has the same form (“How much do you think that X can influnce your life?”) and the same response options. These options range from 1 to 10, where 1 means “no influence” and 10 means “great influence”. So, 1-10 are not “real” numbers; theyare just the participants’ way to express what they think.

In this case, when the researcher knows that e.g. 5 participants have ticked the option “1” and 10 ticked “2” …. etc., and does not plan to run any test at all, is there a real meaning for him to estimate the median? I mean, if he just wants to describe this kind of data, is median really applicable and meaningful?

Thank you very much once more.

Yes, the median can be useful as part of the description of the data.

Charles

Good day Dr

First of all I would like to express my appreciation for the invaluable real statistic package which have help a lot of people already.

However the subject of the matter is that I am carrying out a research on “The viability of crowdfunding to meet the funding needs of SMEs” and my main focus is on the supply side of funds, the investors. I am using a questionnaire to gather the data of the investors and the online questionnaire is on the following link https://docs.google.com/forms/d/1pibVVwzBqyrL4SQ9mcmNzCiArwzXp0VKWcy8J_vJe2Y/viewform

After gathering the data and test on the viability, i decided to consider whether the potential investors would chose to invest or not(the propensity to invest). This would constitute my dependent variable. The independent variables include, investment capacity, which is a function of income and investment pledge, expected return surplus, payback period, there are other variables that i have considered such as internet connectivity, social media presence, access to a payment system as variable of consideration of crowd-funding etc

I thought the logistic model was going to be most appropriate model to use considering the binary nature of most of my data and i chose it.

however i have been following your video that you shared on YouTube on logistic regression model, after coding my data. I did run the analysis but i got errors in excel about the data format.

May you please help in how i can address the error issue, where i should look at as well. I WOULD BE VERY, VERY HAPPY if you assist on the overall project by offering comments, insights and direction as well. I believe this project is of great importance to the SMEs in my home country and the nation as a whole and i would like it to be as informative as possible.

Accompanying this mail is the Excel sheet with the responses and the data that i am trying to analyse.

I am in much anticipation of your response

Desmond,

I have received your email and will send you my comments shortly.

Charles

Ok Doc will be be waiting for your response

I have now sent you a response to your email.

Charles

Hello Charles!

I’m trying to compare two population proportions to see if there is a significant difference between the two proportions. I’m reading everywhere that I have to use the z-score to do so, but I can’t seem to find any good information on the quickest way to do this in excel. My goal would be to have a p-value in the end, telling me if the difference between the two population proportions is significant.

Could you help me with this? Is there a page on your site on which this is explained?

Thank you so much! Your website is very helpful!

Isabel

Hello Isabel,

Yes, please look at the following webpage.

Proportion Distribution

Charles

Hi, I wanna say thank you for the add-in functions you provided for us!

However, today when I’m using function nchisq_dist(9629.82651550155,0.06365952,9414.72191791204,TRUE), it returns 0 which seems incorrect. I input the same number in matlab to verify it while matlab returns 0.8659. Since I don’t have the password for that add-in, so could u plz help me out?

Grant,

Yes, there are limitations to the use of the NCHISQ_DIST function that I have provided. In Excel CHISQ.DIST generates an error value when df < 1. Since NCHISQ_DIST uses the Excel function CHISQ.DIST, it is not surprising that it does not return the correct value when df < 1. There are also limitations when x or lambda are large. I will try to explain these limitations on the website. Thanks for identifying this problem. Charles

Hi,

I am doing a research about GPA’s at my school. In my data sheet I have 50 samples, under each label I have less than 30 samples. I tried to use “t-test: Two-Sample Assuming Unequal Variances” because I don’t have the exact variances but I also know that I cannot use t-test if I have more than 30 samples. Since I don’t know the population variances I cannot use the z-test, too. Which test should I use in this case?

Thank you in advance.

Nur,

Are you saying that you have 50 samples and each sample has fewer than 30 elements?

The following statement that you made “I also know that I cannot use t-test if I have more than 30 samples” is not true. You can use the t-test for samples with more than 30 elements as well as for samples with fewer than 30 elements. You also don’t need to know the variances to use this test. I suggest that you look at the following webpage, especially the subpages about hypothesis testing

t Distribution

Charles

I don’t have the ability to interpolate on my Excel (Mac version) and have the following values. My W is 0.984 ,n = 30 and my p-values in between 0.9 (.983) and 0.95 (.985). I don’t have the interpolating feature on Excel and saw the equation you had on that page was a little confused as to what went where to calculate it correctly. Some help would be much appreciated 🙂

Thanks!

Sarah,

I don’t know which page you are referring to. Shapiro-Wilk? In any case, please look at the following webpage

http://www.real-statistics.com/statistics-tables/interpolation/

Charles

I saw this page and that was my question in regards to the equation. Where you have .522 I wasn’t sure which of my variables went there in order to get that calculation to go.

Thanks

Sarah, sorry but I don’t know which webpage you are referring to when you say .522.

Charles

The equation thats on the interpolation page you posted the link for.

Sarah,

Thanks. Is everything clear now?

Charles

Hi Charles,

Appreciate your contributions to the statistics community.

I am trying to regress roughly 150 dependent variables (150 mutual fund’s time series returns) against roughly 8 independent variables (the time series of various factors like the return from simply buying cheap stocks, or stocks that have gone up recently, etc.). Thoughts on ways to do this without drinking heavily?

Drinking heavily seems like a pretty good option to me.

Charles

hi doc,

I must say am inspired and helped in the write up on statistics. I may say am unfamiliar with statistics and am doing a research which involves statistics..yes I have used simple excel now am finding difficult in analyzing the p-value …and do a scientic analysis. I have sent u my results of the p valus in your email. thanks in anticipation

Blaise,

I have responded to your email.

Charles

Dear Sir,

Currently I’m working on a schoolproject on forecasting the employee turnover of an organisation. To do this, I’m using a time series data of the employee turnover over the past 7 years. It is an annual data from 2009 to 2015. To make a good prediction it is also important to add some explanatory variable.

I want to use multiple regression with ARMA errors. But i’m not sure how to use these error to do some prediction. Do I need to make a prediction of each of the explanatory variable on it’s own first. Then add this predicted value in the model to do the prediction of the employee turnover?

Or will the model do alle the predictions of the dependent and independent variable and give me a predicted value of the dependent variable (turnover)?

Or do you recommend using another model to do the forecasting with all the explanatory variable?

I am really having a hard time putting all the variables in a model that gives me a good prediction. Also, the internal data has only a few observation, whereas the external data has more historical data? How can I combine these two data to do a multiple regression?

Is it statistically correct to use only 7 observations to do some forecasting?

Can u please help me out sir.

Thanks very much in advance

SB,

At present I only support time series for one variable. See Time Series Analysis.

I will eventually add multivariate time series capabilities.

Regarding your last question, you could make a forecast based on limited data; of course, the confidence interval of the prediction will be larger.

Charles

Hi, I just downloaded this Add-in today and I tried to use the one sample t-test (as described from your website). However, every time I enter the data range and the correct settings, I get this response when I click OK: “”A run time error has occurred. The analysis tool will be aborted. Unable to set the HorizontalAlignment property of the Range class””.

What does this mean and how do I fix it so I can use the one sample t-test function?

Thanks!

Jane,

If you send me an Excel file with your data I will try to figure out what is going wrong. You can get my email address from the Contact Us webpage.

Charles

I was looking for a non-parametric test equivalent to two-way ANOVA and thanks to your website I found how to conduct the Scheirer-Ray-Hare test.

So first, thanks a lot!!!

Second, I couldn’t find much information about this test, but I did find a comment saying it was not very robust… but which would be the alternatives, GLMs? I bet there are many cases out there where data with two factors do not follow normality, so I’m quite surprise not to find a clear pathway.

And third, if this test is perfectly valid, could I use any post-hoc, to check where are the differences within factors (e.g. my two factors are location and month, and the interaction is significant so I want to see in which months that happens). Any suggestions?

Thanks a lot in advance!

Just an additional question I had doubts of… If we have missing data, I see it didn’t work leaving the cell blank, or with NA, it only works introducing “0”. Is it considering it as a value?

Thanks again

Ara,

I don’t recommend using 0 for missing data. I suggest that you look at Handling Missing Data.

Charles

Ara,

It is true that Scheirer-Ray-Hare test is not very robust. You can use the various post-hoc tests, provided their assumptions are met.

Charles

Charles – great product that I use more and more – thanks so much for making and maintaining such a great product! I recently upgraded from Excel 2013 to Excel 2016, and now am having trouble making ‘Add-Ins’ appear on the ribbon. (Real Statistics is installed as an add-in and I can access it via Ctrl-m)

It’s not a big deal (since I can always use Ctrl-m), but I was wondering if this is just an issue with Excel 2016 or if I am not doing something right.

Thanks again,

Alvin

Alvin,

Sometimes the problem is the order in which the various addins need to be installed. I think the Analysis ToolPak needs to be installed first and then Solver and then Real Statistics. This may indeed be irrelevant to the problem you are having, but sometimes Excel is sensitive to these sorts of things.

Charles

HI Charles, loving your website by the way.

I’m working on some time series data and using the TIme Series Testing tool; my question regards the output from the ADF test. The answer is most probably written somewhere, but I can’t find it…. under stationary (yes/no), aic, bic, there is a lags box. My data at diff 1 (with trend) comes back as stationary with lags 6, does this mean the data is stationary after 6 lags? I.e. Do I need to difference the original data 6 times before I can use it for other tests and comparisons?

Thanks in advance, I think I’m having a dumb day!

Lisa,

Please look at the following webpage and see whether this helps.

Augmented Dickey-Fuller Test.

Charles

Mr. Zaiontz,

I would appreciate in case you’ll find time to answer my question. I’m analyzing a big number of point-sets(I would call them attributes). Each of these attribute is in XYZ format. I’m doing it to measure the evaluation of the data therefore I want to analyse it through the flow where the data are changed. So I’m generating these attributes (XYZ, 90 attributes each time) for several stages during the changing of data.

For each attribute I have statistics summarized with the following parameters: minimum, maximum, median, mean, standard deviation, skewnes, kurtosis. I would like to create a pdf function in excel based on these parameters as an input. So although I have the data points itself I don’t want to load them into excel (as it would be too many data) I want to use these extracted statistics to plot PDFs. Do you think it possible? Could you advise the workflow for this?

Ilya,

Sorry, but I don’t know how you would be able to create a pdf based on the minimum, maximum, median, mean, standard deviation, skewness and kurtosis. This would give you the first four moments of the distribution, but not the moments after that. I would imagine you could make an estimate of a possible pdf, but I’m not sure how you would do that. Perhaps some other reader has an idea for how to do this.

Charles

Ok, thanks for the reply) so I would have to use my datapoints I have but not a statistics from it. Maybe Kernel Density Estimation would be a solution in my case but anyway I would have to use my points as input data.

This could be done using Pearson distributions. See https://en.wikipedia.org/wiki/Pearson_distribution for more examples, and see the R package PearsonDS (https://cran.r-project.org/web/packages/PearsonDS/) for an example of how this has been implemented in R. It should in theory be possible for this to be re-implemented within the Real Statistics package.

Dear Charles,

I’d like to convey many thanks, once again, and many times more, if need be, for all your kind help.

Unfortunately, following brief use of Real stats in Win 8.1, French, Excel 2007, Data section would no longer open, and would insted CRASH Excel….

I can no longer install the add-in…. (be it 2003 or 2007 or some other one), in EITHER of the two files mentionned in other parts of this blog.

Ἔῤῥωσθε! == keep well, in good health (take care, etc…)

Georgios

I liked the site for the clarity, rigor and explanation of the process.

This is an excellent site for learning.

I wish there were examples showing potential pitfalls of using an “apparently obvious” method and/or hints in each chapter indicating, “What to Watch For”.

Anand,

That is a good idea. I will consider adding this some time in the future.

Charles

Hi, Charles.

Thanks again for this great tool.

Recently, when I open a new spreadsheet (also older files I’ve created using this Addin), the resource isn’t available because the macros have been disabled.

Is this something I did or an update I’m missing? I never use to have this problem. Do you have a fix? Like I said, I have older files that make use of the resource in a template that I hope to automate.

Thanks so much for your help.

Michael

Michael,

I am quite pleased that you like the tool.

I don’t know any reason why the macros would be disabled (assuming that you didn’t disable them).

If you have updated the Real Statistics software or moved the location of the software, then perhaps you need to tell an older file where to locate the add-in. You can do this exactly as described for the Real Statistics examples workbooks on the webpage:

http://www.real-statistics.com/free-download/examples-installation/

Charles

can we do knn in real stats software

What is knn?

Charles

k – Nearest Neighbors for missing value

Rahul,

The Real Statistics software doesn’t yet support the knn algorithm. It does support the k-means clustering algorithm and Jenks Natural Breaks.

Charles

so how to do missing value imputation ?

Rahul,

Various missing data approaches are described on the webpage

Handling Missing Data

Charles

Hello,

I am analyzing wind speed data. I want to find weibull distribution of wind data. I have hourly averaged wind speed data for one year. I draw a graph for (Speed) vs ( Percentage of time). How can I find Alpha and Beta values of weibull distribution function. Thank You,

Chaminda,

If you know that the data follows a Weibull distribution, you can find the alpha and beta values that produces the minimum squared error (MSE) using Solver. This approach is illustrated in a number of places on the website. Using Solver for Logistic Regression, Using Solver for Exponential Regression and using Solver for Time Series Analysis.

Charles

Charles

Would you so kind to send a comment, please?

(This is only because some Psychologists as J. Cohen and similar fauna, does insist in swap NHST from literature, which, IMO, is too radical . . .)

Thank you, so much

Luis

A reasonable meaning of H0: p=p0

a) In a real, probabilistic world, a no matter parameter when estimate through experimental/observational data, is by proper nature, errors caring, even the estimator is no biased,

b) Remembering, if so, the aim is to get sufficient evidence to falsify the Null Hypothesis,

c) Read, algebraically as an equation, H0: p=p0, leads directly to the absurd that there is no room for a difference even for a last decimal unit, and consequently the null hypothesis have not chance to be true.

Because what is intended to find out a significant result, i.e., if p is sufficiently far from p0 that, given the observed data, we can state that the opposite complementary hypothesis, Ha: p p0 is true. In short, I think that H0: p=p0 must be understood as indicating an approximate equality, not statistically discernible, given the data. The same thing for two parameter values H0: p1=p2.

Luis,

A lot of people have criticized the null hypothesis approach to statistical analysis, and with good reason. Other approaches have been proposed and are used (e.g. Bayesian approach), but the null hypothesis approach is still the one that is most commonly used and so I have adopted this approach for the website.

Charles

Thank you for helpful to our study, Charles.

When I used QCRIT(x,y,z) in the tools, a compile error on lookup was occurred.

On the Excel 2010 version 14.0.7166.5000 (old one of my PCs), it was no problem, but on the version 14.0.7166.5000 (the latest of my PCs, it’s used for my study), it caused a compatibility problem.

How can I fix the error on latest Excel?

Or, if you can, could you construct the compatible latest edition?

Thank you for reading.

Shiggy,

The two Excel version numbers that you provided are identical.

Which release of Real Statistics are you using?

What are the values of x, y and z?

Charles

Thank you for your reply, Charles

Sorry, Excel version numbers are 14.0.7172.5000 and 14.7166.5000.

I use Real Statistics that downloaded on Jun. 2016 for Excel version 14.0.7172.5000, and for Excel version 14.0.7166.5000, I use old one that downloaded on Nov. 2014.

The values are x:6, y:12, z:0.05 for example, I can get the same error under the Real Statistics that downloaded on Jun. 2016 and Excel version 14.0.7172.5000.

Shiggy

Shiggy,

I don’t know why you are having this problem. I used the formula =QCRIT(6,12,0.05) in the latest release (Rel 4.9 of July 19) and get the answer 4.75. What do you see when you use the formula =VER()

Charles

Charles

I try the old version add-in (3.2.2) that I used for past study to Excel version 14.0.7172.5000.

In this case, I can get a correct value 4.75 by the formula =QCRIT(6,12,0.05).

However, about the latest version (4.9), it seems to fail to read in Excel.

So, for this time, I will use the version 3.2.2.

Because this problem it may be proper problem due to the software constitution of my PC, I will ask to engineer of the our university about this problem.

Thank you so much, Charles.

Shiggy

Charles

I am fully acquainted, you know, a set of people at war with NHST. However, items like the one I point out, borns the ridiculous.

Such as [1]

“ Because the proposed Null Hypothesis H0: p=p0, is practically impossible to be attained, therefore the Null Hypothesis is always false”. “If there is no expectation regarding the possible truth of Null Hypothesis its falsification by data is a redundant nonsense “. “If the probability of a point hypothesis is indeterminate, the empirical discovery that discovery that such a hypothesis is false is no discovery at all, and thus adds noting to what is already known”.

No doubt a pearl of great price . . .

The “author” makes confusion between“point estimation” and the practical impossibility to have certainty at the real probabilistic world.

No matter . . . I guess that the fathers of NHST will be terrified to have found such argumentation.

I am just starting not worry about . . .

[1]- Null Hypothesis Significance Testing (On the Survival of a Flowed Method), Joachim Krueger (January 2001, American Psychologist, Vol. 56, No.16-26).

Luis

if there is very large variable like 35 how to decide which variable is important

which variable to include in model?

Rahul,

Are you speaking about regression?

Charles

yes

Gliner, Leech, Morgan [1] says as the major problem with NHST:

“ Kirk (1996) [2] went on to explain that was a trivial exercise because the Null Hypothesis is always false, and reject it is merely a matter of having enough power ”.

Our comment

The stated null hypothesis concerning a parameter, H0: p=p0 against Ha: p=p1 where p0 and p1 are real numbers, , even that the p0 exact value be practically impossible to be attained: I let out a laud laugh. When the real matter is to reject or fail to reject the null, I do not care at all this imprecision because with sufficient power (sample size) it always possible, given , to decide between the two hypothesis, so performing a rational choice. In fact the Confidence Interval (CI) once found is decisive: the test statistics inside, fail to reject H0, outside reject it. We know that a Type I error can occur, i.e. H0 is true, however we reject, or Type II, we fail to reject when H0 is untrue.

Luis

[1] – Problems With Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say? Jeffrey A. Gliner, Nancy L. Leech, George A. Morgan. The Journal of Experimental Education, 2002, 7(1), 83-92.

[2] – Practical significance: A concept whose time has come: Kirk R.E. Educational and Psychological Measurement. 56, 746-759.

Informative article

I do not care, at all, if the Null Hypothesis is true or not: the NHST

measures in what measure H0 is unlike. If so p<alpha we reject it.

if not we fail to reject. Never say that you accept H0. In effect, we are

completely unable to state such a thing.

Good morning Charles,

I am hoping that you might be able to help me out. I have a mac and successfully manged to download the add-in and get it to run. However, when I try to run my analysis, irrespective of whether it is a t-test of a repeated-measures ANOVA, I keep getting the same error message appear “compile error in hidden module: Analysis”. I have checked all of the fields and believe I have entered and selected everything correctly, but can not seem to get rid of this error. Any assistance would be very much appreciated!

Mel,

Are you able to use any of the functions? E.g. what do you see when you enter the formula VER() in any cell?

Charles

Hi Charles,

I get the same message too and I’m using a Mac as well. The version I’m using is excel 2011 and version 5.3.2 software.

Any assistance pls? Thanks!

Janine,

Which message are you referring to?

Charles

Hi Charles,

Thank you for these excel-lent (pun half-intended) stats tools. They work fine for me under MacOS 10 Yosemite apart from one major snag: execution of the functions take a very long time! Like half a minute or so. I seem to remember having similar problems with my own VBA macros in Excel until I put these code lines in at the start of every function:

Application.EnableEvents = False

Application.Calculation = xlCalculationManual

and then ending with reversing them:

Application.Calculation = xlCalculationAutomatic

Application.EnableEvents = True

Is there any way I can make the execution work faster without putting these lines into all the macros of RealStats? I guess there is something with the prefs of my Excel that brings this problem about, and that others don’t experience it. I would be very grateful for your comments. (btw, the RealStats VBA code i password-protected – do you distribute the password?

Dan B,

You won’t be able to make these changes yourself. I don’t distribute the password.

MY experience in the past was that the execution was relatively fast. Which function was this slow?

Charles

Hi Charles,

Thanks for the prompt reply. Since I posted my question, I have experimented with Excel preferences, and it works like this: if “Calculation” prefs are set to “Automatic”, then execution of the RealStats function “EVECTORS()” takes like 30 s. However, if I change the Excel Calculation prefs to “Manual”, I have to prompt the execution of all calculations using “Calc Now” or “Calc Sheet”, but the actual calculation is quite fast. I don’t know why this happens specifically to me (and presumably not others, since I imagine that most people have their prefs set to automatic calculation).

Dan,

Strange indeed. Since eVECTORS uses an iterative technique it could take some time to calculate for a large matrix, but in my experience to date, it is still pretty fast.

Charles

I don’t think it is specific to evectors, or even to RealStats. As I said, my own VBA macros behaved the same way, but there I could remedy it by putting in the code lines I mentioned earlier. So it’s probably not really a RealStats issue. I rather hoped you would know the solution. I guess I will have to search for the answer somewhere else or make do with manual calc.

Bringing to light towards dismantling it, and it is our duty to do so…

Douglas H. Johnson

The Insignificance of Statistical Significance Testing (1999)

http://digitalcommons.unl.edu/usgsnpwTC/225

1. Are Null Hypothesis Really True?

“Most null hypotheses tested, however, state that some parameter equals zero, or some set of parameters are all equal. These hypotheses, called point null hypotheses, are almost invariably known to be false before any data are collected (Berkson 1938, Savage 1957, Johnson 1995)” [citation ended]

My comment

The parameter could very well be equal to zero and almost every time the observed value issued from data is different from zero: it is randomness acting. The A. is wrong, in fact we are completely unable by observation/experiment to state the true value of a parameter whatever. Using calculated Confidence Intervals for increasing sample sizes we obtain, progressively narrow intervals till a sufficiently low imprecision for practical intent is reached, in case of data homogeneity, and the test is strictly exact. On contrary with approximate test statistics it could very well happen that this final interval does not contain the true parameter value

Note: I do not care at all though the Null is true or untrue. My intent is to choose judiciously between it and the alternative. From the test value I can discard H0 with sufficient confidence or alternatively fail to do so when p-value is larger than alpha, a value I chose to wrongly reject a really true parameter value.

Luis A. Afonso

Hi again Charles,

Speaking of eVECTORS: the output of this array fn is a range of values, the first one being the eigenvalue and the rest being the coordinates of the (unit) eigenvector. Each time I use this function, I notice that the eigenvectors are listed in descending order with respect to the value of their corresponding eigenvalues. Is that a built-in feature of eVECTORS, such that I can trust that the output always will be organised that way (ie the highest eigenvalue first)? If so, that would simplify my calculations considerably.

Hi Dan,

The third parameter of the eVECTORS function describes the order in which the eigenvalues are listed.

If order is TRUE or omitted then the eigenvalues are listed in order from highest in absolute value to smallest. If order is FALSE then they are listed in order from highest to lowest.

Charles

Thanks Charles. That is great, since I won’t have to sort or rank the eigenvalues/vectors in a separate formula.

Hi Charles;

I’m getting an error message in trying to send you an email through your published addresses. I’m in need of guidance. I’m hoping you can help. I have a sample of passengers boarding a vehicle and how far they traveled. The samples were collected every 6th day for a year. I need to be able to prove (disprove) that these samples meet a minimum of 95% confidence; and minimum precision level of +/- 10%. How would I go about proving this.? There are so many tests, I get confused with them. Thank you for your help!

Catherine,

I am sorry, but I am sure what you mean by “samples meet a minimum of 95% confidence” or have a “minimum precision level of +/- 10%”.

Perhaps you mean that the population mean is the calculated sample mean with 95% confidence (?)

Charles

Can you reply to my email address and I can give you more detail?

Yes, I can try.

Charles

Going to send you an email from my personal account since our company email is migrating to the new microsoft 360 and experiencing many issues.

Dear Charles,

I am carrying out a research and want to analyse the effect of some meteorological parameters (temperature, relative humidity, evaporation, wind speed..) on river flow.

Can you kindly assist me and suggest the best statistical method for such an analysis.

Thank you

You need to provide more detail about the type of analysis that you want to do.

Charles

Dear Dr. Zaiontz,

I just wanted to say thank you for sharing so much of your knowledge on your website. I am a math/stat major, and I have repeatedly returned to your website to get fuller, more intuitive explanations of the concepts in my stat textbooks, which usually throw formula’s at you without much explanation of their derivation.

A thousand thanks, Dr. Zaiontz!

Best regards,

James

James,

Thanks for your comment. I am very pleased that I was able to help you.

Charles

I would like to use the program, but I have the 64 bit version. Is there a way to use the program anyway?

Mareike,

I have heard mixed things about whether the software works with the 64 bit version of Excel. I would try to use it and see whether it works or not. Please let me know what you find.

Charles

It opens only the file realstats.xlam, but no installation of the software. So I think it won`t work…

My fault, it works! And it`s a really great tool. Thank you!

Dr. Zaiontz,

Thank you. Your website is wonderful, as well as the Excel add-in. I believe I have correctly run a logistic regression on my relatively small data set (n ~140). One input, one output, pretest ~ course success (pass/fail).

My question is: Can you recommend how I can find someone to double check my interpretation of the output? I did read your website and posted a question and if I interpreted your data set correctly, then I think I am doing my interpretation correct.

Are there statisticians who will check my work affordably?

Amy,

I don-t have anyone to recommend. Perhaps someone reading your comment will have someone to recommend.

In the meantime, I will respond to your other comment shortly.

Charles

Thanks. I await your response. I think I got it, but just want to make sure.

Hello,

Would you please help me figure out the appropriate test. I have results from satisfaction surveys for 2 different companies. 40 surveys per company. I want to find out if the satisfaction is the same or different. Would I run a two sample t test with equal variances? Thanks for any guidance you can give me.

Lu,

You can run a two sample t test. If the variances of the two samples are relatively similar you can use the equal variances version of the test; otherwise you should use the unequal variances version of the test. You should also make sure that the assumptions for the t test are met; otherwise you might use a nonparametric test such as Mann-Whitney. See t Test for details.

Charles

Goodmorning Prof,

Am Jude,i want to do my thesis on generalized approach to the Analysis of Variance.Please sir,can you direct me on how to commence going about it?Thanks

Jude,

That is quite a broad request. For starters, please see the following webpage.

Analysis of Variance

Charles

Hi Charles,

First of all, I would like to thank you for this wonderful website and codes. I started learning econometrics and your help if invaluable.

I have several case studies in the form of panel data. I was wondering if we can apply your logistic regression function to panel data?

Thanks for your reply.

Arnaud

Arnaud,

I am very pleased that you value the website and software.

I have not yet delved into panel data on the website and software. I hope to get to that in the future. The logistic regression capabilities described on the website and implemented in the software are only the beginnings of the approaches used with panel data.

Stay tuned, more to come.

Charles

Hi Charles

I did obtain the Quantiles of the Geary Test (goodness of fit) concerning

Normal Data.

Is it possible to display the following Table in order to have feedback from

the Readers, to confirm that the values are acceptable for practical purposes?

1E7 simulated samples for each sample size.

Thank you.

0.80 0.85 0.90 0.95 0.98 0.99

n

20 .847 .855 .864 .878 .892 .901

25 .841 .848 .857 .869 .883 .891

30 .837 .813 .852 .863 .876 .884

35 .834 .840 .847 .858 .870 .878

40 .831 .837 .844 .854 .866 .873

45 .829 .834 .841 .851 .862 .869

50 .827 .832 .039 .848 .859 .865

55 .826 .831 .837 .846 .856 .862

60 .824 .829 .835 .844 .853 .860

65 .823 .828 .834 .842 .851 .857

70 .822 .827 .832 .840 .849 .855

75 .821 .826 .831 .839 .848 .853

80 .820 .825 .830 .838 .846 .852

90 .819 .823 .828 .835 .843 .849

100 .818 .822 .826 .833 .841 .846

120 .816 .819 .824 .830 .837 .842

140 .814 .818 .822 .828 .834 .839

160 .813 .816 .820 .826 .832 .836

180 .812 .815 .819 .824 .830 .834

200 .812 .814 .818 .823 .828 .832

Luis,

Ok, it is posted. Let’s see what feedback you get.

Charles

Hi, Charles

Monte Carlo Method (MCM) and me

Since more than 4 decades that personal computers (PC) are currently available to everyone, I started to be concerned in its usefulness in Statistical matters. Because they aimed to control depending on the chance procedures and also to educationally interiorize randomness, MCM is of undisputable value in practical and learning issues. Here is an example.

We know from Theory that the extreme values, a matter of importance in several branches as Climate Changes, for example, are, for the Uniform (0,1) Distribution:

P(min≤x)=1-(1-x)^n

P(max>x)=1-x^n

Follows the results for samples sizes 20,30, 50, 1 million samples per size:

___n=20__/1´000´000

___min.___max_________observed__ (theory)_

__0.005__0.995_____0.0953__0.0952 (.0954)_

__0.050__0.950_____0.6416__0.6412 (.6415)_

__0.100__0.900_____0.8788__0.8786 (.8784)_

__0.200__0.800_____0.9884__0.9883 (.9885)_

__0.300__0.700_____0.9992__0.9992 (.9992)_

___n=30__/1´000´000

___min.___max_________observed__(theory)_

__0.001__0.999_____0.0296__0.0296 (.0296)_

__0.005__0.995_____0.1397__0.1397 (.1396)_

__0.050__0.950_____0.7854__0.7854 (.7854)_

__0.100__0.900_____0.9574__0.9578 (.9576)_

__0.200__0.800_____0.9987__0.9988 (.9988)_

___n=50__/1´000´000

___min.___max_________observed__(theory)_

__0.001__0.999_____0.0488__0.0488 (0.0488)_

__0.005__0.995_____0.2217__0.2217 (0.2217)_

__0.050__0.950_____0.9231__0.9229 (0.9231)_

__0.100__0.900_____0.9948__0.9950 (0.9948)_

__0.200__0.800_____1.0000__1.0000 (1.0000)_

Therefore, we reach a two-fold conclusion:

The Theory is correct as everybody knows,

And my RNG is quite acceptable quality. which was of course the true intent.

Luis

This is a particular e-mail

Charles

Now can I conciliate the Abdi & Molin

D(1000,.95)=0.0283 with my evaluation using 400´000 samples /size

n= 250, 0.0722

500, 0.0727

750, 0.0737

1000, 0.0742

One of us is deadly wrong, Abdi or me, for sure.

Do you agree?

Luis

Luis,

I agree, but it would be good to understand what are the right values to use.

Charles

Charles

That´s the point . . .

How can I be judge and jury my own sake?

It was precisely the reason I invited you to recalculate the Critical

Values in order that you be sure who is right, Abdy or me.

Never mind, remember “Nemo index in causa sua”

Science always does depend on cross-validation, without it, crashes. . .

Luis

Charles says:

January 29, 2017 at 11:54 am

Sure, it would be fine . . .

What you propose people should do in this instance?

I think that more people are needed to surpass the deadlock.

Do you think so, Charles?

luis

Charles

The Table above is displayed concerning the Abdi & Morin CV´s compared with those issued from my own routine (see ANNEXE)

Abdi & Morin 1E5 samples/size

.01

.05

Samples/

/ size

MY A&M

n .01 .05

50 .1450 .1245 4E6 .1457 .1246

100 .1036 .0889 .1026 .0888

150 .0850 .0729 .0840 .0727

200 .0738 .0633 .0729 .0630

250 .0662 .0567 1E6 .0652 .0564

300 .0605 .0519 4E5 .0596 .0515

400 .0525 .0450 .0516 .0447

500 .0471 .0404 .0462 .0400

Remember that it was proposed

f_(n )=(.83+n)/√n-.01

CV(.05)=0.895/f_n CV(.01)=1.035/f_n

[1] Hervé. Abdi, Paul .Molin, Lilliefors/Van Soest´s test of normality

_____________

The other confidence levels (20%, …) are completely useless, nobody uses that,

no guaranty in NHST.

__________________________

Conclusion

Owing to an error in the expression that transforms Z into p-value I wrongly assumed that the Abdi & Morin fitting expressions, never those evaluated by Monte Carlo, was strongly sub-evaluated. After revising my routine I could verify that the error was mine as it could be verified by the Table where my VC´s values were recalculated

I hope my apologies would be accepted.

Luis

dear sir,

i’ve a doubt about the conclusions of one sample wilcoxon test. after applying the test on my samples for significance level of 5%, i obtain the p vlaue 0.45 and corresponding z score is 0.123. If i’m not wrong, on the basis of obtained p value, there are very less chances to reject null hypothesis (kindly correct me if i’m wrong at any point). what else information i can conclude from my these results.

with regards

Nirbhow,

If p = .45, then you should not reject the null hypothesis.

Charles

dear sir,

thanks a lot for responding.

i want to further inquire from you that on the basis of p value and z score what conclusion i can draw about the nature of sample data i’ve used for test/ nature of technique i’ve used to obtain the sample data

with reards

That p-value = 0 means that it is highly unlikely that the real value for kappa is zero.

Charles

obtaining p value= 0.45, is a normal case or not. i’m asking this question because the magnitude of p value is high.

and what kind of information z score tells about the sample data

Nirbhow,

In what context are you referring to a z score?

Charles

respected sir,

i’m have applied one sample wilcoxon’s test to my samples and obtain p vlaue 0.45 and corresponding z score is 0.123. What is the qualitative information that this z score provides about my sample data.

The z score is used to calculate the p-value and effect size.

Charles

I have just discovered your wonderful site. You have my eternal thanks and appreciation.

Thank you Leah,

Charles

thank you! i love your support! 🙂

I believe there is an error on your page explaining how to execute Holt-Winters method in Excel. For “Example 1” you say to put “=E$22*(C9/D9)+(1-E$22)*F5 in cell F8” however I believe it should be “=E$22*(C8/D8)+(1-E$22)*F4 in cell F8”.

Thank you for this site, the information is extremely helpful!!

Breadcrumbs for ref: /time-series-analysis/basic-time-series-forecasting/holt-winters-method/

Doug,

Thanks for catching this error. I have now made the change on the referenced webpage.

I really appreciate your help in making the website better.

Charles

Dear Sir,

I love this website so much. I do not have a statistics background but I am trying to learn. I am currently doing a research that needs statistics knowledge. Could you give me some advice on my research below?

I am trying to use multiple regression method to explore impacts on housing price in a city. The city has a lake and has a smelly river (these two waterbodies are not adjacent to each other). The research purpose is to gauge housing value increase given that the smelly river been converted to a lake by building a barrage.

I have got housing transaction data that occurred in the area within 1000m buffer from the river and the lake, from 2014 to 2017 of around 750 records. In the database columns refer to different attributes of transactions. The attributes may include ‘property sale price’, ‘name of suburb’, ‘sale date’, ‘land area’, ‘bedroom number’, etc.

By converting the ‘name of suburb’, I can get another two attributes: 1.the proximate distance of the property to CBD ‘Dist_ CBD’; and 2. whether or not the property is located besides the lake or the smelly river ‘lake_nolake’.

Now in the regression model, Y is per sqm housing transaction price. Xs are,

‘Dist_ CBD’ (I measured by meter);

‘lake_nolake’ (if the property is lake-side, I code it 1, if the property is river-side, I code it 0)

‘sale date’ (I converted the date to number);

‘land area’ (I keep the original unit of sqm);

‘bedroom number’ (I keep the original unit of number);

Then I ran the regression model.

Regression Analysis

OVERALL FIT

Multiple R 0.434978 AIC 7900.829

R Square 0.189206 AICc 7900.882

Adjusted R Square 0.187063 SBC 7914.729

Standard Error 180.5385

Observations 760

ANOVA Alpha 0.05

df SS MS F p-value sig

Regression 2 5757830 2878915 88.32612 3.33E-35 yes

Residual 757 24673773 32594.15

Total 759 30431603

coeff std err t stat p-value lower upper vif

Intercept 497.0258 11.62045 42.77164 3.7E-204 474.2137 519.838

Dist_CBD -20.5369 1.712126 -11.995 1.8E-30 -23.898 -17.1758 1.237728

Lake_NoLake 3.5051 31.53518 0.111149 0.911528 -58.4017 65.4119 1.237728

Then the question would be:

How to interpret the ‘lake-nolake’ coefficient of ‘3.5051’? Can I say the per sqm price of lake-side properties tends to be higher than that of river-side properties by $3.5051?

Thank you sir.

Bo,

Based on the regression model that you have created, you are correct in saying that the model predicts that the per sqm price of lake-side properties is $3.5051 higher than river-side properties.

Charles

Thank you very much for your prompt reply.

The R square of the model is only 0.189 which would mean the model can only explain small part of the situation.

May I ask how to convince the local residence the reliability of the $3.5051?

Bo,

I don’t understand your question.

Charles

Sorry for my unclear question. I would like to show you the conflict results between multiple regression and single regression using same database and wish to seek your advice.

In my comment on March 23, 2017, we can say that the per sqm price of lake-side properties tend to be $3.5051 higher than river-side properties.

However, if I run a single regression using the same data (y=per sqm housing transaction price, x=lake_nolake), the regression result is:

y=162.27x+383.5 R square= 0.0351 which is low. This single regression result may tell that the per sqm price of lake-side properties tend to be $162 lower than river-side properties.

Thank you for your help.

Hi Charles,

Please give me your thoughts on the following.

After studying further onto the previous matter, I came to the conclusion that a one-way ANOVA using contrasts for planned comparisons would be appropriate. That way I can identify which groups are significantly different from the control group.

My problem is how can I identify the proper materials, since I’m interested in the materials that are Not significantly different from the original.

The original material has a mean of 7.5 but the minimum observed is 7.2 and the maximum 7.7.

If I use a standard alpha of 0.05 or 0.01, even with the Dunn/Sidak or Bonferroni corrections some materials with a mean of 7.6 are being considered significantly different when it is known that they shouldn’t.

If I do a one-sample T-test on the control material, and use the p-value obtained as my alpha I get much more reliable results that correlate with the observations of materials that are comparable to the control and the ones that are not.

With this method my alpha is extremely small (3.73446E-51), but it still identifies the materials that are actually different, and more importantly it does not make everything significantly different as was the case when using the standard p-values.

Please give me your thoughts on if this would be an acceptable practice.

Thanks,

Sergio

Sergio,

You may need to use Dunnett’s Test. See my response to your email for more details.

Charles

Dear Sir

I have started to reading this article. I think it may be has instead of have. I really don’t know whether it’s have or has or having.

”Four fields are available for testing with each field have fairly uniform characteristics”

Sincerely

Sujit

Sujit,

It should be “having”. Thanks for identifying this error. I have now corrected the webpage.

I appreciate your help in improving the website.

Charles

Is there any video tutorial available on performing ADF test on excel using realstat pack ?

Mayank,

I don’t know of such a video. Do you have some questions about how to perform the ADF test using the RealStats software?

Charles

I have a SB coefficient of 0.502 and a guttman split-half coefficient of 0.498. Does it means my data has a low reliability?

Cedric,

There isn’t universal agreement about what is high and low, but most people would say that this is not sufficiently high reliability.

Charles

Good evening Dr. Charles.

I’m an undergraduate student from a university in Indonesia. If you don’t mind, I would like to ask a question about normal distribution.

If I have data of a pre-test, an immediate post-test, and a delayed post-test, do all my tests have to be normally distributed? Or is it just my pre-test which needs to be normally distributed?

I would be really glad if you would answer my simple question.

Thank you in advance.

Steve

Steve,

Assuming that you are referring to a repeated measures Anova, then you need make sure that the pre-test, post-test and delayed post-test are all normally distributed. Of course, Anova is pretty robust to violations of this assumption, especially if each group is reasonably symmetric.

Charles

Thank you very much for your reply.

Sorry for not mentioning it before.

I’m actually running a univariate ANCOVA test, with the pre-test as a covariate. Then, I guess I also need normal distribution for all my tests?

If it is robust to violation of the assumption of normality, does it mean I can use the ANCOVA test even though the Kolmogorov and Shapiro tests are significant at .000? And the skewness and kurtosisness of the immediate post-test are significant?

Thank you!

Normality is also a requirement for ANCOVA.

Charles

Okay, thank you!

Charles says:

December 6, 2015 at 9:13 am

It turns out that Statistics is not just “science” but also “art”.

Charles

_________

Exactly on the target, Charles!

In fact Statistics in practice lives “UII”, under insufficient information. It´s the heel, and the pleasure, of our lives,

Art disguised as a Mathematical Branch let me say.

Inductive instead of deductive.sts of hypothesis via necessary conditions, not at all sufficient ones.

Hi Charles

I’ve got some A&E data that need analyzing and would appreciate some advice. The data I have are categorical; address area, age group, trust attended, referral mode, discharge destination. I have done chi-analysis on the data but I want to do analysis to predict the discharge destination of an individual from a certain area, of a certain age group, attended a certain trust by a certain mode. A probability tree would be huge, is there anything else that can be done?

Thanks

Sab

Sab,

Perhaps binary logistic regression or multinomial logistic regression.

Charles

Hi Charles, great to land on your website! I have a small data set (n<25). I run multiple regression for different combinations of 4 variables and single factor. With variable A, it shows multiple R<15% and Rsquare50%, Rsquare>30%, coefficient =0.004. I’m considering to choose A & C. what do you think? how would you interpret these? thanks so much!

Des,

I would need additional information to answer your question. See the regression portion of the Real Statistics website

http://www.real-statistics.com/multiple-regression/

Charles

Charles,

We have been using the JENKS formulas in our supply chain to help rank items.

We come up with a weighted ranking value for each of our SKUs. Then, we take the JENKS formula against this value to determine rank distribution. This worked perfectly last time returning 10 ranks all populated with data.

However, this year when we ran it, sometimes we are only getting 7 to 9 ranks. When we use JENKS, we are specifying the number of ranks we are wanting returned.

Do you have any thoughts as to why this would happen?

Thanks for any input.

Jonathan,

If you send me an Excel file with your data and the results of the JENKS function, I will try to figure out what is happening.

Charles

Hi Charles

Very helpful website but I have a query which I hope you can help me with. I am investigating the effects of loneliness on maladaptive behaviours and after computing a Pearson’s correlation, a non-significant relationship was found. I wanted to analyse my data further and see if there were any gender differences or differences regarding age. Is is appropriate to just compare the Pearson’s scores I have ran for the groups even though the sample sizes are different?(I have 34 males and 12 females). Or do I need to use a Fisher’s conversion to appropriately compare them.

Thank you in advance

Molly

Molly,

I don’t see how you can calculate a Pearson’s correlation when the sample sizes are different. Perhaps you need to use a t test in this case. These subjects are covered on the Real Statistics website.

Charles

Dear Charles,

I am very confused with the ADF and cointegration issue.

Am I corrected that if I have some time series variables, I have to first make sure they are stationary by applying ADF on the variables. Once they are stationary, I then can build a model using those variables as dependent and independent variables. Then, I have to perform the cointegration test on the model by applying ADF on the residuals.

Is my understanding right?

Many thanks and best regards,

Kelly

Kelly,

Conducting an ADF test doesn’t make a time series stationary, it merely tests whether the time series is stationary. If the time series is not stationary, you need to do something else to create a related stationary time series — e.g. differencing. In this case you can now build a model for the stationary time series which you can use, for example, to make predictions. The good news is that form these prediction you can also make predictions about the original non-stationary time series.

Charles

Dear Charles,

Thank you for your response. If I have 4 original variables in which 2 are stationary and 2 are nonstationary, I then need to transform them into 4 stationary variables before building a time series model from these stationary variables that are transformed from the original variables.

Since not all of the orginal variables are nonstationary at the beginning, do I still have to perform a cointegration test?

Best regards,

Kelly

Kelly, Perhaps. See the following from Wikipedia:

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).[3] The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared relationship (suggesting high explanatory power on Fiji’s consumption from Afghanistan’s GNP). This is called spurious regression. To be more mathematically precise, two integrated I(1) series which are statistically independent may nonetheless show a significant correlation; this phenomenon is called spurious correlation.

Charles

Dear Charles,

Thank you very much for your explanation.

Best regards,

Kelly

Good evening,

at first I should write, that is great, that you are doing all these things, soft, web, discussion… You have my admiration fo this. But I am also asking for a help. I have two parametres on measured tests. One is position (-5,-4,…0,+1,+2,…,+5). For some positions I have results of test (we could say second parameter), for other position I would like to find result (presumption, estimation, premise) without test, only with knowledge of neibourghs, knowledge of others. For example -5~5.1, -4~4.2, -3~???, -2~5.5, -1~4.5, 0~4.2, +1~5.5, +2~5.5, +3~5.2, +4~3.9, +5~4.5. Is possible to compute position”-3″? Please, how to do it? Thank you very much for your help.

Miroslav

Miroslav,

If I am understanding the situation properly, you are looking to estimate missing data. There are many techniques for doing this. See the following webpage for details: http://www.real-statistics.com/handling-missing-data/

Charles

Good evening,

thank you very much for your answer, but I did mistake, it looks. Data, which are used are measured from tests on complicated sample (values corresponding to deformations on the positions -5,-4,…0,1,…5 on the car). For some positions I have results of test, for other position I would like to find result (presumption, estimation, premise) without test, only with knowledge of neibourghs, knowledge of others. So, if it is like this, it is not Normal distribution, is it? In that case, maybe better way is just “trend” in excel. But I am not sure, if it is enough 6th degree of polynomial function for this tests. Thank you very much,

best regards

Miroslav

Miroslav,

If you believe that the data follows a particular 6th degree polynomial y = p(x), then simply substitute the x value x0 that you want and calculate p(x0) — i.e. the trend, as you have suggested. This is essentially how one of the simplest missing values approaches works.

Charles

Good afternoon Charles,

I am sorry for late reply. I have to thank you very much for your answers. This is reason, why I am writing: thak you.

Have a nice day 🙂

best regards

Miroslav

Respected Sir

I have downloaded real stat for Excel 2010. after installation it asks for password. Please help me how I may get password.

Mukta,

See http://www.real-statistics.com/appendix/faqs/password-prompt/

Charles

Hello Sir,

Although I’m using SPSS for my statistics, I still find your website very useful.

Regarding “linear regression”, are the results still true if my variables are not normally distributed according to K-S and Shapiro-Wilk’s test? In other words, does linear regression require the assumption of normality even though the variables are linearly correlated?

After running the test, all other assumptions (except for heteroscedasticity) were met including independence of observations, normal probability plot of residuals and the model was a good fit with an R-squared value of 0.7346.

I have run “Robust Standard Errors” to overcome heteroscedasticity. Does this make sense? I mean would it correct for both normality and heteroscedasticity? If so how should I write/ report the results of a robust standard errors test?

Thank you

Heba,

Normality is not a strict requirement for linear regression. It really depends on how you plan to use the model. See the following webpages:

http://people.duke.edu/~rnau/testing.htm

https://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression

If you have heteroscedasticity, then you can use robust standard errors.

Charles

Thank you!

In your opinion, which would you recommend; bootstrapping, weighted least squares or robust standard errors to overcome heteroscedasticity?

Heba,

The easiest is probably robust standard errors, and so I would use that approach.

Charles

Dear Charles,

In ADFTEST(A3:A22,TRUE,-1), the -1 means using the Schwert estimate for the maximum number of lags. In ADFTEST(A3:A22,TRUE,0), what is the meaning of 0 here?

Best regards,

Kelly

Kelly,

Lag = 0. See the following webpage for a more complete explanation:

http://www.real-statistics.com/time-series-analysis/autoregressive-processes/augmented-dickey-fuller-test/

Charles

Dear Charles,

Yes, thank you very much for the link.

Best regards,

Kelly

Dear Charles,

From where can I find the information about Granger-Causality Test, Impulse Responses, and Forecast Error Decompositions at all?

Best regards,

Kelly

Kelly,

These are topics that I haven’t yet covered on the Real Statistics website. I would imagine that most good books about time series analysis or econometrics cover them – e.g. Gujarati and Porter’s Basic Econometrics.

Charles

Dear Charles,

Thank you very much for the books.

Best regards,

Kelly

How to associate data from two different surveys coming from two different populations?

I want to associate between staff perception of hospital quality with patient satisfaction about hospital services quality. Two different surveys were used for the 2 different population. The only common thing is that they were collected from the same hospitals. Other meaning: How to find the relation (correlation or regression) between the independent variable (Six Sigma methodology: from the first survey) and between the dependent variable (patient satisfaction: from the second survey)? I have two study populations: the first is the in-patient client at the Hospital and the second population is health providers at the same

the first survey was to collect the data from workers (health providers: 328 persons) aiming to measure the level of using Six Sigma methodology at the hospitals. The survey consisted of (5) domains (DMAIC) that is (define, measure, analyze, improve, control) and total of (68) statements by Likert scale (5 degrees: strongly agree, agree, neutral, disagree, strongly disagree).

the second survey was to collect the data from in-patients client (540 patients) aiming to measure the level of patient satisfaction at the hospitals. The survey consisted of (5) domains (DMAIC) that is (define, measure, analyze, improve, control) and total of (36) statements by Likert scale (5 degrees: strongly agree, agree, neutral, disagree, strongly disagree).

Actually, I collected the data from the two populations, but I don’t know how to correlate between them as I mentioned above.

Amer,

Since the populations are different, I can’t think of a way to calculate a correlation.

If they are measuring the same thing, then you can test whether the measurements differ.

Charles

Dear Charles,

Thanks for your effort, I’ll like to ask, no way to calculate the correlation, even thought the data collected form the patients and health worker form the same hospital and during the same period of time and the basic 5 domains are similar in both surveys.

Amer,

You might be able to calculate the correlation in this case. With multiple variables, you essentially run linear regression and look at the Multiple Correlation value output from the Regression data analysis tool.

Charles

Dear Charles,

Thanks again, I’ld like if you could please tell me the criteria which l’ll use is correct.

Sine I have 2 populations, one is the workers (328 persons) and other is the patients (540 patients), every population is in a separated date file, so I’ll do the following:

First: combining the two data files into one data file.

Second: As I’ve two kinds of observations, one form worker at the hospital and the other form the in-patient from the same hospital, also every observation has the same 5 domains, and every domain has several questions (NB: The workers’ questions differ from the patients’ questions), so to overcome this difference I’ll calculate the mean for each of the 5 domains for every observation and as well as the total score.

Third: Run the regression to find the correlation between the data (means) of workers and that’s for patients, but I don’t know which type of regression must I use because the 2 samples’ (workers and patients) are not equal in size (the workers are 328, and the patients are 540).

So please Dr. Charles, tell me if the criteria that I mentioned above is right or not and your advice to me.

My best regards,

Amer

Amer,

It doesn’t seem like you can use regression or correlation in this case.

Without using any statistical terms, what real-world problem are you trying to solve?

Charles

Hi Charles,

I have a question regarding the Holt- Winters method for forecasting using the realstats excel add-in. What do you exactly mean by “number of seasons” in the dialog box? My data is monthly (16 months), if I insert “4” as the number of seasons does that mean that it will regard every (16/4= 4) months as a season? Or does this method not work when data is monthly? Should I have my data prepared in a certain way before forecasting?

Thank you in advance.

Haneen,

If your data is monthly, then enter 12.

Charles

Hi,

Thank you for your reply. Is there an alternative to entering 12? When I enter 12, the MAE and MSE increase greatly (compared to other forecasting methods).

Haneen,

I don’t know what webpage you are referring to.

Charles

This one

http://www.real-statistics.com/time-series-analysis/basic-time-series-forecasting/holt-winters-method/

Haneen,

You can use any value you want, but if your seasonality is monthly, then the correct value is 12.

Charle

Dear Charles,

I’m sorry, if my question was not clear.

I would like to ask if I can pool the data set from two different surveys, taking into account that the first survey population is 328 staff members of the hospital as (independent variable: six-sigma methodology). The second surveys population is the inpatients in the same hospital, numbering 540 patients as (dependent variable: patient satisfaction). Both surveys consist of five dimensions (define, measure, analyze, improve, control). While the survey of workers with 68 questions and the survey of patients with 36 questions. Both surveys were designed according to Likert scale.

My question is: Can I collect the two data-files from both surveys into one dat- file and calculate the relationship (correlation) or regression between the independent variable (six-sigma methodology) and dependent variable (patient satisfaction)?

Amer,

You question has been very clear, and my answer is the same. You can certainly collect the the data from the two data sets into one file, but I don’t see how you can calculate a correlation. If each staff member was responsible for one of the patients, then you would have the one-to-one relationship necessary to calculate a correlation, but this is not your situation.

You can compare the survey results from the two groups (e.g. by using multiple t tests or, better yet, using Hotelling’s T-square test), but this is different from calculating a correlation.

Charles

Dear Charles,

Thanks you very much for your effort.

Best regards,

Amer

Hello Charles, My excel crashes when i performed a kruskal Wallis test. Now Every other test i did displays a #N/A. I tried to restore it but i have the same problem. Have you a solution or must i do the tests again ?

Mathieu,

I can’t think of any reason why Excel should crash in this case, nor why you should get #N/A.

If you send me an Excel file with your data and the Kruskal-Wallis test results (if available), then I will try to figure out what is going on. You can send the file to my email address as shown at Contact Us.

Charles

Dear Charles,

I have concentration data for about 50 metabolites from plants under 3 invariables: temperature (5 and 15 degrees), time (1, 7 and 10 days) and light (white and far red). What would be the best statistics to analyse these data to know that which of the invariables or their interaction influence the metabolite concentration.

Many thanks for your help in advance.

Best regards

Ferenc Marincs

Ferenc,

It depends on the details, but some version of ANOVA seems likely. Given that you have a Time factor, it is likely that you need repeated measures ANOVA (or MANOVA). See the following webpage Repeated Measures ANOVA.

Charles

Charles,

When I attempt to use the Shapley function with an array of data (as shown on the Shapley-Owen example page), it provides the decomposition value for the first column in the area but doesn’t present a value for other columns (as described below):

Real Statistics Functions: The Real Statistics Resource Pack contains the following array function. Here R1 is an n × k array containing the X sample data and R2 is an n × 1 array containing the Y sample data.

SHAPLEY(R1, R2): outputs an k × 1 column range containing the R_1^2, R_2^2, …, R_3^k values

For Example 1, the output from the formula SHAPLEY(A4:C8,D4:D8) is shown in range G13:G15 of Figure 1.

Any idea why the respective values for the remaining columns in the array are not showing up in the workbook (like your example)?

Thanks,

Dave,

There are three x values in Example 1, x1, x2, x3, one for each column. The output contains three rows. Each row corresponds to one of the three columns.

If you had four X variables and each variable had 100 rows of data, the output would contain four rows, one for each of the four variables.

Charles

Hi M. Charles Zaiontz,

Two or three words : Thanks a lof !

Your web site and all the related stuff are just amazing. Merci pour cette générosité.

Hi Charles,

Great initiative and the website explains the details in a flow that it is easily understandable.

I am currently trying to study Holt-Winters and apply it to one of my data sets,but i am facing a challenge. My data consists of data points with time stamp interval of 1 minute and complete data set is for 4 months. What should be my value of c? Any suggestions?

Do you have any seasonality? If not, use Holt’s Trend instead of Holt-Winters.

Charles

Hi Charles,

Thanks for your time and effort.

My data is a level reading from a water tank and the data is for every minute. Considering there is seasonality what should i take the value of c ? would 1440 work?

If the seasonality is daily, then 60 x 24 = 1440 seems appropriate.

Charles

Hy sir i appreciate your concern and i have one question

1. Iam conducting my thesis on Comparative study of Analgesic effectiveness of thoracic paravertebral block (TPVB ) vs. intercostal nerve block vs patient without these two for cholecystectomy under General anesthesia.

since i have three group my statistic tool is ANOVA

my question shall i use one way anova or two way anova ?

thanks sir

Sorry, but I don’t understand the scenario that you are describing.

Charles

Dear Charles,

Thank you for this very useful resource.

Do you have an online-course of statistics from A to Z?

If no, did you ever thought about making it?

I guess lots of people would like to study statistics based on Excel with you.

Greetings from Russia, Moscow.

All the best,

Vladimir

Thanks Vladimir for your suggestion. I have been thinking about creating a series of youtube sessions.

Charles

Dear Charles,

Thank you very much for this wonderful site. I am new to forecasting and working on machine learning problems now.

I’ve a use case to predict number of bookings for a given week.

Features are – Year, WeekNumber, CountryCode, AgentCode, No.of.bookings.

I understand that ARMA, ARIMA models looks only for one variable (here No.of bookings) and try to forecast the same. But in the given case, other attributes impact no. of bookings.

Which are the approaches I can take to model the forecast ?

You are talking about panel data, which is an extension to time series data. The Real Statistics software currently doesn’t support this type of data. Eventually it will support this type data.

Charles

Hello,

Would you kindly suggest an analytical tool for my agric experiment.

I have four replicates of each sample and data for plant height and number of leaves, spanning 7 weeks i.e data is recorded weekly.

i also have data for days to flowering, shoot weight, root weight and fruit yield (all for each sample)

the experiment is to test for the effect of two pathogens on the sample when treated with chemicals. a positive control, neutral control and negative control were also used

Riife,

It is difficult for me to give a precise answer without more detail about the experiment and more importantly about the type of hypotheses you want to test.

It certainly sounds like some sort of repeated measures Manova type of analysis (Manova since there are multiple dependent variables. Repeated measures if you want to compare weeks).

Charles

Hi Charles,

I just wanted to drop a line to say thank you for your work! I used some of your code as well as learnt from your explanations there too.

Also I read that you may want to do some videos. If I may suggest, please make them from a problem solving perspective. There are already many videos on theory explanation but not so many where the theory is extracted while tackling and solving different problems.

Regards,

Eduardo

Eduardo,

Thank you very much for your support and your useful suggestion.

Charles

Many thanks dear Prof,

It is a great help for students,,

regards,

shaheen

Hi Charles,

I have downloaded the software and added in in my excel app. However, it disappears after I close my excel app, and I need to do add-in again every time I open my excel. Can you suggest what is the real problems here?

Thanks,

Hi Lengsea,

See http://www.real-statistics.com/appendix/faqs/disappearing-addins-ribbon/

Charles

Hello-

In Excel 2010, the Hotelling T2 output for the Hotel 2 example data does not match the results illustrated in the example.

The output from Real Stats is:

Hotelling T-square Test

Two-samples (equal covariance matrices)

T2 3.149934857

df1 3

df2 32

F 0.988214857

p-value 0.410715235

However, the results from the example Hotel 2 are:

Hotelling T-square Test

Two-samples (equal covariance matrices)

T2 4.116057236

df1 3

df2 34

F 1.295795796

p-value 0.291686665

Please advise. Thank you.

Sandy

I see that I had not un-checked the “data labels” option when specifying the data ranges.

My bad.

Apologies for the disruption.

Cntrl+M is not bringing up Real-Statistics today and I need to run some time series analysis. Was working fine yesterday. Please advise.

Kathleen,

If it was working yesterday, it should work today, unless you made some change. I suggest the you close Excel and then reopen it. If that doesn’t work, restart your computer.

Charles

Thanks Charles. I viewed some earlier posts and it had something to do with the unblocking option. I tried that and it fixed my issue.

Kathleen,

Thanks good to hear.

Charles

Hi Charles:

Thank you for your blog, Iam learning a lot.

Right now I am evaluating a method for clinical skill learning (I’m Professor in Dentistry). The method is perfoemred by 25 students, and there are 8 different clinical procedures. And according to the performance of the student in each procedure, the student can be classified as sactisfactory (1) or unsatisfactory (0).

My doubt is the following:

In my case, do you thing that Kruder-Richardson is the best choice? Or may I use Fleiss Kappa interrater agreement?

The following is part of the table, the first column is the number of studentes (1 to 25), the following 8 columns are eight clinical procedures, and 1 mens satisfactory and 0 menas unsatisfactory. Hope that you can help me.

St DV 11 DL 21 MV 16 DV 15 DL 46 DL 41 ML 43 DV 32

1 1 1 1 1 1 1 1 1

2 1 1 1 1 1 1 1 1

3 1 1 1 1 1 1 1 0

4 1 1 1 1 1 1 1 1

5 1 1 1 1 1 0 1 1

…..

…..

…..

25 1 1 1 1 1 1 1 1

Thank you for your attention.

Best regards

Tien

Tien,

Glad to see that the Real Statistics website has been helpful to you.

Based on additional information you provided me in an email, this doesn’t appear to be a fit for either Kruder-Richardson nor Fleiss Kappa.

I have tried to give some ideas in my email response.

Charles

The download isn’t working. Anything you can help with?

Anat,

When you say it isn’t working, can you be more precise as to what sort of problem you are having?

Charles

Hi, could you please help me? I’m trying to use the COVP function for obtaining variance-covariance matrix, from two columns of data. Also, I’m selecting in Excel an empty 2×2 cels range an then pressing ctrl-shift-enter. I works great!! I work on Windows. The problem is that my girlfriend has downloaded the Mac version of Realstatistics, She is doing exactly the same that I do, but in her case she obtains only errors. Do you have an idea of what is happening? Thanks

Eduardo,

Do you know which version of RealStats she is using? You can find this out by entering the formula =VER(). Also what version of Excel for the Mac is she using?

Charles

Hi Charles,

Thanks for such an excellent blog. Could you please indicate where to find the spreadsheet listed as Appendix 4 in your 2015 Psicologica paper (A robust function to return the cumulative density of non-central F distributions in Microsoft

Office Excel)?

Many thanks.

I didn’t write a 2015 Psicologica paper, and so don’t know what spreadsheet you are referring to. In any case, you might find the information that you are looking for on the following webpage:

http://www.real-statistics.com/chi-square-and-f-distributions/noncentral-f-distribution/

Charles

Hi Charles, I’ve just realised that, my apologies! Thanks for the prompt reply and for the great resources.

Mike

Please need a spreasheet about a two stages multilevel model.

I have to know step-bystep multilevel modelling.

Thank you,

André.

André,

Sorry, but Real Statistics doesn’t support multilevel modelling as yet. This capability will be added some time in the future.

Charles

Hello Sir,

We are developing a method that requires method precision as a validation parameter. Normally we sample 6 to 10 times from a material and analyze each on to generate and average, SD, and RSD. The %RSD serves as the precision. However, this time the sample is a powder and we look to determine two contaminate powders in the sample. It is difficult to obtain a homogeneous sample at the desired 100 ppm target limit making method precision determination nearly impossible. I could of course spike the 2 components on top of the powder sample for each individual prep and find out the percent recovery, mean recovery, SD and %RSD. But could I then use this data to determine the method precision? In other words for to I relate spike and recovery data back to a target amount in ppm? relative difference to the theoretical for each determination then %RSD of that? Just trying to see if I can use recovery data to determine method precision when the samples are independent.

Sorry Keith, but I don’t understand these particular issues well enough to provide a response.

Charles

Hi Charles,

yesterday I found out your excellent RealStat. It impressed me so much: I think it may be very helpful. Great job!

As soon as installed, it worked perfectly.

However, the spreadsheets I saved yesterday today were no longer working. So I have had to reinstall the package and unblock the file RealStat.xlam to make it working. I also added the Addin directory among the Trusted Locations. (Windows 10 64 bit)

Curiously, now it is doing all the calculations but, unfortunately, its very useful “help” in the function wizard doesn’t still work. So it is boring and slows me down a lot to have to look in the website for the documentation of each function parameter.

I tried to reinstall the addin more and more times, but I failed.

Please, can you tell me why it happens?

Paolo

Paolo,

I am pleased that you like RealStats.

I don’t know why you had to reinstall the software. The help wizard should work in the Excel 2013/2016 version of the software. Yiu can check which version you are using by entering the formula =VER(). I hope to make this capability available in the Excel 2010 version in the next release.

Charles

Hi,

the formula =VER() reports “5.2 Excel 2013/2016”, Excel version is 2016. I Have had to reinstall RealStats because something was blocking it: Control+M did not longer work.

Yesterday, for shure, the help worked. Today, after reinstalling, no more. The Macros, on the contrary, work wonderfully.

Paolo

Thank you Charles. Your site provides a wealth of information, especially for me, a novice and simply not all that good at mathematics.

I have a question, but I have yet to find information about a solution. It’s related to your binomial and beta distribution tutorials. I can use the BINOMDIST to find the sample size needed when I provide the required accuracy, confidence level, and number of errors. But I’d like to go the other way and find the number of allowed errors when I know the required accuracy, required confidence level, and number of trials. All my tests are pass/fail. There are no retries. Is there a similar statistical function or formula I can use to do this? I’ve searched, but I’m not familiar enough with statistical functions to know if I’m just missing it or not.

Thanks in advance,

Andrew

Andrew,

Glad you are getting value from the website.

There may be a simpler way, but you should be able to use Goal Seek or Solver to solve these sorts of problems. Numerous examples are provided on the website for using these tools for similar type problems.

Charles

Thank you Charles. I’ll check out Goal Seek and Solver. I appreciate being pointed in a direction to look.

Hello Charles,

pls help me to chose design for following problem

I want to develop mixture/formulation. A combination of 2 polymers gives me this formulation. poly A- 6% & poly B-2%.

however, for poly A, there is supply issue,so, I am looking for 2nd vendor.

I tried poly A from 2nd vendor. But, to get same formulation, it requires 4% only (compared to 6% of first vendor).

Now, I am wondering if it is possible to have one formulation which will accomodate poly A of both vendors?

we can change level of poly B to get this new robust formulation.

pls help. Thanks in advance.

Mudassar

Mudassar,

Sorry, but I don’t understand the scenario that you are describing sufficiently well to be able to answer your question.

Charles

Iam getting #NUM! and # div/0! on my data when i use multiple variables for linear regression, which doesn’t happen when iam doing with solitary variables please help

Kavitha,

If you send me an Excel file with your data and the linear regression that you performed, I will try to figure out what has gone wrong.

Charles

Dr. Zaiontz:

A very simple question that I have been unable to answer. I have downloaded Realstats for binomial logistic regression since I have a dichotomous dependent variable (enrolled/not enrolled). The space to enter the input range for variables does not separate to show the separate inputs for independent and dependent variable as was shown on input examples I found on the web. Is the column of dependent variables in the same block as the independent variables and if so, should it be the first column or the last column? Thanks, Dr. Peters.

Dr. Peters,

It is not separated and is the last column. See

http://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/

Charles

Hi Charles,

First of all, I would like to thank you for this excellent website. It provides plenty of useful information, especially for people who find difficulties in dealing with statistics.

I have a question, and I hope to be able to explain it in an understandable way. I’ve performed an experiment without replicates, which consists in treating a cell culture with two different substances (control and treatment), and measure the effect on their growth at different times (1 hour, 7 hours, 12 hours and 24 hours). The effect that I have measured is the percentage of cells in a certain phase of their growth cycle. I would like to know if there is a significant difference in this percentage between control and treated cells.

My main problem is that I don’t have any replicates. So far, I have tried with a two-factor ANOVA test without replication (considering “treatment” and “time” as the two factors), but I am not sure this is the right method for this situation. Are there any other statistical tests that I can use, considering that I don’t have replicates?

Thanks in advance,

Marina

Marina,

I am very pleased that you find the website helpful.

If you want to test both the Treatment and the Time, then you should use Repeated Measures ANOVA with a fixed Treatment factor and a repeated measures Time factor. This specific situation is described on the following webpage:

Repeated Measures ANOVA with one within subject factor and one between subjects factor

You will need to read about Repeated Measures ANOVA in general before this webpage will make sense. See

Repeated Measures ANOVA

Note that if you only had the Time factor and not the Treatment factor you would use a simpler version of Repeated Measures ANOVA; this would be equivalent to two factor ANOVA without replication where one factor is Time and the other is Subjects.

Charles

Dear Charles,

Thank you very much for this excellent website.

I was wondering if you might consider allowing users to subscribe to a brief email notice when there is new content. If so, I’ll be the first to sign up!

Again, many thanks —

Neil McQuarrie

Neil,

I use twitter for this. Please follow at

@Real1Statistics

You can also use an RSS feed.

Charles

Now you have a new folower. Have a nice day

Hi Charles,

I am trying to download the software, but every time I click the link it says “Page Cannot be Found”

Could you help me out? I am trying to compare some slopes by this afternoon!

Thanks.

Emma,

Sorry about that. Earlier today I put out a new version of the software and apparently the wrong link was established. I believe this problem has now been corrected and you can now make the download.

Charles

Dear Charles,

How can I change some input value in your macro? I am trying to change value in {=DescStats(B3:B106,TRUE)} to {=DescStats(B3:B166,TRUE)}. Pressing “Enter” does not work here.

Best regards,

Kelly

Dear Charles,

Oh, eventually I realized that this is “Array formulas” and you had once explained the concept.

Mant thanks and Merry Christmas,

Best regards,

Kelly

i want to find out 6 heavy metals in bicuits taken from nine towns of lahore. from each town 6 samples of biscuits were collected. In each sample, i want to find out the concentrations of six heavy metals. should i use two way anova with replication or without replication?

similarly if instead of six metals i want to find out only 1 metal in biscuits taken from nine towns of a city. then will it be appropriate to use single factor anova?

The ridge regression function are really nice. Are there going to be similar functions for Lasso regression to figure out the Lambda? For example is there going to be a LassoLambda() function in the future version?

Hussain,

Thanks for your input

This is on the list of potential future enhancements

Charles

Dr. Zaiontz,

I would like to thank you for making all useful information accessible to the students.

Do you have any materials or example on K- fold cross validation method ?

I am trying to write a VBA code to perform multiple regression using K fold cross validation method.

Thank you for your help!

Fady,

See the Cross Validation part of the following webpage:

Estimating Lambda for Ridge Regression

Charles

Dear mr. Zaiontz, thank you for your very useful application!

This not only helped in the work, but also helped to understand some methods as well as the sequence of computations.

One useful tool can really beautify your app. I really want to have the ability to calculate incidence rate ratio CI and p-value with different ways like it is implemented in http://web1.sph.emory.edu/cdckms/IRR-single%20table.htm

or

http://www.openepi.com/PersonTime2/PersonTime2.htm

I use these online pages, however, is a very long time, if computing a lot. It would be great to be able to do this in Excel.

P.S. I couldn’t donate because there is no Russia in the country list in PayPal

Alex,

I am happy that the website has helped you and thanks for sending me the links.

I don’t completely understand the tables in the links. What are they showing?

It seems like the links you sent me are related to the topic described on the following webpage. Is this correct?

http://www.real-statistics.com/descriptive-statistics/roc-curve-classification-table/

Charles

Not quite. I don’t know exactly what type of type of analysis it is.

On these pages you can calculate incidence rate and 95% CI – person-time units? like the number of infections per 100 catheter-days.

In addition, it is possible to calculate the ratio of the two incidence rate, 95%CI and p-value with different ways.

I can’t find some methods of calculation, especially – сonditional maximum likelihood estimate of Rate Ratio and it’s 95%CI.

I found some older articles describing methods of calculation. I can send them by e-mail. I was not able to deal with them.

Alex.

I think epidemiologists often use these methods.

Alex,

I will eventually look at these webpages to try to understand what they are doing.

Charles

Dear Dr. Zaiontz,

Thank you very much for your detailed and easy-to-follow website. It was a lifesaver as I was writing my master’s thesis and I have it bookmarked for future applications!

Kristie,

Good to hear. Glad I could help.

Charles