**What is Real Statistics Using Excel?**

*Real Statistics Using Excel* is a practical guide for how to do statistical analysis in Excel plus free statistics software which extends Excel’s built-in statistical capabilities so that you can more easily perform a wide variety of statistical analyses in Excel.

**What does Real Statistics Using Excel consist of?**

*Real Statistics Using Excel* is comprised of the following three components:

**Real Statistics Resource Pack: **an Excel add-in which extends Excel’s standard statistics capabilities by providing you with advanced worksheet functions and data analysis tools so that you can more easily perform a wide variety of practical statistical analyses.

**Real Statistics Website **(i.e. this site):

- Lets you download a free copy of the Real Statistics Resource Pack
- Provides descriptions of how to perform a variety of statistical analyses using built-in Excel capabilities as well as supplemental capabilities provided by the Real Statistics Resource Pack
- Presents numerous examples in the form of Excel worksheets which you can download to your computer

For the student and the novice, the Real Statistics website is an excellent tutorial for learning the basic concepts of statistics and how to do statistical analysis. For all users, it provides a step-by-step guide for how to do statistical analysis in the Excel environment and the tools necessary to carry out these analyses.

**Real Statistics Examples Workbooks**: four Excel files which contain all the examples contained in the website. These files can be downloaded for free.

You can use this website to learn how to perform statistical analyses in Excel even without using the Real Statistics Resource Pack, but we recommend that you download the resource pack so that you can have access to its powerful capabilities.

**How do I get started?**

**Step 1**: If you elect to use the Real Statistics Resource Pack or would like a copy of the examples used throughout the website, click on the following icon and you will be given the opportunity to download and install for free the Real Statistics Resource Pack and/or the Examples Workbook.

Once you have downloaded and installed the Real Statistics Resource Pack, you will be able use the supplemental capabilities from the copy of Excel that you run on your computer as described throughout the rest of the website and summarized in Real Statistics Functions, Real Statistics Multivariate Functions, Time Series Analysis Functions, Missing Data Functions, and Real Statistics Data Analysis Tools.

If you choose not to download the resource pack or examples now, you can do so later at any time.

**Step 2**: Browse through the website to to learn how to perform a wide range of statistical analyses in Excel using standard built-in as well as supplemental capabilities. We suggest that you begin by clicking on the Website Introduction (and especially Organization of the Website) for further information about how to navigate the website to get the information you need to run any specific statistical test or learn about any particular topic.

**Why do statistical analysis in Excel?**

The reasons for choosing Excel are as follows:

- It is widely available and so many more people know how to use it
- It is not necessary to incur the cost of yet another tool
- It is not necessary to learn new methods of manipulating data and drawing graphs
- It already contains some basic statistics functions and data analysis tools
- It is much easier to see what is going on since unlike the more popular statistical analysis tools very little is hidden from the user
- It provides the user with a lot of control and flexibility

This makes Excel an ideal tool for learning statistical concepts and performing some basic statistical analyses, but unfortunately its built-in statistics capabilities are limited, and so it is often easier to use statistical tools such as SPSS or SAS for carrying out more advanced statistical analyses.

It is to address Excel’s shortcomings that we have created the Real Statistics Resource Pack. This software package contains various supplemental tools that enable you to carry out a wide range of advanced statistical analyses without leaving the Excel environment. You can download the Real Statistics Resource Pack free of charge from this website (as described above).

Dr. muchas gracias, por implementar diseño 2^k y correspondencias en análisis multivariado, sin embargo no puedo desplegar los menús, con el paquete de Real Statistics, por favor podría explicarme.

Dr. Thank you very much, for implementing 2 ^ k design and correspondences in multivariate analysis, however I can not deploy the menus, with the Real Statistics package, please explain.

Gerardo,

The 2^k design will be included in the next release of the Real Statistics software. This will be available before the end of August.

Charles

Ok, Dr, thank you very much, and the Correspondence Analysis too?

Yes, the next release will contain this capability and a number of others.

Charles

Thanks for this! I am including box plots generated by this program in a publication. Can you tell me how to correctly cite the software?

Kathy,

See Citation

Charles

Hi Charles,

I just wanted to take the time to say that I have found your website very resourceful and helpful in my work. Thank you for publishing the content. I see you are often solving other people’s problems in the comment section as well, which shows dedication to your field. You do good work, sir.

Best,

Joshua C.

Thank you very much, Joshua. I appreciate your kind words.

Charles

Hallo Mr. Zaiontz,

I was wondering, is there an already built function that deals, at least patially with Augmented Dickey-Fuller Table critical values?

Kind regards

Ivan,

Yes, it is the Real Statistics function ADFCRIT. See the following webpage for details>

http://www.real-statistics.com/time-series-analysis/stochastic-processes/dickey-fuller-test/

Charles

Thank you for the response. When analyzing the example, for instance DF0, value for tau-crit = -1,95522, while the equivalent value in the Dickey-Fuller table, should be -1,95. How do you come up with the small difference of 0,00522 between the example value and the table value?

Ivan,

Where are you getting the -1.95552 value from?

Charles

Real-Statistics-Time-Series-Examples

Dickey-Fuller Test

Example 1 (no constant, no trend)

cell O8 – tau-crit = -1,95522

Ivan,

I believe that the table is simply a little less accurate than the ADFCRIT formula.

Charles

I would like to create a realistic income distribution. The range of income would be 0 to $300,000 with a mean of $40, and I imagine an SD of $10,000. What is the best way to accomplish this?

Gregory,

It really depends on what you mean by a “realistic” income distribution. A mean of $40 and standard deviation of $10,000 doesn’t seem realistic.

Charles

Dr. buenos días, como puedo trabajar experimentos factoriales 2^k y fracciones con Real statistics?

Dr. Good morning, how can I work 2 ^ k factorial experiments and fractions with Real statistics?

Gerardo,

Sorry, but Real Statistics doesn’t support 2^k factorial designs yet.

I just didn’t have time to do this when I added various Design of Experiments capabilities a few months ago. I will get back to this shortly.

Charles

Doc. Thank you very much

Gerardo,

I plan to add this capability in the next software release.

Charles

Hello Charles,

I am conducting a correlational analysis using a likert scale. I am having difficulty finding out if the variables are linear or not. How can I create a scatterplot to show if a categorical variable is linear? This is a very important step in my project. If I can’t determine linearity or nonlinearity my entire dissertation will produce incorrect data. Almost all the independent variables are categorical. Please help.

Thanks

Nocolette,

The linearity assumption is probably not violated when using dummy variables to code categorical variables. In any case, the following webpage may be helpful to you:

https://stats.stackexchange.com/questions/124580/how-to-prove-linearity-assumption-in-regression-analysis-for-a-continuous-depend

Charles

Thanks Charles,

I really appreciate you taking the time to respond to my question. I have a few more.

After dummy coding the variables, is it then ok to do Pearsons Correlation? If yes should I appply the same assumptions test as I would prior to conducting a Pearsons correlation analysis? If say for example there are outliers in one of the categories, what test should I run? Looking forward to your response.

Nicolette

Nicolette,

I don’t see any reason why you couldn’t use Pearson’s Correlation (although you probably need to use multiple correlation). Whether this is the appropriate statistic depends on what you are trying to accomplish. Outliers could be a problem whether or not you have categorical variables. One approach for dealing with outliers is to use Spearman’s correlation or Kendall’s tau.

Charles

Hello Charles,

I appreciate your response. I have a few more questions. Is it ok to dummy code categorical variables that has more than one categories? For example level of education (bachelors degree, masters degree, Doctorate)?

What if I dummy code the variables , and after which I conduct the assumptions test and the variables are not homogenous and also they fail the normality test. Is it ok to conduct the Spearman rho test?

What if the dummy variables pass the normality test but failed the homogeneity of variance test? What test should be conducted then?

I want to test the relationship between teachers attitudes towards inclusive practice (topic).

Sorry for asking so many questions. I want to make sure the results section in my project is correct. I am somewhat confused at this stage. So I really appreciate your help.

Nicolette,

Depending what sort of analysis you plan to conduct, you can can certainly dummy code categorical variables. This is commonly done with demographic variables of the type you describe.

I don’t quite understand the rest of your questions. What you do really depends on what sort of tests you plan to conduct. Based on your previous comments, it seems that you want to calculate the correlation between two variables. Is this still your situation? You don’t need to worry about normality or homogeneity of variances to calculate the correlation. If you want to test whether the correlation is significantly different, then normality can be relevant. If you need to use dummy variables, then the problem problem becomes a regression problem. I am speculating here since I don’t really know your situation.

Charles

Hello Charles,

Thank you so much for responding to my queries. You have been a big help to me. Thank you for your time.

Nicolette

Nicolette,

Glad I could help you.

Charles

Hi sir..

I have a question about correlation. In my research I am not considering about the ages of my sample. I have 180 respondents and their ages are between 15-53 years old. Can I just do the Pearson correlation test without specific the age? This is because one of the assumption of correlation is homocedasticity.

You don’t have to take age into account when calculating the correlation, but whether you should or shouldn’t do this depends on what you intend to use the correlation for.

Charles

Dear Charles!

Thanks very much for your Real Statistics: I am already using many tools in my job (agricultural research).

Do you plan to add other designs to Design of Experiments (graeco-latin square, criss-cross, incomplet block, lattice, etc) ?

Thanks for all

Mohamed

Mohamed,

Eventually I will add additional designs. I am am always expanding the number of tests that are supported.

Charles

Thank you again

Dear Sir,

Yours is one of the most useful websites that I’ve encountered in a long time and, with its excellent add-in and guide has allowed many of my collegues and myself to obtain a considerable degree of autonomy in performing a number of tasks involving statistical methods in the field of Health Sciences Education. Thank you very much from Chile!

Hi Charles,

I just wanted to chime in and say thank you as well. Your website and tools have been a great help to me in analyzing fisheries data. I have been meaning to learn R for years now but the steep learning curve and a general lack of time has kept me from embarking on that goal. Fairly comprehensive and easy tools like yours make it even harder to motivate myself to go down the R road because much of what I do can be done with your package. And I echo many others here that your clear, concise explanations for the how and why of many tools are excellent and very valuable. Kudos for the great work, and thank you most of all for making it freely available!

cheers, Mike

Thank you very much Mike. I appreciate your message and will continue to try to make clear explanations about additional statistics concepts.

Charles

I just stumbled across this awesome website, and I have to acknowledge with gratitude your hard work and skill. Your discussions are marvels of clarity, as are the spreadsheets. I’ve taught introductory applied statistics for over 30 years, and have long advocated for using Excel for exactly the reasons you mention. I even developed a small set of tools to share with my beginning students–but nothing as systematic and thorough as what is here. With your permission, I’ll be sharing your website in the future with all of my students and colleagues.

William,

Thank you very much for your kind remarks. I have tried my best to create explanations that are easy to follow and tools that are easy to use and easy to understand.

Please feel free to tell you students about the website so that they too can learn from the site and download the software from the site.

Charles

Hello,

I need to help in fixing a problem i have in my data. I am running a gravity model – regression for trade analysis between France and selected partner. I have a data panel in excel with dummies and transformed values.

The issue is if I run the regression the out have #num! in the probability column and the somethings the significant value also has this #num!. how to solve this in my this issue. I am very new to econometrics

Emmanuel,

It is very likely that the FDIST (or F.DIST.RT) formula that is calculating the p-value has an overflow problem. If you send me an Excel file with your data and analysis, I will try to figure out what is happening. You can find my email address at Contact Us.

Charles

Hello! When I start Reliability Testing to know the Cronbach’s alpha I mark the following discrimination cutoff must be a number between 0 and 5. What can I do?

Hermes,

I don’t understand your question. The following webpages may be helpful>

Cronbach’s Alpha

Item Analysis

Charles

Charles,

How do I post a new message on a new thread for this forum? I don’t find instructions.

Thanks.

Joel

joelmweinstein@me.com

Joel,

Just make your comment. This forum is quite informal and so there aren’t rigorous rules. I do suggest that you make your comment on a webpage that is related to the topic of your comment.

Charles

Thank you for providing a much needed summary in the field of statiscs in excell. It’s an excellent refresher and reference, having had statistics courses a long time ago…

I am facing a problem and would like to know how would you proceed to solve it.

I have monthly sales data for over 4 years for over 1000 products ( yes, a big table) and would like to forecast sales for 2017.

Problem. Some products show seasonality and growth. Others are less stable.

I cannot go over each product and test them for each forecasting it would never end.

Can you help?

Youssef,

If you want to forecast all the products, you need to forecast all the products. I don’t know a simpler way to do it, at least based on the information you have provided.

Charles

Hi Charles,

Not sure how to begin this new thread on your website so I posted it as a reply to the most recent question. I just downloaded the Mac version of Real Statistics and I am anxious to get started. My experimental design is a 2-way repeated measures ANOVA, similar to your example #?. It is an entirely within subjects variability. In brief, peripheral vision is measured (dependent variable = area) in 12 subjects, varying two factors, Optical correction (three levels) and Target size (three levels). Example #? goes into detail on the derivation of formulae for this type of ANOVA, but I did not find instructions on the website demonstrating how the data sheet should be arranged, or explicit instructions on how to perform the desired analysis using the drop-down windows.

Thanks very much for your help.

Joel

Joel,

I don’t know which example you are referring to, but in any case when you have one within and one between subjects factor, then this test is supported in the Real Statistics Repeated Measures Anova: mixed data analysis tool. If you have two within subjects factors, then you can look at the Examples workbook (which you can downlaod for free) for the spreadsheet that implements this test, but it has not yet been added to the Real Statistics software.

Charles

Doctor, buenos días que grata y excelente sorpresa, ver que el capitulo de análisis discriminante está en construcción.

Doctor, muchísimas gracias nuevamente, sus aportes son una bendición para la comunidad científica e investigadora.

Doctor, good morning it is a pleasant and excellent surprise, to see that the chapter of discriminant analysis is under construction.

Doctor, many thanks again, your contributions are a blessing to the scientific community and researcher.

Gerardo,

Good to see that this is useful to you.

Charles

Dr Thank you very much

How do I get a number for significance level, when I correlate data pairs i Excel?

Preben,

See http://www.real-statistics.com/correlation/two-sample-hypothesis-testing-correlation/

Charles

impressive, excellent, outstanding, …

Just a random bunch of words of how I would describe your website.

But even those words do not describe it good enough.

Respect!

Hi,

I was wondering if you knew of any way (maybe using macros or something) to find a max value at regular intervals in a single column in Excel.

Thanks.

Tirupan,

Please clarify what you mean by “max value at regular intervals”.

Charles

I need to find the max in the one column for the starting 300 rows for 11 times and then the max in next 220 rows, and then continue this pattern (11 times find max in 300 rows and one time in 220 rows)? Will a ‘IF’ or ‘WHILE’ should be used?

Sorry, but I still don’t know what you mean by “the max in the one column for … 300 rows for 11 times.” If the starting 300 rows are in range A1:A300, do you mean the max of A1:A11 and then the max of A12:A22, etc.?

Charles

I mean find max for A1:A300, A301:A600……until A3001:A3300 and so on for 11 times and then A3301:A3520, and then keep repeating this sequence for the rest of the data.

Tirpan,

Here is one way to do this, but it requires a little work. Since the second part is similar, I will only tackle the first part, namely find the max for A2:A301, A302:A601, …, A3002:A3301. Note that I started with cell A2 instead of cell A1.

1. Place the number -1 in cell B1. Insert the formula =B1+1 in cell B2.

2. Place the number 0 in cell C1. Insert the formula =IF(MOD(B2,300)>0,C1,C1+1) in cell C2.

3. Insert the formula =IF(C2=C1,””,MAX(IF(C2:C$3301=C2,A2:A$3301,””))) in cell D2.

4. Highlight the range B2:D3301 and press Ctrl-D.

Column D will contain the values that you are looking for.

Note that you really don’t need column B and instead can use values of ROW(), but this is a bit more difficult to understand. If you are using Excel 2016, you can simplify the formula in cell D2 by using the MAXIFS function.

Charles

Hi Charles!!! Congrats on your software!

I’ve tried to use the Multinomial Logistic Regression but there sonthing wrong. I’m using Excel 2016. I’ve installed the add-in and downloaded the sample file. I’ve tried to open but the file apear with errors on calculating. I’ve created another file with just 4 columns and the errors persists. I’ve re-installed the software and the error Was not fixed. Can you help-me? Can you make avaiable a file that is working? Thanks a lot!

Best regards!

Marcos

Marcos,

I just tried using the Multinomial Logistic Regression data analysis tool with the data in Figures 1 and 3 of the following webpage and everything worked fine

http://www.real-statistics.com/multinomial-ordinal-logistic-regression/real-statistics-functions-multinomial-logistic-regression/

If you send me an Excel file with the data you used (especially the new data that you created) along with the results you achieved, I will try to figure out what is going wrong. You can find my email address at Contact Us.

Charles

Hi Charles!

I’ve re-installed the (Office 2016 – office 365) and I think that the problem can be related to that.

I’ve followed the complete path to use the File and the Option and process of configuring the Trusted Location and everething to fix but I didn’t get that rigth.

I’ve oppened the Excel file and the software can’t find some formulas that are pointed to the RealStats.xlam file, as that formulas “DIAG” on tab “MLogit 2”.

I’ll try to reinstall the Office 365 and than follow the situation… Any other idea?

Thanks again!

Marcos

Marcos,

What do you see when you enter the formula =VER() in any cell?

Charles

Charles,

The message retreieved is #NAME?. It seems that

Excel does not recognize the file RealStat as source of the formulas.

As alternative I’ve installed Office 2013 and made the process of configuration of the file and it worked fine but now I’ve got another error now in Excel sayng there is a problem with supplement file and asking to Disable the file. I’ve download another file but the error es the same.

By the way I’m using Excel in Portuguese.

Thanks again.

Marcos

Charles I think I’ve got the problem. I don’t know why but when I Install the RealStat.xlqm the references to the formulas is pointting to an specific path =’C:\Users\Charles\AppData\Roaming\Microsoft\AddIns\RealStats.xlam’!FTEXT(F6)” So because I don´t have this path the formula doesn’t work. I’ll create another user on my computer to check if it works.

See ya.

Marcos

Hello Sir.

I would like to know if we have post hoc analysis for Scheirer-Ray-Hare Test? Is it okay if I run two-way ANOVA for ranked data and perform Tukey’s HSD Test?

thanks.

Gilbert,

I don’t have a definitive answer for you. If you are performing a post-hoc analysis on the row or column factor, you probably can use a post-hoc test for the Kruskal-Wallis test. I don’t know what approach to use for a post-hoc analysis of the interaction factor, which is why you would want to use the Scheirer-Ray-Hare test in the first place.

Charles

I have the problem in Microsoft Office 2007 of window 7. when I install the realstat package 2007, it asks the ‘solver password and realstat password’ in my excel file. And then,I have ‘compile errors in hidden module, frm Matric’ in using the realstat, add-in. How can I solve the problem? Please!

The usual reason for this is that while you have downloaded the software, you haven’t yet installed it. The installation instructions can be found on the same webpage from where you downloaded the software.

To check whether the software is installed correctly, press Alt-TI and make sure that Solver and RealStats-2007 appear on the list of add-ins with check marks next to them.

Charles

Thank you very much for your informative reply.

Hello

I am now using the Microsoft 2016 in window. I have successfully installed the real statistics add-ins but I can not tap in quick access tool bar and it does not appear when I press the ctrlM and I can not use it.I would like to know how to solve this problems.

Nilar

Nilar,

What do you see when you enter the formula =VER() into any Excel cell?

When you press Alt-TI do you see RealStats and Solver on the list of add-ins with check marks next to them?

Are you pressing Ctrl-m or Ctrl-M ?

Charles

Hi Charles – first, great program. Really appreciate the time and effort you put into it. I just upgraded to Microsoft Office 2016 and after working the first time, am having the same problem. Pressing Ctrl-m or M – nothing happens. When I type in =VER(), what shows up is #NAME?. I’ve confirmed that both RealStats and Solver are in the add-ins with check marks. It also shows up on the options screen as being enabled. Any ideas?

Thanks, Tony

Tony,

This is very strange. If when you press Alt-TI you find that both RealStats and Solver are in the list of add-ins with check marks next to them, then at least =VER() should work. The only thing I can suggest is that you deinstall Real Statistics and start the installation process again.

Charles

Charles – appreciate the quick response. After stumbling around a bit what I got to work was to go to the Data Menu and then the Edit Links sub-menu. When that window opened, I clicked on “Open Source”. Another window then pops up warning about potential threat. Click “OK” on that. That worked. I restarted Excel to see if the settings would stick, but they didn’t and I had to re-do those steps. I’m sure there is away to not have to do it every time, but at least right now I’ve got a workaround. Hopefully this helps others using Office 2016 and have the same problem.

On a separate topic, have you considered adding a Probit Regression option to your Logistic Regression? It is more commonly used in econometrics than Logistic Regression.

Thanks again, Tony

Tony,

Perhaps the following webpage can be helpful:

https://www.excelcampus.com/vba/add-in-ribbon-disappears/

I do plan to add probit regression. It has been on my list for a long time, but something elese always seems to pop up and so I never seem to get to it.

Charles

I would suggest that;

1) you install the “RealStats.xlam” file in a “trusted” location

2) you unblock the “RealStats.xlam” file

the two steps above are well described in: https://www.excelcampus.com/vba/add-in-ribbon-disappears/

In order to to see the addin menu in Excel you should also additionally run Excel as an Administrator otherwise the ctrlM will work but the addin menu will not be on the ribbon

Thanks Paolo. I’ll try it. Appreciate the answer.

Hi Charles – thanks for adding the probit regression. I just had the opportunity to make use of it, so appreciate you adding it to the last release. Many Thanks, Tony

Tony.

Good to see that it was helpful to you.

Charles

Hello Charles,

I am using Excel 2011 in a MacBook computer. I have downloaded the Mac version of Realstats and successfully installed it as an add-in in Excel. After pressing Control+m, the pop-up window appear, but I could not select data range (I could manually write data range and use Fill) in the Input Range to perform a function such as Descriptives. What seems to be the problem with the add-in?

Thanks indeed.

Bo,

No problem. In the Mac version you need to manually enter data ranges. This is not the case for the Windows version.

I hope to improve this in a future version for the Mac.

Charles

Thank you very much for your immediate response. I hope a future version will solve this problem.

Best wishes,

Bo

Is there a way (hopefully easy) that allows for the prediction model for binary logistics regression to be accessed and used by another similar data set ? I am finding if I use from 2-10 independent variables the regression works but the coeff are always placed in a different column so I have to manually input the equation when I need it to be automatically done. Any suggestions would be helpful.

Bill,

Sorry, but I don’t understand the issue that you are raising. Can you please clarify.

Charles

Hello, I would like to know how to solve this problem.

Compile error in hidden module: LogisticRegression.

This error commonly occurs when code is incompatible with the version, platform or architecture of this application.

Before this when I wanted to add real statistics in Add-ins I had mistake “Can’t find project or library”. Then I pressed OK and I received other message “enter password”. After that I see real statistics in Add-ins but it doesn’t work =(

Anastasiya,

You should never need to supply a password. When you press Alt-TI do you see both Solver and RealStats in the list of addins with check marks next to them?

Charles

It works now=) thanks!

Charles,

I am opening an excel file, and when I click on data>data analysis I don’t get any options more than I get from excel. Is there another method to access Realsatats functions?

You need to either press Ctrl-m or use the Addins ribbon.

Charles

Dear Charles; I just downloaded and installed Real-Stat, and I went thru excel-file-options-add-ins, etc. But when I open excel file and click data analysis I still don’t find it. What did I miss?

Ahmed,

Are you trying to open the RealStat file or some other Excel file? You should not try to open the RealStat file. You need to open some other Excel file.

Charles

I have two dimensional river data for two rivers, and on the 2D plot — S vs. C — it is clear that data cloud for one river is distinct from the data cloud for the second river. I’m struggling with the setup for demonstrating statistically significant difference between the two clouds. The two dimensions are independent, the number of data points for each river are different, they are not paired. Is there an appropriate template in your downloads that will work for a plug and chug analysis?

Peter,

If you want to compare the means of these two data clouds you can use the two independent sample t test.

If you want to compare the two distributions, then you can use the two sample Kolmogorov-Smirnov test.

Both of these are described on the Real Statistics website and are included in the Real Statistics software package.

Charles

I don’t see how the t test can be used when there are two dimensions to the data cloud. All the examples test for significant difference between the means of a single variable/measurement. It would be appropriate for determining if there was a significant difference in mean weight between juniors and seniors in high school. But what if you plotted both height and weight of juniors and seniors and wanted to know if the two populations were significantly different in both dimensions combined, i.e. maybe there isn’t significant difference when considering either height or weight independently, but you want to know if both together are distinctive? I was told that MANOVA was appropriate for that scenario, but I’m having a hard time translating the jargon into setting up an analysis for my specific case with river samples in which there are two measurements per sample. It looks kind of like Example 1 of the “MANOVA Basic Concepts” if you eliminate Column D (herbicide) and eliminate two of the soils (salty and clay) and want to know if yield plus water in combination are significantly different between loam and sandy. Am I on the right track?

Thanks for the help. Intuitively, I generally understand what I want out of statistics, but it often feels like a foreign language when it comes to the application of them.

Okay, I think I finally sorted it out with MANOVA.

Thanks for making the tools accessible!

Good to hear, Peter.

Charles

Charles

I am getting this error when installing

Microsoft Excel cannot access the file ‘C:\Users\David Harris\AppData\Roaming\Microsoft\AddIns\RealStats.xlam

The file name or path does not exist

The file is being used by another program

The Workbook you are trying to save has the same name as a currently open workbook

Any suggestion on how to fix this issue

David,

Perhaps you already have RealStats installed and you are trying to install a new version. In this case, you shouldn’t repeat the installation instructions. Simply replace the old file containing RealStats by the new file.

Charles

Hi,

Thank you very much for this helpful site. Well I have some questions my research is about factors affecting students’ pursuit of STEM, what do you think is the best treatment for the data where my DV are from Likert/summated scale and my IV is a nominal data (yes/no if they will or will not pursue STEM).

Is a point biserial ok? (My panel wants me to use correlation.)

or is logistic regression more appropriate?

Thank you in advance! 🙂 🙂

additional question:

is it ok that I only got 20 yes out of 300 respondents?

Sorry, but I don’t have a definite answer for you. Logistic regression may not converge to a solution with such unbalanced data.

Charles

Javin,

It really depends on what sort of questions you are trying to ask, but correlation and logistic regression both might be reasonable approaches. Note that for this sort of data point biserial correlation is equivalent to correlation.

Charles

Thank you 🙂 🙂

Hi,

Thank you very much for your reply back then plus thank you for this free downloadable add ons in excel , it’s a great help. Well I have a question again my panel wants me to use this formula t=(r√n−2)/√( 1−r^2) how do i interpret the result of this?

They also ant me to use one way anova to see the difference between the response of those who sad yes and those who said no but i have unequal sample size, does it matter?

Thank you again! God bless

Javin,

1. The formula formula t=(r√n−2)/√( 1−r^2) is used to test whether the correlation coefficient for a paired sample is significantly different from zero. See the following webpage for more details

http://www.real-statistics.com/correlation/one-sample-hypothesis-testing-correlation/correlation-testing-via-t-test/

2. You can use one-way ANOVA even when the sample sizes are unequal. In such cases, you should make sure that you are not violating the homogeneity of variance assumption since the test is more sensitive to violations of this assumption when the group samples are unequal.

Charles

Hello sir, my question is little different it was asked me by interviewer the question is, “say the statistical reason that why airplane fly 2000 feet upon the earth”

Sorry, but I don’t understand your question. Airplanes don’t fly 2000 feet above the earth, they fly much higher than that.

Charles

Hi Dr. Charlz

I am a PG student carrying out a study on Burden and Coping among care givers of stroke survivors. The question I am about to post may seem naive but its quite confusing for me

The objectives are

1. To assess the burden among caregivers of stroke survivors.

2. To identify the coping strategies among caregivers of stroke survivors.

3. To find the association between burden and coping strategies among caregivers of stroke survivors.

4. To find the association between burden and selected demographic variables among caregivers of stroke survivors.

5. To find the association between coping strategies and selected demographic variables among caregivers of stroke survivors.

Statistician recommended to use Chi square for Objective 4, ANOVA for Objective 5 and Independent t-test for Objective 3. I am doubtful on why ANOVA and t-test for the objectives 5 and 3 , I suggested using chi square.

The coping checklist is categorized as Problem focussed,Emotional focussed and Problem and emotional focussed.

Burden is categorized as No burden,Moderate Burden and Severe Burden

I assume your questions whether to use your approach or the statistician’s approach. In order to answer I would need to know the nature of the data that you are collecting.

Charles

Yes thats the doubt I have in mind

Regarding the data, i have 3 separate tools-2 standardised and 1 for sociodemographic variables

Section A: Socio personal data of the participant

Caregivers socio personal data consisted of 7 items including age, gender, education, marital status, employment status, monthly income, family type.

Section B: Clinical data sheet

Clinical variables consisted of 8 items including relationship with the patient, total caregiving time(hrs/day), duration of care, availability of sub caregivers, family history of stroke, level of dependency, type of disability and duration of illness.

Tool 2: Burden Assessment Scale(BAS) .Instrument consists of 40 items on burden with 3-point Likert scale Scaled not at all , to some extent, very much. The score range from 40-120 . Higher scores indicates greater burden

0-40 – no burden

41-80 – moderate burden

81-120 – severe burden

Tool 3: The Coping Checklist (CCL)

The coping checklist (CCL) has 70 items describing a broad range of behavioral, cognitive and emotional coping responses that an individual might use to handle stress. The responses are scored dichotomously in a yes/no format, indicating whether a coping behavior is present or not. The total no. of items reported by an individual is indicative of the size of coping repertoire. Further refinement of the tool resulted in seven subscales, one of problems focused coping, five of emotion focused coping and one social support subscale, which is a combination of both problems focused and emotion focused coping.

Based on what I have understood, the fact that the coping strategy is a score between 0 and 70 makes me think that ANOVA is the correct tool.

Charles

Hello sir,

Thanks for making this tool available.

Is there a way to work on SARIMA model?

What I see in the add-in is just ARIMA without the seasonal component.

Sorry Emmanuel, but I have not yet added a seasonal component to ARIMA. I do explain seasonality for ordinary linear regression and for ETS models. See the following:

Regression with Seasonality

Holt-Winters Forecasting

Charles

Dear Doctor Zaiontz,

I have two factors, i.e. gender (male/female) and generation (old/mid/ young). The result of two-way anova revealed no significance. Could I conduct planned t-tests? If yes, could I use “contrasts for two factors” on the website to do this?

Thank you for your time!

Rebecca,

Yes, you can use contrasts or Tukey’s HSD. See the following webpage:

http://www.real-statistics.com/two-way-anova/follow-up-analyses-for-two-factor-anova/

Charles

Dear Dr. Zaiontz,

I have one dependent variable (binary categorical) and 7 independent variables; 6 of them are binary categorical and the seventh independent variable is age. I want to use correlation to determine if the independent variables affect the dependent variable (falls). Can I use multiple correlation for this? My adviser wants me to use correlation.

Thank you,

Azuka

Azuka,

Yes, you can use the multiple correlation for this. See the following webpage:

Multiple Correlation – Advanced.

Charles

Hello, I would like to know how to solve this problem.

In a study of 150 male and 200 female internet users, 30% of males and 35% of females said they plan to shop online at least once a month. At 0.10 significant level, is there a difference in proportion of internet user between male and female internet user who plan to shop on line?

Banz,

You could try to use a chi-square independence test, knowing that 45 males shop online and 105 don’t, while 70 females shop online and 130 don’t.

Charles

Dear charles,

Sir i need a Help from you. I am basically a Statistics student.

now presently I am working in a Shrimp feed(fish feed) production company.

So this feed is produced in 5 types generally based on a formula. now we have launched a new formula with a minor change. We have fed this new feeds and previous feeds in different ponds independently. It means the shrimp in 10 ponds are fed with 10 feeds. The growth of shrimp is noted every week and the data for 6 weeks of these growth rates of 10 ponds are collected. Now my task is to analyse and report weather the previous formula of feed is better or the new formula. Sir please help me out how to perform this statistical task in excel.

It sounds like you need to use some form of ANOVA (repeated measures, split-plot, etc.), but it is difficult for me to determine which type of ANOVA model is best based on the limited information that you have supplied.

I suggest that you look at the ANOVA portion of the website to help you determine which is the best form to use.

Charles

Hi Charles,

I have run a few tests using real stats and they worked perfectly. Thankyou! I however keep getting an error when I try to run a repeated measures ANOVA. It says ‘Run-time error ‘1004’ Application defined or object defined error.

Do you know why this might be the case?

Many thanks,

Na

Na,

If you send me an Excel file with your data and analysis I will try to figure out what is going wrong.

You can find my email address at Contact Us.

Charles

Hello Charles, I don’t want to look a gift horse in the mouth, but just wanted to understand your rationale for providing this fantastic resource for free. Apologies if this is written somewhere on the site but I couldn’t find it. Many thanks, I find the articles a very good, clear, and practical introduction to the topics (which tend to be covered, unnecessarily, in a much more elaborate and confusing way elsewhere!).

Jim,

I appreciate your kind remarks about the site. I am always pleased to learn whether I have been successful in providing useful but simple-to-use information with the right combination of utility and explanation.

This might sound corny, but my goal is to give something back to the world. I started out in my career as a research mathematician and professor, but have worked as an executive in business for most of my career. Now that I am semi-retired I wanted to return to mathematics. I am having fun putting together the website and the software resources, and at the same time I feel that I am doing something to help people all around the world. I am also enjoying the challenge of making the website as good as I know how to make it.

My goal is not to make money, but shortly I will begin to publish a series of ebooks (as mentioned on the website) to help defray some of the costs of the site. I also plan to add a donation button for the same reason, but even after almost four years of creating the website, I still haven’t gotten around to adding this.

Charles

The add-in seems to have a problem with Independent variables less than 1 and greater than 0. Why is this?

Chris,

If you are referring to values that you put in the dialog boxes (e.g. for Alpha), then this is due to how decimals are handled in some versions of Excel (comma versus period). I have added a % option with does away with the problem. If you click on the Config button on the main dialog box (the one that appear when you press Ctrl-m) and choose this option, then instead of entering .05 (or ,05) for Alpha, you would enter 5 for Alpha %, thereby eliminating the problem.

If you are referring to some other problem, then please send me an Excel file with the data that you are using. You can send it to the email address in Contact Us.

Charles

Checking the % in the Alpha caused a different issue. The tool is looking for a Classification Cutoff between 0 and 1, inclusive.

My question was about my raw data and if the tool had issues with independent variable values between 0 and 1. I was able to answer my own question when I tested just 2 of my independent variables that have values between 0 and 1. The tool was able to perform the calculations as expected.

However…There seems to be a limit on the number of independent variable values the tool can use to perform a Binary Logistic regression. I have 14 independent variables and after some testing, the tool “chokes” when you set the range to 6 or more independent values. I will send you my data for your review. Thanks.

Hi Again,

Seems I’ve been able to move on past 6 independent variables and now the tool can work all 14. I’m still working on the arrangement of the data in the sheet so I can be confident when the Regression returns results, they are as accurate as possible.

Is there any way to dictate the sorting of the data that is returned in output range? By default, it appears to sort the data in ascending order by the first two variables in my dataset. I would like the data returned in the output range in the same order as it is in the input range.

Chris,

I don’t fully understand your question. If you would like the data returned in the output range in the same order as it is in the input range, then you don’t need to sort at all. Clearly I am missing something.

Also which sorting approach are you referring to (since there are Excel procedures for sorting and various Real Statistics functions).

Charles

______________________

[1] – The Earth is Round (p>.05), Dec. 1994.

If a person . . .

J. Cohen [1] was lead to “prove” that the NHST procedure is wrong do choose the sequence:

__1) If a person is an American (A), then he is probably Member of the Congress, (MC)

__2) The person is Member of the Congress

__3) Therefore he is probably not American.

It is astonishing that an element of a subset (MC) be questioned to belong to its proper set (A), when a MC must be from A, necessarily, without exception.

!!! Do intend the Psychologists took us by idiots stoned? Or they are only kidding?

Luis

Hi! Many thanks for this cool site!

I’ve a little problem: My Excel is 2016.

When ever I activate the real stats add-in, save the file an re-open it I cannot see the add-in tab though the Add-In is shown up as activated. I have once again to deactivate the check box for Real Stats, leave the add-in check box list, go back and check it again. Then the add-In tab is shown up whare I find the real stats menu – but only for the actual session. When I save the excel file and re-open it again the add-In tab has vanished.

Any ideas?

Gabor Törö

Gabor,

This is a problem that has recently appeared due to some security upgrade that Microsoft just made. It affects any Excel addin including Real Statistics. Fortunately one of the users has identified a solution, which you can find at

Blog

Charles

Thank you!

Just tried it. It does not work for me. Well, it is only a bit annoying. Real Stats works. So forget it. Thank you.

Gabor Törö

Hi,

thank you very much for these powerful tools!

Is there a way to put these function into a Macro in a way to be able to repeat the function in different sheets, without have to do it every time in each sheet?

I tried with a normal macro in Excel 2016 and it doesn’t work and I didn’t find info in this website.

Thank you in advance for the answer!

Have a nice day!

Asnicar,

How to do this is explained on the following webpage>

Calling Real Statistics Functions in VBA

Charles

Hi

This really is a fantastic resource.

I have produced an employee engagement survey using Likert scales.

I have averaged a group of Likert responses to form an overall engagement score.

I have different engagement scores for different population groups eg from staff at different offices, staff with different job titles etc. I want to know if these differences are significantly different from each other.

I only have a basic understanding of statistics so if you’re able to respond as you would to a newbie it would be much appreciated.

Thanks and much appreciated.

Richard,

It really depends on the details, but you might want to start by looking at one-way ANOVA. See

One-way ANOVA

Charles

Apreciado Dr. Charles, me gustaria saber si hay una nueva version del paquete Realeal Statistics, dado que ultimamente he tenido que instalarlo por add ins cada vez que voy a usarle en excel.

Muchas gracias

Dear Dr. Charles, I wonder if there’s a new version of the package Real Statistics, since lately I had to install it add ins every time I’m going to use this one in Excel.

Thank you very much

Gerardo,

I expect to issue a new version sometime before the end of July.

Charles

Thank very much Sr.

Hi, Im a professor at Del Valle University in Guatemala City. I want my students to use Real Statistics in Statistics 101, which is a course that I teach. I have never had a problem installing the software however some of my students have been having some problems installing it. How can I contact you guys for further assistance? By the way, your program rocks! Thank you for doing it.

Gabriella,

Thanks for your kind remarks. You can contact us by going to Contact Us.

Charles

Hi Charles and Gabriella,

Microsoft has installed lots of Microsoft Office patches on 15th July. After that, I have run into the same problem. I can’t get Real-Statistics up and running through any of the methods. Charles, if you happen to find a solution, please do post it 🙂

Great tool! Hope to be able to get it back up and running again.

Best,

Wolfgang

Wolfgang,

I have had no problems running the Real Statistics software on my computer. What sort of error messages are you seeing?

Charles

Hi Charles, I have figured out the solution. All I had to do is go to Developer-AddIns, click on RealStats and ok. Excel asked me if I want to reinstall/overwrite, confirm, and it’s working again. Just in case anyone else encounters the same problem.

Let me just say that your tool is absolutely great; I do ALL my data analysis with it (well, with the exception of SEM).

Best, Wolfgang

Wolfgang,

Thanks for sharing this.

Glad to see that the software is useful for you. Eventually I will also add SEM support.

Charles

Hello I’m trying help my son who’s a pre-med student doing a summer research internship at a local hospital. We’re struggling to figure out which statistical test(s) are best for the given data collected. Is there some way to post a question regarding his data, to maybe get some suggestions from Statistic-minded people? I realize this is asking a lot, but figured I’d at least throw it out there, so maybe someone could make a useful suggestion for us. Btw we’ve been experimenting with the tools provided here and everything works great! It’s just that again, we’re not sure we’re asking the right question for the various stat’s tests. If that makes any sense.

Thank you!

Scott,

Please describe the hypothesis that you are trying to test and the nature of the data.

Charles

Apreciado Dr. Zaionits, reciba un cordial saludo, Dr excuseme una pregunta, si yo hago una tranformación de Box Cox Normal para mis datos, como calculo la inversa de la tranformación para interpretarlos?

Dear Dr. Zaionits, Yours sincerely, Dr Excuse me a question, if I make a transformation of Box Cox Normal for my data, as I do calculating the inverse of the transformation in order to interpret?

Great Work……..We are thankful to you.

NICE Charles!

Thank you for all the recent functions in the updates. It saves me a lot of time to perform forecasting.

Every time i download an updated version I am surprised by all the features in the AddIn. Probably the most comprehensive Excel AddIn for stats I’ve ever come across.

Y’all have a good one!

Marc,

Thanks for your comment. I am pleased that the software is saving you time and helping you with your activities.

I add more features all the time, although the output may slow a bit during the summer.

Charles

Does your product work with Excel 365? (windows)

Alex,

My understanding is that it works on the desktop version of Excel 365.

Charles

I’ve been doing scientific research for quite some time and some of the high level statistics used to always impede things (using python or matlab) and take away the fun of the general picture.

I have to say this is the best, most intuitive, and straight forward statistics analysis pack I know. Your explanations are also very good, not too over my head, but also not too dumbed down.

Thank you and keep up the good work!

Dear Charles

I can only echo the comments thanking you for this resource

Your clarity and examples have revealed the mystery of statistics to a less than numerate student ‘of a certain age’

For me the best advantage over ‘black box’ stats packages is that you demonstrate very clearly the relationship between the data and the results and the processes which generate those results

Kindest of regards

Rod

Rod,

Thank you very much for your kind remarks. I am trying hard to make the subject accessible to a wide audience.

Charles

Hi, can someone help me? I am trying to use the Jenks Natural Breaks option. It kept freezing. So, I reinstalled, but now I can’t find the Multivariate Analysis option on the menu. It worked before beautifully, but now it seems to be gone. Any help is much appreciated. Tom

Thomas,

The Multivariate Analysis option is still supported by the software. Which version of the software are you using? You can enter the formula =VER() to find out.

Charles

Thank you, Charles. I’ve reinstalled. I’m using 2013, and I now see the option. But, it seems to freeze when I try to use Jenks. Is it supposed to process for a long time? Any advice is most appreciated.

Tom

Tom,

If you leave the Number of Iterations field blank then the algorithm will evaluate all possible partitions, which depending on the size of your data could take a long time. You can specify the number of iterations to reduce the processing time. See the following webpage for more infor:

Jenks Natural Breaks

Charles

Yes, I started to think that. I only would like 5 classes for 253 rows of two variables. The instructions seem to imply leaving it blank. It’s been processing for about 15 minutes now. Is there a best practice? Maybe 100K, how long would that wait? Should I just wait for this to finish?

Thomas,

15 minutes is a long time. Leaving the field blank means that you will get the best answer, while if you put in a value you might not get the best value, but you may get a good-enough value. In any case, I would suggest that you put in some low value, say 1,000, and see how long it takes. If not too long increase the number until it looks like it is taking too long.

Charles

Hi Charles,

I stumbled upon this site: http://www.statisticsmentor.com/ … i wonder if this is also yours.

Best,

Kris

Kristina, no it is not my site.

Charles

Thanks for your attribution!

I am using it as Add-in for Excel. But in “k-means Clustering Analysis”, I get different results with the same data set after 10 times repeat.

Can you tell me what the reason is?

I’m sorry to bother you.

The problem was solved. There was some trouble with my computer. ^^

Hello Charles,

I just came across your powerful Excel site and I can say that I have been richly blessed. I am an online student of one of the US universities studying quantitative finance.

Thank you for your ingenuity in making this freely available.

Hello Charles,

Thanks for putting this online, but I’m already stuck in the installation process (maybe because I’m on the corporate network). I’ve copied the .xlam (for Office 2007) to my “C:\Users\username\AppData\Roaming\Microsoft\Invoegtoepassingen\” (Invoegtoepassingen=AddIns in Dutch), made sure the Solver add in is loaded but it tells me it has no access to the file.

I made sure it’s there (by browsing to check if its really there and not deleted or what). All rights seem okay (SYSTEM and my user have full control except “Special permissions”) and also restarting Excel didn’t help. Suggestions? Thanks!

Ben,

I don’t know why you are getting this message. The following comment from a user who uses the name “Cyberpreneur” may be helpful

“to correct the password problem requires installing the Real Stat addin in proper manner. Copy the downloaded file to /Microsoft Office/Office12/Library/Analysis , then restart excel. Then install in normal way.”

I don’t know whether this will work, but it might.

Charles

Wonderful resource! Thank you for freely providing this add-in and all your tutorials.

Many thanks for providing this great addition to Excel!

Sir,I have installed it but what is the password ?

You don’t need a password, but you do need to follow the instructions for how to install the resource pack. These instructions are listed on the same webpage from which you downloaded the resource pack.

Charles

Thanks for this program! I have upgraded from Excel 2010 to Excel 2016. I have updated the Real States to ver 4.6 in the same location as the older version (C:\Users\user-name\AppData\Roaming\Microsoft\AddIns).

I use realstats for KS 2-sample tests and really like it!

Realstats is asking for a password, and excel is throwing a “solver error” . Then excel crashes. Any ideas about what might be going on? Thanks.

Dave,

Glad to see that you are using the Real Statistics software.

The latest version of the software uses Excel’s Solver, and so you need to make sure that Excel’s Solver is installed. You do this by pressing Alt-TI (i.e. hold down the Alt key and press T and I) and making sure that there is a check mark next to Solver in the dialog box that appears. Then close Excel and reopen Excel.

Charles

Thank you for the prompt response.

I have solved the issue. You cannot check/install at the same time the Solver, RealStats, and Data Analysis AddIns. If you do, then there are problems. If you install the Solver, then make sure it’s working, then RealStats everything appears to run smooth.

Dave,

Thanks for the clarification. I will pass this on to others.

Charles

I am running El Capitan 10.11.3 on a Macbook Pro Retina. I have Office 2016. I installed this software and then deleted it, because it did not have some capabilities that the Windows version has. The problem is that the add-in still appears in my “available add-ins” in Excel, even though I deleted the program. Does this program have an uninstaller? How do I get rid of it so it does not show up under my “available add-ins.”? Should I reinstall it, then “uninstall” it as opposed to moving it to trash like before? Thank you for any help.

It probably doesn’t harm anything by doing nothing, but you should be able to “uninstall” it via the following steps:

1. From the Tools menu choose Add-Ins.

2. Uncheck the Realstats 2011 option on the dialog box that appears and click the OK button.

Charles

Hello,

i downloaded the booklet from syncfusion and i noticed an error on page 23, where it says:

“…such that (1+????)^4= 1.334. Thus, ???? is equal to .334(1/4) = .0747”

That is wrong. The correct procedure to find ???? is first to calculate (1.334)^(1/4) and then subtract 1: r=(1.334)^(1/4) – 1.

Also, .334(1/4) = 0.760, very far from 0.0747

Thanks Stefano for finding this error. I will try to get the publishers to correct the error.

Charles

I work in finance and regularly do statistical work, largely involving regression and correlation. In particular, I have used various tools, including importing datasets into R in order to get RSE regressions in order to satisfy regulators when doing loan loss analysis. What I don’t know is how I didn’t know of your site and tools earlier. Let me just say that you are my personal hero. Thank you so much for your articles and work. Do you have an area or way to accept contributions? I’d be happy to contribute something for this pack of awesomeness.

Thank you very much Mel for your kind words about the website.

My philosophy has been to provide all the statistical capabilities free of charge (except for some upcoming books which I will sell for some minimal fee). I want people to have access to these capabilities without having to pay. I am considering adding a voluntary donation capability to cover some of my costs, but so far I have focused on adding new capabilities to the site and software.

Charles

Really great plugin – was going to do some of the most basic stuff myself (e.g. Shapiro-Wilk and frequency reports), but this saved me loads of time and i’d never be able to do the advanced stuff you have.

ONE REQUEST….

Could you please make the “input range” used on most of your forms default to the current selection. This would be really great!

keep up the amazing work.

Graham,

Good to read that the plugin has helped you.

Please explain why making the input range default to the current selection would be helpful. Currently I have the output default to the current cell. What would you have me do when there are two input ranges?

Charles

Hi, here is my case, given purely with the intention for you to decide on the merits of my suggestion.

.

I’ll present two workflows to illustrate – my start position is my worksheet with the data (which I invariable have as a ListObject) and my selected cell is within my data (not an unreasonable assumption, as ive probably been data munging it).

Note:I’ve added in defaulting the output to a new worksheet, which I didnt mention previously, but which makes the most sense as a default under this scenario (microsoft defaults to that for inserting a pivot table for eg).

Scenario 1 – First input range defaults to selection

1) select a cell in my column of interest

2) Press ctrl-space to select all column data (or ctrl-shift-space for entire row)

3) select your test

4) select additional range if required (e.g, for correlation) – (manual or user range selection – but see optional enhancement suggestion)

5) Have default output to new worksheet, enter cell in current sheet (ie type 2 character, e.g L2,L4 etc)

6) Run test.

Optional enhancement – where a test requires two ranges, use range of selection.columns(1) and selection.columns(2) – these defaults would not mean any extra user clicks should they wish to override these as the default.

Advantages:

a) The user range browser (Application.inputbox(,,Type:=8)?) does not support ctrl-space shortcut – on large sheets you have to jump to start of worksheet to use the ctrl-shift-down cursor option to select entire col range (or row range).

b) If i’m typing in the range manually I don’t have to remember my data range if I cant see the start and end of it, or if I have chosen a new worksheet to paste my data

c) Defaulting to new worksheet is probably the preferred user option – guarentees do not overwrite data (appreciate you have the check which is v. good). Mixed data on my main data spreadsheet breaks data imports SPSS, SigmaPlot, SQL Server … etc etc … (Extra data leads to misdetection of datatypes etc)

e) For single vector test I do not have any additional key presses – click ok and its done. For 2 range tests, one range is already complete. (but see optional suggestion above)

Disadvantage

a) Unclear if majority prefer defaulting test output to current worksheet. If they do though, then all they have to do is type in a simple single cell reference.

Current scenario

1) Review my worksheet and make a note of my input range(s) [optional] – may involve scrolling and sheet navigation on large dataset.

2) Select a cell where I want the output (or click the new option)

3) select your test

4) manually enter first input range from (1) or use the user range selection tool.

5) manually enter second input range from (1) or use the user range selection tool.

6) Click new for new worksheet (optional)

7) Run test.

Hope you dont take this as a criticism, its merely well intentioned feedback.

all the best,

Graham

Under the scenario of doing different tests on the same range, defaulting the input box to current selection is better as well.

Graham,

Thank you very much for your suggestion and your description. I am thinking of a way to offer the user a choice between defaulting the input or the output. I still need to figure out what is the most intuitive approach.

For now, as long as you highlight the first row of the input range you can use the Fill button to highlight the rest of the range.

Charles

Hello,

I am conducting a survey with a sampling of about 800. The survey has four categories as dependent variables. A few of my hypotheses range from comparing gender (IV) to a particular category (dv). The other hypotheses compare education, player experience and coaching experience (IV) to the total of all categories.

I used t test paired samples with my pilot test with only 9 samples….but I was looking at the MANOVA and was wondering if this was a better data analysis. And if so, is MANOVA easy to do on your excel program?

Deanna,

If you have multiple dependent variables that are correlated with each other, then MANOVA may be a better choice.

The Real Statistics Resource Pack does support MANOVA and there is a MANOVA data analysis tool in the resource pack that should be relatively easy to use.

See MANOVA for more details.

Charles

Dear Charles,

Does your software package include the ability to apply statistical process control analysis to data, specifically as developed by Dr Walter Shewhart?

I ask because only Shewhart’s method adjusts sigma limits to allow for handling real time data as a basis for predicting if a system is stable or unstable with the limits calculated from the data collected. Dr Shewhart also worked very closely with Edward Deming to help him develop his method of continual quality improvement.

I mention this because I want to apply this spc software prospectively to make inductive decisions in real time not deductive decisions based on traditional statistical analysis.

Best wishes

Terry Fox

Terry,

I don’t support Shewhart’s statistical process control analysis, although I will add this to the list of possible future enhancements.

Charles

Dear Dr.

have one problem in use the CHI square test.

the valor alpha must be a number between 0 and .5

as I solve this?

best regards

Carlos Adriano

Carlos,

I don’t know you would want to use a value outside the range 0 to .5, but in this case, you can choose any valid number, say .05, run the data analysis tool and then change the value of alpha in the output and the output will then be revised to use whatever value of alpha that you used.

Charles

Found my crib notes—no problem !

JMB

Dear Charles,

I am trying ti use the Scheirer Ray Hare test. I succeed un testing with your exemple on the web site.

Nevertheless, in you exemple with 3 fertilizer, number of rows by sample is the same (5). How can i do when i have 3 samples with different numbers of subjetcs?

Thank you for your help!

François

François,

Caution: There is an error in the Scheirer Ray Hare test described on the website and from the data analysis tool. This will be corrected in the next release of the Real Statistics Resource Pack.

The Scheirer Ray Hare test requires that the group sizes be the same.

Charles

Has anyone tried this with Excel 2016 yet? I’m downloading Office 2016 now and concerned that RealStats won’t work… Charles, or anyone else, if you can share experiences or tips on using this with Excel 2016 I’d appreciate it.

Update: So far so good – RealStats opens fine, and initial tests seem to yield expected results. Will upate again if I find any problems.

Others have told me that Real Statistics works fine with Excel 2016.

Charles

I appreciate your comment

Dear Dr Zaiontz, Yours faithfully. Of like you, I’m using this spectacular statistical tool developed by you and your collaborators in order to dictate my statistics classes and Biostatistics, thus are increasing the number of followers to your page. Dr. respectfully I would like to know what the last version has the Real statistics, I have 4.2 version; if I already came a new and updates implemented.

Thank you

Dear Charles,

Thank you for your guidance.

I have a problem in sample size for running MANOVA. I have an independent variable and five dependent variables. I wonder whether 20 is enough. I really need your help.

Mina,

I don’t see how you can run MANOVA with only one independent variable.

Assuming that you have at least two independent variables, the answer depends on what sort of statistical power you need. This will depend on the expected effect size. In any case, 20 is a pretty low number to expect much power from the statistical test.

Charles

Dear Charles –

I’m trying to cite your website for a paper I have written. Would you mind providing a complete citation for your website? I want to be sure that I have specified it correctly.

Thank you,

Karen

Hi Karen,

You can get the recommended citation information on the webpage Citation.

Charles

Hello,

I downloaded the R-stat but cluster analysis is not showing in the add-in table. What seems to be the problem?

Thanks,

Liliya

Hello Liliya,

Choose the Multivariate option and you will see Cluster Analysis on the dialog box that appears.

Charles

Dear Charles,

Great website, thank you very much. Do you cover Data Envelopment Analysis in your website? Please send me the link if you do, because I can’t find it and it does not appear with a simple search over your website.

Thank you and regards,

DBM

DBM,

Sorry, but I don’t cover DEA at this time.

Charles

Check out http://deafrontier.net/. They have a free version of DEAFrontier, which is an Excel add-in to do DEA. Of course, the free version has limitations on DMUs, inputs, and outputs.

Dr Charles, good afternoon, please do not not work properly because Cohen’s kappa, which molestrale worth, I attached the file so please tell me I’m doing wrong, I use the tool, and sometimes it works and sometimes not.

Thank you very much, and again I apologize for the inconvenience.

Dr I offer apologies and I found what my error.

Thank you

GAD

Thanks for this toolpack, Charles.

I was excited to see Fleiss’ Kappa in the Reliability section. However, after setting up my data, I ran the test and got the error #N/A. I looked at the output and see that KAPPA is listed in the output cells as the array function (e.g. =KAPPA(B3:Q25,0,TRUE,B30,B31) ). Searching the function terms in Excel fails to identify a function called KAPPA. Is this the reason for the output failing? I am using Excel 2010 (windows based).

thanks for any help.

Mark,

Excel identifies KAPPA as a user-defined function.

You can also access Fleiss’ Kappa via Real Statistics’ Reliability data analysis tool (press Ctrl-m to access this option).

If you send me an Excel file with your data I will try to figure out why you are getting an error output. See Contact Us for the email address.

Charles

This is a superb website for newbies. So much invaluable information..Thank you !!

Is there an intention to add a time series section ?

Keep up the good work.

I intend to add time series later this year.

Charles

Hello!

Thank you for putting this package together. It is very useful!

I was wondering if this is also compatible with Excel 2016? And if so, are the download instructions different for Excel 2016?

Thank you!

Alyssa

Alyssa,

I have not tested the software with Excel 2016, but I expect that it will work using the same download instructions. I plan to do some testing shortly.

Charles

Hi Charles,

Thanks so much for your recent help. I’ve learned a lot from your website and your suggestions for me last couple of times.

Nonetheless, I am still stuck with my problem these days. I have nine groups of data each containing millions of numbers. Obviously it does not work for SPSS or excel to compare these nine groups of data simultaneously. So I chose another way: calculated means of these nine groups and compared these means. Could you please give any suggestion that compare these nine means to see whether they are significantly different?

Thank you so much, Charles.

Best,

Stacey

Stacey,

I don’t know of any test that can tell whether 9 means are significantly different from each other, without looking at the samples from where they came from.

Charles

Got it, thanks a lot!

Dear Dr Zaiontz, Yours faithfully.

Dr. is possible to conduct a canonical correlation analysis with Real Statistics? How could it?

Thank you

GAD

Not yet. I plan to implement this capability shortly.

Charles

Dr very kind thank you very much

Dear Dr Zaiontz, Yours faithfully.

Dr. respectfully I would like to know the possibility that Bland Altman diagrams is added in relability Real Statisctis module is that it is not difficult and can be made directly in Excel, but as I am a professor of Biostatistics and as rates of kappa, and of Dhalberg and Houston, they are also widely used these diagrams quality measurements. No additional programming in Real Statistics so that we can support in his excellent management and statistical support for us and researchers, which greatly facilitates the pedagogy using Real statisctics.

Also in the module Chi square the implemetation of Fischer’s exact test, odds ratio and relative risk.

Thank you

GAD

I already have Bland Altman in my list of suggested enhancements. I expect to add this capability in one of the next releases.

I will look into adding the odds ratio and relative risk to Fisher’s exact test.

Charles

Thank you

GAD

Dear Dr Zaiontz

I have just installed Real Statitics in my MacBookPro and I can not find the option for obtaining Bland Altman diagrams.

I have seen that option in some tutorial on the web.

Is it not available for Mac users yey?

How can I find it?

Thanks a lot

Alejandro,

Sorry, but this is not yet available on the Mac version. I am in the process of creating a new Mac version which will include this capability.

Charles

Hi Charles, great product. Is there a way to increase the font size of the text in the pop up box?

Hi Dan,

I haven’t found a way for the user to increase the font size. Are you using the Mac version of the software? I know this is a problem for Mac users. I plan to increase the font size for mac users in the next Mac release.

Charles

Respected sir,

Kindly help me to solve the below mentioned problem

Three factors: Irrigation levels (3), Fertilizers levels (3) and split applications of fertilizers (2)

Main plot: Irrigation (3)

Subplot: fertilizer (3) x split application of fertilizers (2)

= 3 x (3×2) replicated thrice (3)

Thanking you

You described the premise, but have not described the problem you are trying to solve. What is the hypothesis that you are trying to test?

Charles

Hello Charles,

For someone not very well versed in the word of statistics, your website has been a godsend! I was wondering if you’d be able to point me in the direction as to which test I ought to perform on my data as every time I think I’ve found something suitable I read something that suggests it will not be appropriate!

I conducted a listening test in which a participant was asked to adjust the levels of seven different stimuli such that it gives them the same perception of annoyance as a reference stimuli. I had 15 participants in total, all tested individually (not as a group).

I’m trying to find out how combined noise characteristics increase annoyance in comparison to individual noise characteristics (i.e does the annoyance from noise with a tonal and impulsive component combined equal the annoyance from the noise with a tonal component plus the annoyance from the noise with an impulsive component).

My problem is that I’m now dealing with results that have a non-normal distribution as well as unequal variance. I thought maybe the Kruskal-Wallis test would be a good starting point, but one of the assumptions is that groups have the same distribution, and since the standard deviation of my means differ that probably means that I can’t really use it?

Hello Chris,

Thanks for your kind words about the website. I really appreciate it.

With data that is non-normally distributed and with unequal variances, Welch’s Test is probably the way to go. See the webpage

Welch’s Test

Keep in mind though that ANOVA works pretty well even with non-normal data, provided the data is reasonably symmetric. Non-constant homogeneous variances is a bigger problem, but even here as long as the variances are not too unequal (see Levene’s test), ANOVA should work pretty well.

Charles

Hello,

Thank you very much for your efforts!

I hope you could help me out with my problem.

Null hypothesis : Rural average premium charged by an insurance company = Urban average premium charged by the same insurance company

I had t-test in my mind to check out for equality of means.

I used Real Stat to perform ‘Shapiro-Wilk Test’ and I infered that my data fails to adhere normality.

So now I will have to opt for a non-parametric test!

Will ‘Mann–Whitney U’ test be appropriate here?

Thank you for your time.

Yes, if the two samples are independent. If not, Wilcoxon’s Signed Ranks test is probably the right one.

Charles

if i have a rainfall below for example and i wants to find the rainfall trend to the flood area how do i go above it . And is it anova and linear regression i will apply and how

RAINFALL (mm)

YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

1990 10.6 4.9 Trace 131.8 84.3 446.3 532.8 33.3 274.8 169.4 94.3 62

1991 31 20.4 111.8 264.2 212.4 389.7 358.1 45.2 269.7 84.7 2.7 0

1992 0 0 22.4 51.5 139.7 270.3 202.9 22.9 246.5 77.6 30.2 13.7

1993 0.7 27.1 28.1 87.5 171.3 280 91.4 26.5 96.6 127.2 108.6 19.4

1994 18.6 14.1 79.3 43.4 161.7 209.3 34.7 39.1 241 212.3 20.5 8

1995 0 64 63.7 97.9 146.8 343.1 108.6 139.3 149.9 197.9 48.3 4.7

1996 4.8 25.4 109.8 135.2 293.3 599.5 401.3 206.3 135.4 166.6 36.5 10.2

1997 1.2 0 170.2 227.5 348.4 454.6 145 31.7 221.5 183.7 65.3 14.3

1998 8.2 7.7 43.1 87.1 106.4 48.8 30.4 16.5 55.5 236 33.7 16.7

1999 29.1 113.9 364.5 92.3 250.7 334 109.1 161.2 183.1 141.8 5.9

2000 6.2 0.6 9 80.7 181.9 198.9 126.7 98.3 444.1 120.2 4.5 2.2

2001 46.8 14.1 74.7 216.5 111.8 271.2 176.8 31.3 174.2 154.6 70.6 78.3

2002 64.6 44 76.5 159.5 221.9 372.1 296 85.3 189.1 245.8 30.4 49.2

2003 74.8 16.7 17 244.1 208.6 383 101.7 34 160.5 139.9 152.5 0.6

2004 142.3 86.6 58.8 177.4 310.5 709.6 64.9 44.1 387.1 233.1 92.9 13.8

2005 trace 151.3 90 246.9 336.6 526.7 461.7 48.5 122.2 134.4 26.5 4.5

2006 35.2 13.7 51.6 84 283.7 352.6 95.3 43.9 404.9 15.5 51.8 0

2007 0 8.4 25.5 43.6 167.3 569.6 166.5 185.3 101.3 79.1 19.5 8.9

2008 4.6 61.1 48.9 86 156.4 165.1 47.7 2.8 123.4 335.9 4.7 0

2009 0.6 66.1 154.6 39.8 224.4 746.6 159.4 478.1 369.1 141 30.1 Trace

2010 50.8 40.4 45.6 171.6 233.2 444.1 149.6 161.7 240.3 134.3 65.7

2011 0 67 7.8 114.8 317.4 317.1 463.3 86.9 217.9 376.1 54.2 2.6

2012 1.1 61.2 28.7 161 221.1 476.7 250.2 10.4 184.5 229 71.8 8.8

This looks like a seasonal analysis problem, which you can view as a form of regression. I plan to address these sort of analyses in one of the next few releases. Also Excel 2016 has some new capabilities in this area.

Charles

Hello Dr. Zaiontz,

First, thank you for the user-friendly documentation for applying statistical methods in MS Excel. Often more theoretical explanations of statistical operations are less useful when results are needed quickly. Your Excel-based explanations are easy to implement.

I have used some of the Excel procedures from this site to test the normality of a set of groundwater level residuals from a regional groundwater model I’ve developed for the in the lower Apalachicola-Chattahoochee-Flint River Basin for the U.S. Geological Survey. What is proper way to cite your website in a report I am writing to document the model?

Thanks,

Elliott Jones

USGS Hydrologist

Hello Elliott,

I am very pleased to see that the site has been useful to you. You can cite the website as described on the webpage

Citation.

Charles

Hi, great site and great Resource Pack!

Any plans for a Resource Pack update to go deeper into Time Series statistics,

especially unit root tests for stationarity like the Augmented Dickey-Fuller Test?

Thanks!

Daniel

Daniel,

Yes. Starting with the next release, I am adding capabilities especially focused on econometric topics. I hope to get to time series analysis in the release after next.

Charles

I am trying to assess the performance of a procedure in identifying something abnormal in USA cities. I tested this procedure on six different cities with different positive and negative prevalence for each city. I’ve already finished the step of constructing the ROC curve and the nonparametric AUC for each of the six cities. I am now trying to consider that those six cities as a sample taken from all USA cities. Therefore, I need to express the AUC for the procedure in terms of its confidence level. In this case, I have six values for AUC (n=6) and would like to use 95% confidence.The question now is: should I use the conventional t-distribution (for n<30) to calculate the intervals of AUC, or the James Hanley method?

If I am using Hanley method, I need to calculate the standard error for each case of the six cities since the neg. and pos. is different, then I will end up with six different values for the standard error. If using the conventional t-distribution, one standard error value will be calculated, which is supposed to. Please help

Ali,

Sorry, but I am not familiar with James Hanley’s method. In any case, most likely each method would have some advantages and some shortcomings, and so you would have to choose based on what you are trying to demonstrate.

Charles

Hello – I am looking for a joint PDF estimator. I have sparse data for two joint events from which I can build a joint (or conditional) probability matrix. I would like to find the parametric or non-parametric multivariate distribution which fit the data the best. Can this be done using your toolkit?

Sorry, but the Real Statistics software doesn’t do this yet.

Charles

thank you for your help. you are fantastic

Dear Charles, I have a question about comaparing two samples of rainfall data (monthly maxima rainfall intensity for the periods about 10 years). Because I don’t have the required sample size for calculating IDF curves with current rainfall data, the plan is to join data from two measuring stations. I want to check if the data can be merged. The difference in means an variances between the samples are important in this case, so I plan to use t-test an F-test to verify matching of the data. I also plan to use Kolmogorov-Smirnov test to check if the data of each sample is distributed normally.

Is this the appropriate approach?

thank you

Tadej

You can use the two sample Kolmogorov-Smirnov test to check whether the two samples come from the same distribution, which would give me some evidence that the samples can be merged.

If you need to check for normality I wouldn’t use the Kolmogorov-Smirnov test, but the Shapiro-Wilk test instead.

These tests are described elsewhere on the website.

Charles

thank you

Tadej

What about comparing three groups of data with different size (n1=70, n2= 200, n3= 270)? Is one factor ANOVA proper approach, and would you advise any additional test from real stat add-in?

You can perform one factor ANOVA even with groups whose sizes are different. The main things to note are:

1. With such unequal sample sizes, you need to be more careful about the homogeneity of variance assumption (since the test is less robust to departures from this assumption in models with unequal sized samples).

2. The power of the test will tend to be determined by the group with the smallest sample size

Charles

If I may suggest, put most recent comments at top of page. When I first came to this page I almost left instantly because comments were from 2013.

Rob,

Excellent suggestion. I have just made the change. Please let me know if now you don’t see the newest comments first.

Charles

Great site and I really appreciate the resource pack provided. I use it all the time. I have a question. I compared the results from the real statistics “T Test: Two Paired Samples” to excel’s “t-test: paired sample for means” and the P values differ slightly between them. Why is that? Am I doing something wrong.

Many Thanks

Charles

Found it. There were some merged cells in the column labels that was causing the difference. I unmerged the cells and put the labels on the row immediately above the data and re ran the comparisons and they are identical now. Sorry for the mix up. I truly love the way your data is presented. It is far better than excels. Just sayin.

Many Thanks

Charles

Hi Charles, THANK YOU so much for this website!

It really helped me.

I am using in my classes of Bioestatiscs and Statistiscs, I use Too R, but it is very eassy.

I am writing one Bioestatisct Book, can i use your website in order to do the examples?

Thank you

Gerardo,

I am very pleased that my site has helped you.

I am not sure what you mean by “can i use your website in order to do the examples?” If you want to use some of the examples from the website in your book, please send me an email so that we can discuss this further (my email address is found on the Contact Us webpage). I of course have no problem with you using examples from my website in your class.

Charles

Dear Charles,

I am a novice in a real statistics and in English as well.

After visit your website I hope you can help and suggestion.

I conducted experimental research and random selected to study group and control group. Test the characteristic difference between two group is Ok. Then I collected data at baseline, 4 weeks, and 8 weeks after intervention. After running repeated ANOVA it’s show that there are difference between the two groups. My problem is there are also significant between two group in the baseline measure. what can I do next?

That’s interesting. Was the assignments of subjects to the control and study groups done at random? If so, then it is quite surprising that you are finding a significant difference in the baseline measure. How big is your sample?

Charles

The random assignment was done by my research assistant. There are 25 in an experimental group and 16 for a control group.

Hi Charles, i have a question that i cannot seem to answer.

Scenario: i am unable to determine normality of the data, and i am unable to do transformation.

my p value for levene’s test is less than 0.05, which means that the variances are unequal. If this is the case, will i be able to still use the ANOVA test?

and if i am, am i able to use both scheffe method/dunnett T test as post hoc test?

This is an assignment worksheet, based on data that is given to us, we would need to interpret the output Hope you can help

Dawn,

I have the following suggestions:

Please look at the various techniques described on the webpage Testing for Normality regarding how to determine whether the data is normal

If Levene’s test fails, then you should consider using Welch’s test instead of Anova.

The specific follow-up test will depend on what you are trying to test, but Games-Howell is a likely consideration (see Unplanned Comparisons).

Charles

Hi Charles,

I would like to now if there is a version of your add in for Macs

Thank you

Ana

Ana,

Not yet. I need to put out a new version of the software for Macs, but I don’t own a Mac and so need to borrow one to create and test the release. I hope to have access to one soon.

Charles

Hi,

Thank you very much for posting this add in, it is really helpfull. Right now I am trying to make a cluster analysis that I see that you have it in your add in, but I get no results. The function “clust” that you use to assign a cluster to a data point gives no value at all. I was searching some explanation to this analysis in the website but couldn´t find anything at all. Please let me know if you can help me with this issue. Thank you very much,

Santiago

Santiago,

The cluster analysis capabilities haven’t been officially released yet. I expect to release it tomorrow (Friday). Stay tuned.

Charles

Santiago,

Are you able to use the cluster analysis capabilities in the latest release of the software?

Charles

Just came to share my thanks for this free addition to excel! It’s excellent to do quick normality test (Shapiro-Wilk) without all the complex manipulation. Congrats!

And yes, your website is amazing, very well explained.

Thanks again!

Thank you for you answer!

In encoding continuous variables like income, consumption etc. Is it better to use brackets and choosing their medians, brackets with respective codes like 1= <40000, or exact amount?

Usually I prefer to use the exact amount, but answer also depends on the type of test I need to perform.

Charles

have you ever created an excel spreadsheet that will predict the outcome of a game in any sport such as college or pro football, baseball or basketball.

If so can you post it

Sam,

I haven’t, but perhaps someone in the community has.

Charles

has anyone in the community put together an excel spreadsheet to predict the outcome of sports events that they would be willing to share

thanks sam

Excellently done, Charles!

Never seen such practical AND comprehensive statistics’ basics! And a cherry on top – Excel implementation.

Thanks a lot!

sir, i have to find, any correlation between ‘time of income’ (like daily, weekly etc.) and food supply from PDS (public distribution system) i have following coding for this.

time of income is if daily-1, weakly-2, fortnight-3, monthly-4, Binnual-5, triannual-6, annual-7

and

Do you get sufficient food grains from PDS if Yes-1, No-2

sir i have 1 question for you which statistical tool is useful for

X 1 2 3 4 2 1 2 3 1 4 2 3

Y 1 2 1 2 2 1 2 1 2 2 1 2

this kind of data. it’s actually a coding data of ‘Yes’ ‘No’ type question. please guide me sir.

You haven’t provided me with enough information to answer your question. What is it that you are trying to demonstrate or test?

Charles

Hi Charles, THANK YOU so much for this website! 🙂

It really helped me.

Hi,

I am curious if there is way in hypothesis testing that allows me to establish a relationship, such that the scores of the independent variables are measured on a likert scale and values of the dependent variable is dichotomous (e.g. Yes and No, True and False). The sample does not have equal variance (ie sample size for YES do not equal to sample size of NO).

I tried doing a t-test but it does not make sense that the dependent variable is dichotomous. Is there any method that can be used to solve this issue?

Thanks in advance and hope to hear from you soon

This depends on what you are trying to test. In any case, what you can try is using a t test of sample 1 vs. sample 2. Here sample 1 = the Likert scores that have dependent variable value YES and sample 2 = the Likert scores that have dependent value NO. You can also do a correlation test (the results should be the same as explained on the webpage http://www.real-statistics.com/correlation/dichotomous-variables-t-test/.

Charles

Hi Charles

Real Statistics Using Excel is a very informative, clear, easy to follow and accurate description and analysis of various statistical techniques. It is useful for personal use and classroom teaching. Thank you for setting it up.

In Monte Carlo Simulation – I like to extend it using Markov Chains. Do you have any materails on this using Excel?

Does the Bin and Histogram are accurate in Excel because I have noticesd that it doe snot always count it correctly.

Thanks and Regards

Farhad

Farhad,

I am very pleased that you find the website useful. It is good to hear this.

I plan to add Markov Chains to the website. What specific things would you like me to cover?

I haven’t noticed any inaccuracies with the bins/histogram capabilities built into Excel. Can you give me an example?

Charles

Let me congratulate you for the work done! Amazing website, absolutely helpful and clearly explained. I’ve recommended already to many colleagues and I will keep doing this. Thank you for also make the code open to everybody.

G’day Charles,

I like the look of this package and hope that it will be very helpful, and I thank you for your efforts to date.

One question I have is about which Anova to use. I have data with 6 pigs on 5 diets, and 7 different blood measures, which are dependent to the pig from which they came.

I believe I should use the one-way anova (only one factor – diet) with repeated measures (7 blood measurements all taken at the same time). Can you confirm this?

Also, do the factor inputs need to use numbers (1 to 5 for the 5 diets) or can they use categories (AS, BS, CS, DS and ES)?

Many thanks in advance.

Johnno,

One factor is the Diet (5 levels), presumably the diets are independent of each other since they are for different pigs

Another factor is Blood Measure (7 levels), these are not independent of each other since they relate to the same pig

This means that you have a two factor model with one repeated measures factor

The factor names don’t have to be numbers; they can be alphanumeric. The data, which go in the cells, (which are probably blood measurements) need to be numeric.

Charles

G’day Charles.

Thanks for the reply.

I think I may not have explained it correctly. The only treatment being applied are the diets, being 1 control diet and 4 treatment diets (called D1,D2,D3,D4,D5). At the end of the experiment a single vial of blood is taken and 7 measurements made, ie total iron, ESR, pH, transferrin, calcium, cholesterol and TG’s.

When I tried to run either single or two way on my data there is no way to input the blood results separate from the factor (which was coded as D1, D2 etc).

I have 8 columns labelled Diets, Fe, ESR, pH, Trans, Ca, Cho and TG. Am I setting up the data correctly?

Johnno,

Assuming that you have assigned the pigs randomly to the 5 diets, you have a Diet factor which is a fixed factor (not repeated measures). You can now run 7 separate one-factor ANOVA’s, one for each pf the 7 blood measurements made. This would not take into account any correlations between the blood measurements. Alternatively you can run one two-factor ANOVA where the diet factor is fixed and the blood measurement factor is repeated measures. Another alternative is to view the blood measurements as a dependent variable and use one factor MANOVA where the diet is the independent variable and blood measurements are the dependent variables.

The result would be different if all the pigs are getting all 5 diets (presumably at different points in time.

Charles

How could I perform Probit (say brine shrimp lethality assay) using the add-in? I am a complete novice.

I have not yet implemented Probit in the Real Statistics Resource Pack. Logistic Regression (Logit) can be done using the add-in. For most analyses this is an alternative technique, and is more commonly used.

Charles

How then Sir, could I use Logit in lieu of Probit using the add-in?

See the webpage http://www.real-statistics.com/logistic-regression/, especially the options for Finding Logistic Regression Coefficients using Newton’s Method, which leads to the following webpage:

http://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/

Charles

Hello Charles,

Thank you for making this program available. I just installed in on my MacBook Pro running Yosemite. While I was able to install the program I get a run time 424 error message when I try and use the t-test function. The other statistical functions seem to work fine. Any suggestions?

Thank you.

Wayne

Wayne,

Unfortunately, I don’t own a Mac and so I need to borrow one to support the software. I plan to do so on Friday at which time I will try to identify the problem. I also plan to bring out a new release for Mac fairly soon.

Charles

Hi Charles,

Good morning! You have not mentioned any methodology for doing forecasting, please add forecasting method too. It will really help many of us to go further analysis in time series and so on. while doing forecasting in excel it takes count of the variable like

Year: 1990 1991 1992 1993 1994 1995

Sales: 500 550 501 502 510 500

It taking the count of 1996 for sales, is it right?

Please help me on this, am really confused.

Jerom,

The Real Statistics website and software address multiple linear regression (as well as logistic regression) and how to use these for forecasting. They also take the first step in addressing forecasting for timeseries by addressing autocorrelation. See the webpages

Linear Regression

Multiple Regression

Autocorrelation

I plan to include more materials on time-series in a month or two.

Charles

Hello Charles,

Happy New Year! I am very thankful for your hard work for providing simple, detailed and very resourceful information on your site. I have bookmarked your website. Thanks so much!

Jenkins

Dear Charles,

Thanks for sharing this very useful post. It was indeed very helpful.

Why am i getting a p-value=0 whne I use the SWTEST(). Is it a truncation or roundoff error.

Please advise.

Thanks in advance.

Anu

Anu,

If you send me a spreadsheet with your data I will try to answer your question.

Charles

I found the “Real statistics” extension of Excel by chance, while searching in the web. For curiosity, I tested it’s Principal component analysis results from a dataset of mine already analyzed with a popular software.

The results are exactly the same, obtained almost instantly… and for free !

It seems to be an incredible programming job done by an amazing guy certainly (sorry for my bad english expression).

Bernard Quebec city

Bernard,

Good to hear that the PCA worked as expected. I hope that you will continue to get value from the software in the future.

Charles

Dear Charles,

This is a great work. Thank you very much for these tools.

I want to perform trend analysis of climatic variables. I also want to do statistical downscaling of climate data. Please, let me know, how can I do that using excel.

Tapos,

Excel’s charting capability provides a feature called

Trendline. This can be used for trend analysis. My website describes some of these. You can also consult http://www.wikihow.com/Do-Trend-Analysis-in-Excel.There are also Exponential Smoothing and Moving Average data analysis tools in Excel.

All the regression capabilities described in the website can be used as well.

Charles

Hello Mr. Charles,

I must say I’m impressed by this great work. I have a problem with the software, I’ve downloaded it but each time I press the ctrl-m keys I’m asked to enter a password. Could you please help me.

Hello Emmanuel,

It sounds like the Real Statistics Resource Pack was not installed properly. To find out for sure, enter the formula =VER() in any cell in an Excel worksheet. You should see the version number of the Real Statistics Resource Pack that you downloaded (e.g. 2.16.2). If you don’t see this, then the software is not recognized as an Excel add-in (which is what the installation instructions are intended to do). See the webpage http://www.real-statistics.com/free-download/real-statistics-resource-pack/ for more details. The key step is described in Figure 2 on that webpage.

Be assured that you don’t need a password to use the software. You just need to ensure that Excel recognizes the Real Statistics Resource Pack software as an “add-in”. This is what the installation instructions are intended to do.

Charles

Hola! I’ve been following your weblog for some time now and finally

got the bravery to go ahead and give you a shout out from

Dallas Tx! Just wanted to say keep up the great job!

Hi,

No question. Just wanted to thank you for the incredible effort you’ve put into this site and the excel addin. Keep up the great work.

seen it to very useful in social research data analysis.

seen it to very useful in assisting my students during e-learning-teaching programmes.

Hi Charles,

I generally don’t comment on posts, but I thought I should this time, to tell that the work is simply wow!!

Keep it up!

Muzz

Hey,

I want to perform the independent t-test on over 200 pairs of ranges. It is impossible for me to go through the traditional method. So I couldnt get around a way to use this Add-in the macro.

Can you help me out with the same?

Thanks

Sure. You can use the Excel function

T.TEST(orTTESTfor versions of Excel prior to Excel 2010) or the Excel data analysis toolt-Test: Two-Sample Assuming Equal Variancesort-Test: Two-Sample Assuming Equal Variances. You can get more information about these via the webpage http://www.real-statistics.com/students-t-distribution/.Charles

Chirag,

One more thing. You can call the Real Statistics functions from within a macro that you write using the approach described on the webpage http://www.real-statistics.com/excel-capabilities/calling-real-statistics-functions-in-vba/.

Charles

Hello Sir,

This is great website! Learning lot out of it! Definitely one of the best websites for learning I’ve come across! Thanking you so much for sharing your knowledge here! Those tools are great piece of hard work and dedication! Thank you! 🙂 And ya looking forward for neural networks Add-Ins 😉

Ankit Bhatt

Hello Ankit,

I appreciate your comments and I am very happy that you are learning a lot from the website.

I plan to add neural networks capabilities, but I have a few other capabilities that I plan to add first. Stay tuned.

Charles

Very nice! Thank you, I have been looking for something like this ever since migrating to Mac from PC.

I noticed that I am unable to select the data cells with a cursor. But typing them in manually works. Is this a known issue?

Thanks!

Paul,

I am pleased that you like the site.

I was unable to get the approach that I used for selecting a range in the Windows versions of Excel to work in the Mac version of Excel. This seems to be related to something called RefEdit.

Charles

Sir

Could you please teach us “noncentral chi-square” and “noncentral F” distribution?

Colin

Colin,

What is your motivation for learning about these distributions? Are you interested in power?

Charles

Hi Colin,

I have just added new webpages on the noncentral t distribution, noncentral chi-square distribution and noncentral F distribution. In addition these distributions are used to calculate power and sample size requirements. There are also new functions in the Real Statistics Resource Pack that support these new capabilities.

Charles

Sir

When will your statistics & excel book be published? Can we get that book before July?

Colin

Colin,

I have a project to finish up by the end of May. I will then turn my attention back to the book. I hope to finish by July, but it depends on what else comes up.

Charles

Sir

Thank you for your reply and I am looking forward to your book. I leaned a lot of new things about statistics and Excel from your website. BTW could you please teach us something about discriminant function?

Colin

Colin,

I will eventually support discriminant function analysis, but I don’t plan to provide it in the next release.

Charles

Dear Charles:

I have been using Real Statistics since earlier versions were launched (is a very powerful statistical tool), but I have had a problem since then (that has not been solved in release 2.12): when I create an Excel file in a computer that uses Real Statistics formulas (for example, RANKSUM), and the file “RealStats.xlam” is installed in a certain Windows location (for example: “C:\Users\UserA-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam), if I open the Excel file in another computer, where the file “RealStats.xlam” is installed in another Windows location (for example: “C:\Users\UserB-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam), Excel ask for an update of the RealStats location. Although it is not a very huge problem, is not confortable to update the RealStats route when the Excel file is used between two (or more) computers. Is it possible to create an Add-in where the function calls be independent of the location of the RealStats file?

Another little problem is: I use “Ctrl-m” to access Real Statistics. It’s not possible to link an icon in the Quick Access Toolbar or in the Ribbon, as reported by some Real Statistics users (in spite of following the steps you suggest in your blog, January 28th 2014). As I suggested in a prior message (November 2nd, 2013), perhaps is better to use the Custom UI Editor Tool with the purpose of solving that trouble:

http://msdn.microsoft.com/en-us/library/ee691832(v=office.14).aspx

http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2009/08/06/7293.aspx

Thank you.

William Agurto

Dear William,

Thank you for your question. You point out a couple of difficulties that others are having as well (including me when I use the software).

To address the first issue (as I understand it) is quite straightforward. Simply select

Home > Editing|Find & Selectand choose the Replace option. Then insert “C:\Users\UserA-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam in theFind Whatfield and leave theReplace Withfield blank. Click on theOptionsbutton and make sure thatWithinis set toBookand notSheetand then finally click on theReplace Allbutton. I could try to create a macro for this, but the problem is that the path used for the Real Statistics add-in file is dependent on each person’s computer and where they locate the add-in, and so it may not be so easy to implement. In any case the above approach is pretty easy to use.The second is a longstanding problem. Since Ctrl-m works for everyone I haven’t yet allocated the time and energy to solving it and instead have continued to put the time and energy into updating the statistical capabilities. I have looked into the approach that you suggested, but it doesn’t seem so easy to implement (at least for me). I will eventually get around to fixing this problem, but my apologies for not doing it yet.

Charles

Hi guys, I don’t even know where to start, but without any doubt I can say that your website is outstanding. With so much crap all around, finding you was a bless. I was looking for eigenvalue and eigenvector in excel to use in econometrics applications and I found you. Thanks a lot for all the effort that you put in the examples, explanations and other things. I usually use stata, eviews, minitab and shazam for statistics and excel only for quick things, but now I will use it more often. I will visit you once a week for now on.

Dear Charles

thanks for a very useful set of techniques.

Happen to develop more complex analyses, like General Linear mODel, or Mixed Linear Models?

Best, Alessio

Dear Alessio,

I plan to add these techniques in future releases.

Charles

Thank you so much for these tools!

If I may make one suggestion, it would be to have the output result default to a new worksheet, as it was not immediately intuitive to leave the field blank. Even a checkbox or something that simply cleared the field would be really helpful when I share these tools with others.

Also, have you, or would you be willing to look at developing tools related to the evaluation of risk or loss (i.e. loss triangles via chain-ladder methods, Benford’s law, etc….)?

Thanks again for the great toolset though, I find it so much easier and faster than other tools I used for quick analysis (R, mathmatica).

Thomas,

Thanks for your support and suggestions. I’ll look into the proposed changes and enhancements.

Charles

Thomas,

The latest release of the Real Statistics Resource Pack (Rel 2.11, issued today) enables you to output to a new worksheet by just pressing a button (see Blog for a description). Thanks for your suggestion. I plan to cover Benford’s law shortly. I’m still trying to decide what to do about the other topics you suggested.

Charles

Just curious as to if there has been any updates to this. Been a consistent user for about a year now and still love it!

Thomas,

There have been lots of updates to the software, website and examples file. See Blog for more details. I am very pleased that you have gotten value from the website for the past year.

Charles

Hi Charles,

Is there any chance that you could develop a neural network capability for excel?

that would be awesome!!

By the way thanks for your hard work. I really appreciate it.

Hi Hamed,

Thanks for your suggestion. I will add this to my list of possible upgrades. I developed a neural network capability in a different context many years ago.

Charles

Excellent site and exposition! Clear, concise, and exquisitely presented. This is a fine resource for statistical analysts using Excel.

HI,

I cannot find correlation in the ctrl m menu

Hi Brian,

After entering Ctrl-m choose the Matrix Operations option. When a dialog box appears, select the Correlation option.

You can also use the CORR(R1) supplemental function.

Charles

Hi Charles,

Happened to encounter an issue when using the add-in for Logistic regression. It only asks for input range – so should I presume that it will implicitly assume the first column data as the dependent variable and the rest of the columns as the explanatory ones. Also, I got an error when running the logistic regression citing “Type mismatch”. Any pointers to move beyond will be much helpful. As always, thanks in advance!

Rgds,

Arvind.

Arvind,

The Logistic Regression data analysis tool accepts two formats: raw data and summarised data. When the data is in raw format, the last column is for the dependent variable. It can only take the values 0 or 1. The other columns are for the independent variables. When the data is in summarised format, the last two columns are for summaries of the dependent variable. The last column specifies the total number of failures and the second to last column specifies the total number of successes. The other columns specify the values for the independent variables. See the Logistic Regression web pages in the website for more details.

The likely reason for the error message is that you used a value that was not 0 or 1 in the last column when using the raw data format. I have changed the software recently to detect this problem and so hopefully you would get a warning when filling in the dialog box rather than an error message during execution.

Charles

Hi Charles,

Really appreciate your effort in developing and upgrading (constantly!) the resource pack – I’m a huge fan of it! I particularly like the ease of use and sincerely thank you for your efforts.

Just as a side note, I thought if there can be an option to put the outputs in a separate worksheet (if not a workbook), it would be even more awesome. I’m not too sure of the complexity involved in adding this feature but just thought of sharing it with you.

Nevertheless, the add-in in its current form is itself wonderful.

Cheers,

Arvind.

Hi Arvind,

Thank you very much for your comments. You can already display output from a Real Statistics data analysis tool on a separate worksheet. Just leave the Output Range empty (by erasing the default range). This will display the output on a new worksheet.

Charles

Indeed had missed out on using that feature of getting the outputs in a separate worksheet by blanking out the output range. Appreciate the quick response Charles!

Rgds,

Arvind.

This website has been such an excellent resource in terms of building a statistical model. I don’t think I can thank you enough for the amount of detail, concision and information you have so graciously taken copious amounts of time and effort to compile.

I am looking for some basic SPC style charts X bar & R, Run, etc. I also want to plot batch data in a histogram with the statistical distribution super imposed over it. The data is typically normal. Can your software help me with that?

Richard,

The software does not do this at present. I will look into this for a possible future release.

Charles

Dear Charles,

thank you so much for putting so much effort into this website and – most of all – for making it available for download together with your really thorough explanations.

I implemented a few of your examples and they work perfectly, but I am having a hard time figuring out how to run the “One between subjects factor and one within subjects factor” example with unequal group sizes for the between factor (all groups are of size 7). I can see that the number of subjects for each level of the within factor would always be the same (since it’s repeated measures), but for the between factor, it is a very common case that we have more participants in one group than in another. From my textbook knowledge of ANOVAs, I know there should be a straightforward way to deal with this. I hope you can help me with this.

Thank you in advance.

Best,

Tim

Tim,

I am in the process of rolling out multivariate statistics capabilities. Once I finish with this later this month I will look into supporting unbalanced ANOVA with repeated measures. I agree that it would be useful.

Charles

Thanks for the quick reply. Looking forward to hearing from you 🙂

Tim

I just want to thank you for your effort to publish all this intelligence here.

Though I am using OpenOffice instead of Excel I found your explanations completely sufficient to build my own tables for the task I had to do (Mann-Whitney test)

I am sure to come back anytime I have to use this very helpful resource.

Thanks.

Norbert

Dear Dr. Aust,

Thank you very much for your support. I appreciate that you find the site a helpful resource. I am continually updating the site and hope that you find the new information helpful as well.

Charles

Hi Charles,

Does the Real Statistics resource pack work for Macintosh versions of Excel?

Thanks for your response,

Tim

Sorry Tim, but I have not yet created a version for the Mac version of Excel. I plan to create a Mac version, but I don’t yet a definitive release date established.

Charles

Wow! I think this is the best resource for statistics I have ever seen! The only issue I have had is using the function WTEST – it doesn’t work. Any thoughts?

Hi Teresa,

Thank you very much for your very positive opinion of the website. I have tried to make the site a resource for learning how to do statistics and a free set of tools to perform useful statistical analysis.

I have just retested the WTEST function and found that it works, at least on my computer. Can you send me the example that didn’t work? Please note that WTEST takes two arguments (while the function WTEST1 only takes one argument).

Charles

What great work in integrating educational information and the means to effect solutions using software I already own and use with free supplemental features.

Was browsing thru the site not too long ago and thought I saw mention of upcoming version release of the resource pack and/or mention of a new pack for multivariate study. Couldn’t find these posts or references today, nor any citation as to the date of the downloads to see if there is a newer version available. Are there upcoming releases any of the downloads planned for the near-term; or, what are the dates of the current downloads?

Thanks

Rich,

I have put a new release up earlier today (release 1.3), which provides some bug fixes (especially fixing the QQ Plot data tool). I expect to put another release out in a couple of weeks which provides improved versions of some of the existing functions and tools. I am testing it now. As far as the multivariate release (release 2.0), I hope to have this out before the end of the summer.

Charles

Hi,

thanks so much for putting effort in providing a free downloadable collection of important statistical functions in Excel. While I do use R for my own needs, collaborators more familiar with Excel will surely benefit from this.

However, I came across these AddIn because I was looking for functions performing Shapiro and Levene Tests and wonder why the results of your version (regarding the P-values) is quite different from other implementations.

I am not sure if this Reply-section is the best way to ask you about this but didn’t find another contact possibility. Below you will find an example.

All the best [jan]

Example:

2 Groups containing the following values

g1 = c(0.925,0.925,0.935,0.95,0.95)

g2 = c(0.69,0.71,0.85,0.89,0.955)

LEVENE() will result in a P-value of 0.0026

while R using the function: car::leveneTest(y=lm(c(0.925,0.925,0.935,0.95,0.95,0.69,0.71,0.85,0.89,0.955)~gl(2,5)))$Pr[1])

and this implementation:

http://www.stat.ufl.edu/~winner/computing/excel/levene.xls

both result in P=0.0313

Hi Jan,

Sorry for the delay in responding to your comment. The Levene test is based on the absolute values of the residuals. There are three versions of Levene’s test:

• Residuals based on the mean (as in the LEVENE function provided in the Real Statistics Resource Pack)

• Residuals based on the median (where “mean” is replaced by “median” in the calculation)

• Residuals based on the 10% trimmed mean (where “mean” is replaced by “10% trimmed mean” in the calculation)

LEVENE() uses the “mean” version and the other tools you mention use the “median” version. You can find an example of how to calculate the “median” version of the Levene’s test in the Workbook Examples, which contains all the examples described in the website plus some additional examples. You can download these examples by going to the page

http://www.real-statistics.com/free-download/real-statistics-examples-workbook/

and following the instructions for downloading the Examples Workbook and linking it to the Real Statistics Resource Pack.

In the next release of the Resource Pack I will add functions which calculate the other two versions of the Levene’s test.

Also please see the Homogeneity of Variance page to see where the mean version of the test is the better choice and where the median version is the better choice.

Charles

Update(20 July 2013): The new release of the Real Statistics Resource Pack (R1.4) modifies the LEVENE function so that you can specify which of the above three types of Levene tests you want to perform.