**What is Real Statistics Using Excel?**

*Real Statistics Using Excel* is a practical guide for how to do statistical analysis in Excel plus free statistics software which extends Excel’s built-in statistical capabilities so that you can more easily perform a wide variety of statistical analyses in Excel.

**What does Real Statistics Using Excel consist of?**

*Real Statistics Using Excel* is comprised of the following three components:

**Real Statistics Resource Pack: **an Excel add-in which extends Excel’s standard statistics capabilities by providing you with advanced worksheet functions and data analysis tools so that you can more easily perform a wide variety of practical statistical analyses.

**Real Statistics Website **(i.e. this site):

- Lets you download a free copy of the Real Statistics Resource Pack
- Provides descriptions of how to perform a variety of statistical analyses using built-in Excel capabilities as well as supplemental capabilities provided by the Real Statistics Resource Pack
- Presents numerous examples in the form of Excel worksheets which you can download to your computer

For the student and the novice, the Real Statistics website is an excellent tutorial for learning the basic concepts of statistics and how to do statistical analysis. For all users, it provides a step-by-step guide for how to do statistical analysis in the Excel environment and the tools necessary to carry out these analyses.

**Real Statistics Examples Workbooks**: two Excel files which contain all the examples contained in the website. These files can be downloaded for free.

You can use this website to learn how to perform statistical analyses in Excel even without using the Real Statistics Resource Pack, but we recommend that you download the resource pack so that you can have access to its powerful capabilities.

**How do I get started?**

**Step 1**: If you elect to use the Real Statistics Resource Pack or would like a copy of the examples used throughout the website, click on the following icon and you will be given the opportunity to download and install for free the Real Statistics Resource Pack and/or the Examples Workbook.

Once you have downloaded and installed the Real Statistics Resource Pack, you will be able use the supplemental capabilities from the copy of Excel that you run on your computer as described throughout the rest of the website and summarized in Real Statistics Functions, Real Statistics Multivariate Functions and Real Statistics Data Analysis Tools.

If you choose not to download the resource pack or examples now, you can do so later at any time.

**Step 2**: Browse through the website to to learn how to perform a wide range of statistical analyses in Excel using standard built-in as well as supplemental capabilities. We suggest that you begin by clicking on the Website Introduction (and especially Organization of the Website) for further information about how to navigate the website to get the information you need to run any specific statistical test or learn about any particular topic.

**Why do statistical analysis in Excel?**

The reasons for choosing Excel are as follows:

- It is widely available and so many more people know how to use it
- It is not necessary to incur the cost of yet another tool
- It is not necessary to learn new methods of manipulating data and drawing graphs
- It already contains some basic statistics functions and data analysis tools
- It is much easier to see what is going on since unlike the more popular statistical analysis tools very little is hidden from the user
- It provides the user with a lot of control and flexibility

This makes Excel an ideal tool for learning statistical concepts and performing some basic statistical analyses, but unfortunately its built-in statistics capabilities are limited, and so it is often easier to use statistical tools such as SPSS or SAS for carrying out more advanced statistical analyses.

It is to address Excel’s shortcomings that we have created the Real Statistics Resource Pack. This software package contains various supplemental tools that enable you to carry out a wide range of advanced statistical analyses without leaving the Excel environment. You can download the Real Statistics Resource Pack free of charge from this website (as described above).

Hi,

Thank you very much for this helpful site. Well I have some questions my research is about factors affecting students’ pursuit of STEM, what do you think is the best treatment for the data where my DV are from Likert/summated scale and my IV is a nominal data (yes/no if they will or will not pursue STEM).

Is a point biserial ok? (My panel wants me to use correlation.)

or is logistic regression more appropriate?

Thank you in advance! 🙂 🙂

additional question:

is it ok that I only got 20 yes out of 300 respondents?

Sorry, but I don’t have a definite answer for you. Logistic regression may not converge to a solution with such unbalanced data.

Charles

Javin,

It really depends on what sort of questions you are trying to ask, but correlation and logistic regression both might be reasonable approaches. Note that for this sort of data point biserial correlation is equivalent to correlation.

Charles

Thank you 🙂 🙂

Hello sir, my question is little different it was asked me by interviewer the question is, “say the statistical reason that why airplane fly 2000 feet upon the earth”

Sorry, but I don’t understand your question. Airplanes don’t fly 2000 feet above the earth, they fly much higher than that.

Charles

Hi Dr. Charlz

I am a PG student carrying out a study on Burden and Coping among care givers of stroke survivors. The question I am about to post may seem naive but its quite confusing for me

The objectives are

1. To assess the burden among caregivers of stroke survivors.

2. To identify the coping strategies among caregivers of stroke survivors.

3. To find the association between burden and coping strategies among caregivers of stroke survivors.

4. To find the association between burden and selected demographic variables among caregivers of stroke survivors.

5. To find the association between coping strategies and selected demographic variables among caregivers of stroke survivors.

Statistician recommended to use Chi square for Objective 4, ANOVA for Objective 5 and Independent t-test for Objective 3. I am doubtful on why ANOVA and t-test for the objectives 5 and 3 , I suggested using chi square.

The coping checklist is categorized as Problem focussed,Emotional focussed and Problem and emotional focussed.

Burden is categorized as No burden,Moderate Burden and Severe Burden

I assume your questions whether to use your approach or the statistician’s approach. In order to answer I would need to know the nature of the data that you are collecting.

Charles

Yes thats the doubt I have in mind

Regarding the data, i have 3 separate tools-2 standardised and 1 for sociodemographic variables

Section A: Socio personal data of the participant

Caregivers socio personal data consisted of 7 items including age, gender, education, marital status, employment status, monthly income, family type.

Section B: Clinical data sheet

Clinical variables consisted of 8 items including relationship with the patient, total caregiving time(hrs/day), duration of care, availability of sub caregivers, family history of stroke, level of dependency, type of disability and duration of illness.

Tool 2: Burden Assessment Scale(BAS) .Instrument consists of 40 items on burden with 3-point Likert scale Scaled not at all , to some extent, very much. The score range from 40-120 . Higher scores indicates greater burden

0-40 – no burden

41-80 – moderate burden

81-120 – severe burden

Tool 3: The Coping Checklist (CCL)

The coping checklist (CCL) has 70 items describing a broad range of behavioral, cognitive and emotional coping responses that an individual might use to handle stress. The responses are scored dichotomously in a yes/no format, indicating whether a coping behavior is present or not. The total no. of items reported by an individual is indicative of the size of coping repertoire. Further refinement of the tool resulted in seven subscales, one of problems focused coping, five of emotion focused coping and one social support subscale, which is a combination of both problems focused and emotion focused coping.

Based on what I have understood, the fact that the coping strategy is a score between 0 and 70 makes me think that ANOVA is the correct tool.

Charles

Hello sir,

Thanks for making this tool available.

Is there a way to work on SARIMA model?

What I see in the add-in is just ARIMA without the seasonal component.

Sorry Emmanuel, but I have not yet added a seasonal component to ARIMA. I do explain seasonality for ordinary linear regression and for ETS models. See the following:

Regression with Seasonality

Holt-Winters Forecasting

Charles

Dear Doctor Zaiontz,

I have two factors, i.e. gender (male/female) and generation (old/mid/ young). The result of two-way anova revealed no significance. Could I conduct planned t-tests? If yes, could I use “contrasts for two factors” on the website to do this?

Thank you for your time!

Rebecca,

Yes, you can use contrasts or Tukey’s HSD. See the following webpage:

http://www.real-statistics.com/two-way-anova/follow-up-analyses-for-two-factor-anova/

Charles

Dear Dr. Zaiontz,

I have one dependent variable (binary categorical) and 7 independent variables; 6 of them are binary categorical and the seventh independent variable is age. I want to use correlation to determine if the independent variables affect the dependent variable (falls). Can I use multiple correlation for this? My adviser wants me to use correlation.

Thank you,

Azuka

Azuka,

Yes, you can use the multiple correlation for this. See the following webpage:

Multiple Correlation – Advanced.

Charles

Hello, I would like to know how to solve this problem.

In a study of 150 male and 200 female internet users, 30% of males and 35% of females said they plan to shop online at least once a month. At 0.10 significant level, is there a difference in proportion of internet user between male and female internet user who plan to shop on line?

Banz,

You could try to use a chi-square independence test, knowing that 45 males shop online and 105 don’t, while 70 females shop online and 130 don’t.

Charles

Dear charles,

Sir i need a Help from you. I am basically a Statistics student.

now presently I am working in a Shrimp feed(fish feed) production company.

So this feed is produced in 5 types generally based on a formula. now we have launched a new formula with a minor change. We have fed this new feeds and previous feeds in different ponds independently. It means the shrimp in 10 ponds are fed with 10 feeds. The growth of shrimp is noted every week and the data for 6 weeks of these growth rates of 10 ponds are collected. Now my task is to analyse and report weather the previous formula of feed is better or the new formula. Sir please help me out how to perform this statistical task in excel.

It sounds like you need to use some form of ANOVA (repeated measures, split-plot, etc.), but it is difficult for me to determine which type of ANOVA model is best based on the limited information that you have supplied.

I suggest that you look at the ANOVA portion of the website to help you determine which is the best form to use.

Charles

Hi Charles,

I have run a few tests using real stats and they worked perfectly. Thankyou! I however keep getting an error when I try to run a repeated measures ANOVA. It says ‘Run-time error ‘1004’ Application defined or object defined error.

Do you know why this might be the case?

Many thanks,

Na

Na,

If you send me an Excel file with your data and analysis I will try to figure out what is going wrong.

You can find my email address at Contact Us.

Charles

Hello Charles, I don’t want to look a gift horse in the mouth, but just wanted to understand your rationale for providing this fantastic resource for free. Apologies if this is written somewhere on the site but I couldn’t find it. Many thanks, I find the articles a very good, clear, and practical introduction to the topics (which tend to be covered, unnecessarily, in a much more elaborate and confusing way elsewhere!).

Jim,

I appreciate your kind remarks about the site. I am always pleased to learn whether I have been successful in providing useful but simple-to-use information with the right combination of utility and explanation.

This might sound corny, but my goal is to give something back to the world. I started out in my career as a research mathematician and professor, but have worked as an executive in business for most of my career. Now that I am semi-retired I wanted to return to mathematics. I am having fun putting together the website and the software resources, and at the same time I feel that I am doing something to help people all around the world. I am also enjoying the challenge of making the website as good as I know how to make it.

My goal is not to make money, but shortly I will begin to publish a series of ebooks (as mentioned on the website) to help defray some of the costs of the site. I also plan to add a donation button for the same reason, but even after almost four years of creating the website, I still haven’t gotten around to adding this.

Charles

The add-in seems to have a problem with Independent variables less than 1 and greater than 0. Why is this?

Chris,

If you are referring to values that you put in the dialog boxes (e.g. for Alpha), then this is due to how decimals are handled in some versions of Excel (comma versus period). I have added a % option with does away with the problem. If you click on the Config button on the main dialog box (the one that appear when you press Ctrl-m) and choose this option, then instead of entering .05 (or ,05) for Alpha, you would enter 5 for Alpha %, thereby eliminating the problem.

If you are referring to some other problem, then please send me an Excel file with the data that you are using. You can send it to the email address in Contact Us.

Charles

Checking the % in the Alpha caused a different issue. The tool is looking for a Classification Cutoff between 0 and 1, inclusive.

My question was about my raw data and if the tool had issues with independent variable values between 0 and 1. I was able to answer my own question when I tested just 2 of my independent variables that have values between 0 and 1. The tool was able to perform the calculations as expected.

However…There seems to be a limit on the number of independent variable values the tool can use to perform a Binary Logistic regression. I have 14 independent variables and after some testing, the tool “chokes” when you set the range to 6 or more independent values. I will send you my data for your review. Thanks.

Hi Again,

Seems I’ve been able to move on past 6 independent variables and now the tool can work all 14. I’m still working on the arrangement of the data in the sheet so I can be confident when the Regression returns results, they are as accurate as possible.

Is there any way to dictate the sorting of the data that is returned in output range? By default, it appears to sort the data in ascending order by the first two variables in my dataset. I would like the data returned in the output range in the same order as it is in the input range.

Chris,

I don’t fully understand your question. If you would like the data returned in the output range in the same order as it is in the input range, then you don’t need to sort at all. Clearly I am missing something.

Also which sorting approach are you referring to (since there are Excel procedures for sorting and various Real Statistics functions).

Charles

______________________

[1] – The Earth is Round (p>.05), Dec. 1994.

If a person . . .

J. Cohen [1] was lead to “prove” that the NHST procedure is wrong do choose the sequence:

__1) If a person is an American (A), then he is probably Member of the Congress, (MC)

__2) The person is Member of the Congress

__3) Therefore he is probably not American.

It is astonishing that an element of a subset (MC) be questioned to belong to its proper set (A), when a MC must be from A, necessarily, without exception.

!!! Do intend the Psychologists took us by idiots stoned? Or they are only kidding?

Luis

Hi! Many thanks for this cool site!

I’ve a little problem: My Excel is 2016.

When ever I activate the real stats add-in, save the file an re-open it I cannot see the add-in tab though the Add-In is shown up as activated. I have once again to deactivate the check box for Real Stats, leave the add-in check box list, go back and check it again. Then the add-In tab is shown up whare I find the real stats menu – but only for the actual session. When I save the excel file and re-open it again the add-In tab has vanished.

Any ideas?

Gabor Törö

Gabor,

This is a problem that has recently appeared due to some security upgrade that Microsoft just made. It affects any Excel addin including Real Statistics. Fortunately one of the users has identified a solution, which you can find at

Blog

Charles

Thank you!

Just tried it. It does not work for me. Well, it is only a bit annoying. Real Stats works. So forget it. Thank you.

Gabor Törö

Hi,

thank you very much for these powerful tools!

Is there a way to put these function into a Macro in a way to be able to repeat the function in different sheets, without have to do it every time in each sheet?

I tried with a normal macro in Excel 2016 and it doesn’t work and I didn’t find info in this website.

Thank you in advance for the answer!

Have a nice day!

Asnicar,

How to do this is explained on the following webpage>

Calling Real Statistics Functions in VBA

Charles

Hi

This really is a fantastic resource.

I have produced an employee engagement survey using Likert scales.

I have averaged a group of Likert responses to form an overall engagement score.

I have different engagement scores for different population groups eg from staff at different offices, staff with different job titles etc. I want to know if these differences are significantly different from each other.

I only have a basic understanding of statistics so if you’re able to respond as you would to a newbie it would be much appreciated.

Thanks and much appreciated.

Richard,

It really depends on the details, but you might want to start by looking at one-way ANOVA. See

One-way ANOVA

Charles

Apreciado Dr. Charles, me gustaria saber si hay una nueva version del paquete Realeal Statistics, dado que ultimamente he tenido que instalarlo por add ins cada vez que voy a usarle en excel.

Muchas gracias

Dear Dr. Charles, I wonder if there’s a new version of the package Real Statistics, since lately I had to install it add ins every time I’m going to use this one in Excel.

Thank you very much

Gerardo,

I expect to issue a new version sometime before the end of July.

Charles

Thank very much Sr.

Hi, Im a professor at Del Valle University in Guatemala City. I want my students to use Real Statistics in Statistics 101, which is a course that I teach. I have never had a problem installing the software however some of my students have been having some problems installing it. How can I contact you guys for further assistance? By the way, your program rocks! Thank you for doing it.

Gabriella,

Thanks for your kind remarks. You can contact us by going to Contact Us.

Charles

Hi Charles and Gabriella,

Microsoft has installed lots of Microsoft Office patches on 15th July. After that, I have run into the same problem. I can’t get Real-Statistics up and running through any of the methods. Charles, if you happen to find a solution, please do post it 🙂

Great tool! Hope to be able to get it back up and running again.

Best,

Wolfgang

Wolfgang,

I have had no problems running the Real Statistics software on my computer. What sort of error messages are you seeing?

Charles

Hi Charles, I have figured out the solution. All I had to do is go to Developer-AddIns, click on RealStats and ok. Excel asked me if I want to reinstall/overwrite, confirm, and it’s working again. Just in case anyone else encounters the same problem.

Let me just say that your tool is absolutely great; I do ALL my data analysis with it (well, with the exception of SEM).

Best, Wolfgang

Wolfgang,

Thanks for sharing this.

Glad to see that the software is useful for you. Eventually I will also add SEM support.

Charles

Hello I’m trying help my son who’s a pre-med student doing a summer research internship at a local hospital. We’re struggling to figure out which statistical test(s) are best for the given data collected. Is there some way to post a question regarding his data, to maybe get some suggestions from Statistic-minded people? I realize this is asking a lot, but figured I’d at least throw it out there, so maybe someone could make a useful suggestion for us. Btw we’ve been experimenting with the tools provided here and everything works great! It’s just that again, we’re not sure we’re asking the right question for the various stat’s tests. If that makes any sense.

Thank you!

Scott,

Please describe the hypothesis that you are trying to test and the nature of the data.

Charles

Apreciado Dr. Zaionits, reciba un cordial saludo, Dr excuseme una pregunta, si yo hago una tranformación de Box Cox Normal para mis datos, como calculo la inversa de la tranformación para interpretarlos?

Dear Dr. Zaionits, Yours sincerely, Dr Excuse me a question, if I make a transformation of Box Cox Normal for my data, as I do calculating the inverse of the transformation in order to interpret?

Great Work……..We are thankful to you.

NICE Charles!

Thank you for all the recent functions in the updates. It saves me a lot of time to perform forecasting.

Every time i download an updated version I am surprised by all the features in the AddIn. Probably the most comprehensive Excel AddIn for stats I’ve ever come across.

Y’all have a good one!

Marc,

Thanks for your comment. I am pleased that the software is saving you time and helping you with your activities.

I add more features all the time, although the output may slow a bit during the summer.

Charles

Does your product work with Excel 365? (windows)

Alex,

My understanding is that it works on the desktop version of Excel 365.

Charles

I’ve been doing scientific research for quite some time and some of the high level statistics used to always impede things (using python or matlab) and take away the fun of the general picture.

I have to say this is the best, most intuitive, and straight forward statistics analysis pack I know. Your explanations are also very good, not too over my head, but also not too dumbed down.

Thank you and keep up the good work!

Dear Charles

I can only echo the comments thanking you for this resource

Your clarity and examples have revealed the mystery of statistics to a less than numerate student ‘of a certain age’

For me the best advantage over ‘black box’ stats packages is that you demonstrate very clearly the relationship between the data and the results and the processes which generate those results

Kindest of regards

Rod

Rod,

Thank you very much for your kind remarks. I am trying hard to make the subject accessible to a wide audience.

Charles

Hi, can someone help me? I am trying to use the Jenks Natural Breaks option. It kept freezing. So, I reinstalled, but now I can’t find the Multivariate Analysis option on the menu. It worked before beautifully, but now it seems to be gone. Any help is much appreciated. Tom

Thomas,

The Multivariate Analysis option is still supported by the software. Which version of the software are you using? You can enter the formula =VER() to find out.

Charles

Thank you, Charles. I’ve reinstalled. I’m using 2013, and I now see the option. But, it seems to freeze when I try to use Jenks. Is it supposed to process for a long time? Any advice is most appreciated.

Tom

Tom,

If you leave the Number of Iterations field blank then the algorithm will evaluate all possible partitions, which depending on the size of your data could take a long time. You can specify the number of iterations to reduce the processing time. See the following webpage for more infor:

Jenks Natural Breaks

Charles

Yes, I started to think that. I only would like 5 classes for 253 rows of two variables. The instructions seem to imply leaving it blank. It’s been processing for about 15 minutes now. Is there a best practice? Maybe 100K, how long would that wait? Should I just wait for this to finish?

Thomas,

15 minutes is a long time. Leaving the field blank means that you will get the best answer, while if you put in a value you might not get the best value, but you may get a good-enough value. In any case, I would suggest that you put in some low value, say 1,000, and see how long it takes. If not too long increase the number until it looks like it is taking too long.

Charles

Hi Charles,

I stumbled upon this site: http://www.statisticsmentor.com/ … i wonder if this is also yours.

Best,

Kris

Kristina, no it is not my site.

Charles

Thanks for your attribution!

I am using it as Add-in for Excel. But in “k-means Clustering Analysis”, I get different results with the same data set after 10 times repeat.

Can you tell me what the reason is?

I’m sorry to bother you.

The problem was solved. There was some trouble with my computer. ^^

Hello Charles,

I just came across your powerful Excel site and I can say that I have been richly blessed. I am an online student of one of the US universities studying quantitative finance.

Thank you for your ingenuity in making this freely available.

Hello Charles,

Thanks for putting this online, but I’m already stuck in the installation process (maybe because I’m on the corporate network). I’ve copied the .xlam (for Office 2007) to my “C:\Users\username\AppData\Roaming\Microsoft\Invoegtoepassingen\” (Invoegtoepassingen=AddIns in Dutch), made sure the Solver add in is loaded but it tells me it has no access to the file.

I made sure it’s there (by browsing to check if its really there and not deleted or what). All rights seem okay (SYSTEM and my user have full control except “Special permissions”) and also restarting Excel didn’t help. Suggestions? Thanks!

Ben,

I don’t know why you are getting this message. The following comment from a user who uses the name “Cyberpreneur” may be helpful

“to correct the password problem requires installing the Real Stat addin in proper manner. Copy the downloaded file to /Microsoft Office/Office12/Library/Analysis , then restart excel. Then install in normal way.”

I don’t know whether this will work, but it might.

Charles

Wonderful resource! Thank you for freely providing this add-in and all your tutorials.

Many thanks for providing this great addition to Excel!

Sir,I have installed it but what is the password ?

You don’t need a password, but you do need to follow the instructions for how to install the resource pack. These instructions are listed on the same webpage from which you downloaded the resource pack.

Charles

Thanks for this program! I have upgraded from Excel 2010 to Excel 2016. I have updated the Real States to ver 4.6 in the same location as the older version (C:\Users\user-name\AppData\Roaming\Microsoft\AddIns).

I use realstats for KS 2-sample tests and really like it!

Realstats is asking for a password, and excel is throwing a “solver error” . Then excel crashes. Any ideas about what might be going on? Thanks.

Dave,

Glad to see that you are using the Real Statistics software.

The latest version of the software uses Excel’s Solver, and so you need to make sure that Excel’s Solver is installed. You do this by pressing Alt-TI (i.e. hold down the Alt key and press T and I) and making sure that there is a check mark next to Solver in the dialog box that appears. Then close Excel and reopen Excel.

Charles

Thank you for the prompt response.

I have solved the issue. You cannot check/install at the same time the Solver, RealStats, and Data Analysis AddIns. If you do, then there are problems. If you install the Solver, then make sure it’s working, then RealStats everything appears to run smooth.

Dave,

Thanks for the clarification. I will pass this on to others.

Charles

I am running El Capitan 10.11.3 on a Macbook Pro Retina. I have Office 2016. I installed this software and then deleted it, because it did not have some capabilities that the Windows version has. The problem is that the add-in still appears in my “available add-ins” in Excel, even though I deleted the program. Does this program have an uninstaller? How do I get rid of it so it does not show up under my “available add-ins.”? Should I reinstall it, then “uninstall” it as opposed to moving it to trash like before? Thank you for any help.

It probably doesn’t harm anything by doing nothing, but you should be able to “uninstall” it via the following steps:

1. From the Tools menu choose Add-Ins.

2. Uncheck the Realstats 2011 option on the dialog box that appears and click the OK button.

Charles

Hello,

i downloaded the booklet from syncfusion and i noticed an error on page 23, where it says:

“…such that (1+????)^4= 1.334. Thus, ???? is equal to .334(1/4) = .0747”

That is wrong. The correct procedure to find ???? is first to calculate (1.334)^(1/4) and then subtract 1: r=(1.334)^(1/4) – 1.

Also, .334(1/4) = 0.760, very far from 0.0747

Thanks Stefano for finding this error. I will try to get the publishers to correct the error.

Charles

I work in finance and regularly do statistical work, largely involving regression and correlation. In particular, I have used various tools, including importing datasets into R in order to get RSE regressions in order to satisfy regulators when doing loan loss analysis. What I don’t know is how I didn’t know of your site and tools earlier. Let me just say that you are my personal hero. Thank you so much for your articles and work. Do you have an area or way to accept contributions? I’d be happy to contribute something for this pack of awesomeness.

Thank you very much Mel for your kind words about the website.

My philosophy has been to provide all the statistical capabilities free of charge (except for some upcoming books which I will sell for some minimal fee). I want people to have access to these capabilities without having to pay. I am considering adding a voluntary donation capability to cover some of my costs, but so far I have focused on adding new capabilities to the site and software.

Charles

Really great plugin – was going to do some of the most basic stuff myself (e.g. Shapiro-Wilk and frequency reports), but this saved me loads of time and i’d never be able to do the advanced stuff you have.

ONE REQUEST….

Could you please make the “input range” used on most of your forms default to the current selection. This would be really great!

keep up the amazing work.

Graham,

Good to read that the plugin has helped you.

Please explain why making the input range default to the current selection would be helpful. Currently I have the output default to the current cell. What would you have me do when there are two input ranges?

Charles

Hi, here is my case, given purely with the intention for you to decide on the merits of my suggestion.

.

I’ll present two workflows to illustrate – my start position is my worksheet with the data (which I invariable have as a ListObject) and my selected cell is within my data (not an unreasonable assumption, as ive probably been data munging it).

Note:I’ve added in defaulting the output to a new worksheet, which I didnt mention previously, but which makes the most sense as a default under this scenario (microsoft defaults to that for inserting a pivot table for eg).

Scenario 1 – First input range defaults to selection

1) select a cell in my column of interest

2) Press ctrl-space to select all column data (or ctrl-shift-space for entire row)

3) select your test

4) select additional range if required (e.g, for correlation) – (manual or user range selection – but see optional enhancement suggestion)

5) Have default output to new worksheet, enter cell in current sheet (ie type 2 character, e.g L2,L4 etc)

6) Run test.

Optional enhancement – where a test requires two ranges, use range of selection.columns(1) and selection.columns(2) – these defaults would not mean any extra user clicks should they wish to override these as the default.

Advantages:

a) The user range browser (Application.inputbox(,,Type:=8)?) does not support ctrl-space shortcut – on large sheets you have to jump to start of worksheet to use the ctrl-shift-down cursor option to select entire col range (or row range).

b) If i’m typing in the range manually I don’t have to remember my data range if I cant see the start and end of it, or if I have chosen a new worksheet to paste my data

c) Defaulting to new worksheet is probably the preferred user option – guarentees do not overwrite data (appreciate you have the check which is v. good). Mixed data on my main data spreadsheet breaks data imports SPSS, SigmaPlot, SQL Server … etc etc … (Extra data leads to misdetection of datatypes etc)

e) For single vector test I do not have any additional key presses – click ok and its done. For 2 range tests, one range is already complete. (but see optional suggestion above)

Disadvantage

a) Unclear if majority prefer defaulting test output to current worksheet. If they do though, then all they have to do is type in a simple single cell reference.

Current scenario

1) Review my worksheet and make a note of my input range(s) [optional] – may involve scrolling and sheet navigation on large dataset.

2) Select a cell where I want the output (or click the new option)

3) select your test

4) manually enter first input range from (1) or use the user range selection tool.

5) manually enter second input range from (1) or use the user range selection tool.

6) Click new for new worksheet (optional)

7) Run test.

Hope you dont take this as a criticism, its merely well intentioned feedback.

all the best,

Graham

Under the scenario of doing different tests on the same range, defaulting the input box to current selection is better as well.

Graham,

Thank you very much for your suggestion and your description. I am thinking of a way to offer the user a choice between defaulting the input or the output. I still need to figure out what is the most intuitive approach.

For now, as long as you highlight the first row of the input range you can use the Fill button to highlight the rest of the range.

Charles

Hello,

I am conducting a survey with a sampling of about 800. The survey has four categories as dependent variables. A few of my hypotheses range from comparing gender (IV) to a particular category (dv). The other hypotheses compare education, player experience and coaching experience (IV) to the total of all categories.

I used t test paired samples with my pilot test with only 9 samples….but I was looking at the MANOVA and was wondering if this was a better data analysis. And if so, is MANOVA easy to do on your excel program?

Deanna,

If you have multiple dependent variables that are correlated with each other, then MANOVA may be a better choice.

The Real Statistics Resource Pack does support MANOVA and there is a MANOVA data analysis tool in the resource pack that should be relatively easy to use.

See MANOVA for more details.

Charles

Dear Charles,

Does your software package include the ability to apply statistical process control analysis to data, specifically as developed by Dr Walter Shewhart?

I ask because only Shewhart’s method adjusts sigma limits to allow for handling real time data as a basis for predicting if a system is stable or unstable with the limits calculated from the data collected. Dr Shewhart also worked very closely with Edward Deming to help him develop his method of continual quality improvement.

I mention this because I want to apply this spc software prospectively to make inductive decisions in real time not deductive decisions based on traditional statistical analysis.

Best wishes

Terry Fox

Terry,

I don’t support Shewhart’s statistical process control analysis, although I will add this to the list of possible future enhancements.

Charles

Dear Dr.

have one problem in use the CHI square test.

the valor alpha must be a number between 0 and .5

as I solve this?

best regards

Carlos Adriano

Carlos,

I don’t know you would want to use a value outside the range 0 to .5, but in this case, you can choose any valid number, say .05, run the data analysis tool and then change the value of alpha in the output and the output will then be revised to use whatever value of alpha that you used.

Charles

Found my crib notes—no problem !

JMB

Dear Charles,

I am trying ti use the Scheirer Ray Hare test. I succeed un testing with your exemple on the web site.

Nevertheless, in you exemple with 3 fertilizer, number of rows by sample is the same (5). How can i do when i have 3 samples with different numbers of subjetcs?

Thank you for your help!

François

François,

Caution: There is an error in the Scheirer Ray Hare test described on the website and from the data analysis tool. This will be corrected in the next release of the Real Statistics Resource Pack.

The Scheirer Ray Hare test requires that the group sizes be the same.

Charles

Has anyone tried this with Excel 2016 yet? I’m downloading Office 2016 now and concerned that RealStats won’t work… Charles, or anyone else, if you can share experiences or tips on using this with Excel 2016 I’d appreciate it.

Update: So far so good – RealStats opens fine, and initial tests seem to yield expected results. Will upate again if I find any problems.

Others have told me that Real Statistics works fine with Excel 2016.

Charles

I appreciate your comment

Dear Dr Zaiontz, Yours faithfully. Of like you, I’m using this spectacular statistical tool developed by you and your collaborators in order to dictate my statistics classes and Biostatistics, thus are increasing the number of followers to your page. Dr. respectfully I would like to know what the last version has the Real statistics, I have 4.2 version; if I already came a new and updates implemented.

Thank you

Dear Charles,

Thank you for your guidance.

I have a problem in sample size for running MANOVA. I have an independent variable and five dependent variables. I wonder whether 20 is enough. I really need your help.

Mina,

I don’t see how you can run MANOVA with only one independent variable.

Assuming that you have at least two independent variables, the answer depends on what sort of statistical power you need. This will depend on the expected effect size. In any case, 20 is a pretty low number to expect much power from the statistical test.

Charles

Dear Charles –

I’m trying to cite your website for a paper I have written. Would you mind providing a complete citation for your website? I want to be sure that I have specified it correctly.

Thank you,

Karen

Hi Karen,

You can get the recommended citation information on the webpage Citation.

Charles

Hello,

I downloaded the R-stat but cluster analysis is not showing in the add-in table. What seems to be the problem?

Thanks,

Liliya

Hello Liliya,

Choose the Multivariate option and you will see Cluster Analysis on the dialog box that appears.

Charles

Dear Charles,

Great website, thank you very much. Do you cover Data Envelopment Analysis in your website? Please send me the link if you do, because I can’t find it and it does not appear with a simple search over your website.

Thank you and regards,

DBM

DBM,

Sorry, but I don’t cover DEA at this time.

Charles

Check out http://deafrontier.net/. They have a free version of DEAFrontier, which is an Excel add-in to do DEA. Of course, the free version has limitations on DMUs, inputs, and outputs.

Dr Charles, good afternoon, please do not not work properly because Cohen’s kappa, which molestrale worth, I attached the file so please tell me I’m doing wrong, I use the tool, and sometimes it works and sometimes not.

Thank you very much, and again I apologize for the inconvenience.

Dr I offer apologies and I found what my error.

Thank you

GAD

Thanks for this toolpack, Charles.

I was excited to see Fleiss’ Kappa in the Reliability section. However, after setting up my data, I ran the test and got the error #N/A. I looked at the output and see that KAPPA is listed in the output cells as the array function (e.g. =KAPPA(B3:Q25,0,TRUE,B30,B31) ). Searching the function terms in Excel fails to identify a function called KAPPA. Is this the reason for the output failing? I am using Excel 2010 (windows based).

thanks for any help.

Mark,

Excel identifies KAPPA as a user-defined function.

You can also access Fleiss’ Kappa via Real Statistics’ Reliability data analysis tool (press Ctrl-m to access this option).

If you send me an Excel file with your data I will try to figure out why you are getting an error output. See Contact Us for the email address.

Charles

This is a superb website for newbies. So much invaluable information..Thank you !!

Is there an intention to add a time series section ?

Keep up the good work.

I intend to add time series later this year.

Charles

Hello!

Thank you for putting this package together. It is very useful!

I was wondering if this is also compatible with Excel 2016? And if so, are the download instructions different for Excel 2016?

Thank you!

Alyssa

Alyssa,

I have not tested the software with Excel 2016, but I expect that it will work using the same download instructions. I plan to do some testing shortly.

Charles

Hi Charles,

Thanks so much for your recent help. I’ve learned a lot from your website and your suggestions for me last couple of times.

Nonetheless, I am still stuck with my problem these days. I have nine groups of data each containing millions of numbers. Obviously it does not work for SPSS or excel to compare these nine groups of data simultaneously. So I chose another way: calculated means of these nine groups and compared these means. Could you please give any suggestion that compare these nine means to see whether they are significantly different?

Thank you so much, Charles.

Best,

Stacey

Stacey,

I don’t know of any test that can tell whether 9 means are significantly different from each other, without looking at the samples from where they came from.

Charles

Got it, thanks a lot!

Dear Dr Zaiontz, Yours faithfully.

Dr. is possible to conduct a canonical correlation analysis with Real Statistics? How could it?

Thank you

GAD

Not yet. I plan to implement this capability shortly.

Charles

Dr very kind thank you very much

Dear Dr Zaiontz, Yours faithfully.

Dr. respectfully I would like to know the possibility that Bland Altman diagrams is added in relability Real Statisctis module is that it is not difficult and can be made directly in Excel, but as I am a professor of Biostatistics and as rates of kappa, and of Dhalberg and Houston, they are also widely used these diagrams quality measurements. No additional programming in Real Statistics so that we can support in his excellent management and statistical support for us and researchers, which greatly facilitates the pedagogy using Real statisctics.

Also in the module Chi square the implemetation of Fischer’s exact test, odds ratio and relative risk.

Thank you

GAD

I already have Bland Altman in my list of suggested enhancements. I expect to add this capability in one of the next releases.

I will look into adding the odds ratio and relative risk to Fisher’s exact test.

Charles

Thank you

GAD

Hi Charles, great product. Is there a way to increase the font size of the text in the pop up box?

Hi Dan,

I haven’t found a way for the user to increase the font size. Are you using the Mac version of the software? I know this is a problem for Mac users. I plan to increase the font size for mac users in the next Mac release.

Charles

Respected sir,

Kindly help me to solve the below mentioned problem

Three factors: Irrigation levels (3), Fertilizers levels (3) and split applications of fertilizers (2)

Main plot: Irrigation (3)

Subplot: fertilizer (3) x split application of fertilizers (2)

= 3 x (3×2) replicated thrice (3)

Thanking you

You described the premise, but have not described the problem you are trying to solve. What is the hypothesis that you are trying to test?

Charles

Hello Charles,

For someone not very well versed in the word of statistics, your website has been a godsend! I was wondering if you’d be able to point me in the direction as to which test I ought to perform on my data as every time I think I’ve found something suitable I read something that suggests it will not be appropriate!

I conducted a listening test in which a participant was asked to adjust the levels of seven different stimuli such that it gives them the same perception of annoyance as a reference stimuli. I had 15 participants in total, all tested individually (not as a group).

I’m trying to find out how combined noise characteristics increase annoyance in comparison to individual noise characteristics (i.e does the annoyance from noise with a tonal and impulsive component combined equal the annoyance from the noise with a tonal component plus the annoyance from the noise with an impulsive component).

My problem is that I’m now dealing with results that have a non-normal distribution as well as unequal variance. I thought maybe the Kruskal-Wallis test would be a good starting point, but one of the assumptions is that groups have the same distribution, and since the standard deviation of my means differ that probably means that I can’t really use it?

Hello Chris,

Thanks for your kind words about the website. I really appreciate it.

With data that is non-normally distributed and with unequal variances, Welch’s Test is probably the way to go. See the webpage

Welch’s Test

Keep in mind though that ANOVA works pretty well even with non-normal data, provided the data is reasonably symmetric. Non-constant homogeneous variances is a bigger problem, but even here as long as the variances are not too unequal (see Levene’s test), ANOVA should work pretty well.

Charles

Hello,

Thank you very much for your efforts!

I hope you could help me out with my problem.

Null hypothesis : Rural average premium charged by an insurance company = Urban average premium charged by the same insurance company

I had t-test in my mind to check out for equality of means.

I used Real Stat to perform ‘Shapiro-Wilk Test’ and I infered that my data fails to adhere normality.

So now I will have to opt for a non-parametric test!

Will ‘Mann–Whitney U’ test be appropriate here?

Thank you for your time.

Yes, if the two samples are independent. If not, Wilcoxon’s Signed Ranks test is probably the right one.

Charles

if i have a rainfall below for example and i wants to find the rainfall trend to the flood area how do i go above it . And is it anova and linear regression i will apply and how

RAINFALL (mm)

YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC

1990 10.6 4.9 Trace 131.8 84.3 446.3 532.8 33.3 274.8 169.4 94.3 62

1991 31 20.4 111.8 264.2 212.4 389.7 358.1 45.2 269.7 84.7 2.7 0

1992 0 0 22.4 51.5 139.7 270.3 202.9 22.9 246.5 77.6 30.2 13.7

1993 0.7 27.1 28.1 87.5 171.3 280 91.4 26.5 96.6 127.2 108.6 19.4

1994 18.6 14.1 79.3 43.4 161.7 209.3 34.7 39.1 241 212.3 20.5 8

1995 0 64 63.7 97.9 146.8 343.1 108.6 139.3 149.9 197.9 48.3 4.7

1996 4.8 25.4 109.8 135.2 293.3 599.5 401.3 206.3 135.4 166.6 36.5 10.2

1997 1.2 0 170.2 227.5 348.4 454.6 145 31.7 221.5 183.7 65.3 14.3

1998 8.2 7.7 43.1 87.1 106.4 48.8 30.4 16.5 55.5 236 33.7 16.7

1999 29.1 113.9 364.5 92.3 250.7 334 109.1 161.2 183.1 141.8 5.9

2000 6.2 0.6 9 80.7 181.9 198.9 126.7 98.3 444.1 120.2 4.5 2.2

2001 46.8 14.1 74.7 216.5 111.8 271.2 176.8 31.3 174.2 154.6 70.6 78.3

2002 64.6 44 76.5 159.5 221.9 372.1 296 85.3 189.1 245.8 30.4 49.2

2003 74.8 16.7 17 244.1 208.6 383 101.7 34 160.5 139.9 152.5 0.6

2004 142.3 86.6 58.8 177.4 310.5 709.6 64.9 44.1 387.1 233.1 92.9 13.8

2005 trace 151.3 90 246.9 336.6 526.7 461.7 48.5 122.2 134.4 26.5 4.5

2006 35.2 13.7 51.6 84 283.7 352.6 95.3 43.9 404.9 15.5 51.8 0

2007 0 8.4 25.5 43.6 167.3 569.6 166.5 185.3 101.3 79.1 19.5 8.9

2008 4.6 61.1 48.9 86 156.4 165.1 47.7 2.8 123.4 335.9 4.7 0

2009 0.6 66.1 154.6 39.8 224.4 746.6 159.4 478.1 369.1 141 30.1 Trace

2010 50.8 40.4 45.6 171.6 233.2 444.1 149.6 161.7 240.3 134.3 65.7

2011 0 67 7.8 114.8 317.4 317.1 463.3 86.9 217.9 376.1 54.2 2.6

2012 1.1 61.2 28.7 161 221.1 476.7 250.2 10.4 184.5 229 71.8 8.8

This looks like a seasonal analysis problem, which you can view as a form of regression. I plan to address these sort of analyses in one of the next few releases. Also Excel 2016 has some new capabilities in this area.

Charles

Hello Dr. Zaiontz,

First, thank you for the user-friendly documentation for applying statistical methods in MS Excel. Often more theoretical explanations of statistical operations are less useful when results are needed quickly. Your Excel-based explanations are easy to implement.

I have used some of the Excel procedures from this site to test the normality of a set of groundwater level residuals from a regional groundwater model I’ve developed for the in the lower Apalachicola-Chattahoochee-Flint River Basin for the U.S. Geological Survey. What is proper way to cite your website in a report I am writing to document the model?

Thanks,

Elliott Jones

USGS Hydrologist

Hello Elliott,

I am very pleased to see that the site has been useful to you. You can cite the website as described on the webpage

Citation.

Charles

Hi, great site and great Resource Pack!

Any plans for a Resource Pack update to go deeper into Time Series statistics,

especially unit root tests for stationarity like the Augmented Dickey-Fuller Test?

Thanks!

Daniel

Daniel,

Yes. Starting with the next release, I am adding capabilities especially focused on econometric topics. I hope to get to time series analysis in the release after next.

Charles

I am trying to assess the performance of a procedure in identifying something abnormal in USA cities. I tested this procedure on six different cities with different positive and negative prevalence for each city. I’ve already finished the step of constructing the ROC curve and the nonparametric AUC for each of the six cities. I am now trying to consider that those six cities as a sample taken from all USA cities. Therefore, I need to express the AUC for the procedure in terms of its confidence level. In this case, I have six values for AUC (n=6) and would like to use 95% confidence.The question now is: should I use the conventional t-distribution (for n<30) to calculate the intervals of AUC, or the James Hanley method?

If I am using Hanley method, I need to calculate the standard error for each case of the six cities since the neg. and pos. is different, then I will end up with six different values for the standard error. If using the conventional t-distribution, one standard error value will be calculated, which is supposed to. Please help

Ali,

Sorry, but I am not familiar with James Hanley’s method. In any case, most likely each method would have some advantages and some shortcomings, and so you would have to choose based on what you are trying to demonstrate.

Charles

Hello – I am looking for a joint PDF estimator. I have sparse data for two joint events from which I can build a joint (or conditional) probability matrix. I would like to find the parametric or non-parametric multivariate distribution which fit the data the best. Can this be done using your toolkit?

Sorry, but the Real Statistics software doesn’t do this yet.

Charles

thank you for your help. you are fantastic

Dear Charles, I have a question about comaparing two samples of rainfall data (monthly maxima rainfall intensity for the periods about 10 years). Because I don’t have the required sample size for calculating IDF curves with current rainfall data, the plan is to join data from two measuring stations. I want to check if the data can be merged. The difference in means an variances between the samples are important in this case, so I plan to use t-test an F-test to verify matching of the data. I also plan to use Kolmogorov-Smirnov test to check if the data of each sample is distributed normally.

Is this the appropriate approach?

thank you

Tadej

You can use the two sample Kolmogorov-Smirnov test to check whether the two samples come from the same distribution, which would give me some evidence that the samples can be merged.

If you need to check for normality I wouldn’t use the Kolmogorov-Smirnov test, but the Shapiro-Wilk test instead.

These tests are described elsewhere on the website.

Charles

thank you

Tadej

What about comparing three groups of data with different size (n1=70, n2= 200, n3= 270)? Is one factor ANOVA proper approach, and would you advise any additional test from real stat add-in?

You can perform one factor ANOVA even with groups whose sizes are different. The main things to note are:

1. With such unequal sample sizes, you need to be more careful about the homogeneity of variance assumption (since the test is less robust to departures from this assumption in models with unequal sized samples).

2. The power of the test will tend to be determined by the group with the smallest sample size

Charles

If I may suggest, put most recent comments at top of page. When I first came to this page I almost left instantly because comments were from 2013.

Rob,

Excellent suggestion. I have just made the change. Please let me know if now you don’t see the newest comments first.

Charles

Great site and I really appreciate the resource pack provided. I use it all the time. I have a question. I compared the results from the real statistics “T Test: Two Paired Samples” to excel’s “t-test: paired sample for means” and the P values differ slightly between them. Why is that? Am I doing something wrong.

Many Thanks

Charles

Found it. There were some merged cells in the column labels that was causing the difference. I unmerged the cells and put the labels on the row immediately above the data and re ran the comparisons and they are identical now. Sorry for the mix up. I truly love the way your data is presented. It is far better than excels. Just sayin.

Many Thanks

Charles

Hi Charles, THANK YOU so much for this website!

It really helped me.

I am using in my classes of Bioestatiscs and Statistiscs, I use Too R, but it is very eassy.

I am writing one Bioestatisct Book, can i use your website in order to do the examples?

Thank you

Gerardo,

I am very pleased that my site has helped you.

I am not sure what you mean by “can i use your website in order to do the examples?” If you want to use some of the examples from the website in your book, please send me an email so that we can discuss this further (my email address is found on the Contact Us webpage). I of course have no problem with you using examples from my website in your class.

Charles

Dear Charles,

I am a novice in a real statistics and in English as well.

After visit your website I hope you can help and suggestion.

I conducted experimental research and random selected to study group and control group. Test the characteristic difference between two group is Ok. Then I collected data at baseline, 4 weeks, and 8 weeks after intervention. After running repeated ANOVA it’s show that there are difference between the two groups. My problem is there are also significant between two group in the baseline measure. what can I do next?

That’s interesting. Was the assignments of subjects to the control and study groups done at random? If so, then it is quite surprising that you are finding a significant difference in the baseline measure. How big is your sample?

Charles

The random assignment was done by my research assistant. There are 25 in an experimental group and 16 for a control group.

Hi Charles, i have a question that i cannot seem to answer.

Scenario: i am unable to determine normality of the data, and i am unable to do transformation.

my p value for levene’s test is less than 0.05, which means that the variances are unequal. If this is the case, will i be able to still use the ANOVA test?

and if i am, am i able to use both scheffe method/dunnett T test as post hoc test?

This is an assignment worksheet, based on data that is given to us, we would need to interpret the output Hope you can help

Dawn,

I have the following suggestions:

Please look at the various techniques described on the webpage Testing for Normality regarding how to determine whether the data is normal

If Levene’s test fails, then you should consider using Welch’s test instead of Anova.

The specific follow-up test will depend on what you are trying to test, but Games-Howell is a likely consideration (see Unplanned Comparisons).

Charles

Hi Charles,

I would like to now if there is a version of your add in for Macs

Thank you

Ana

Ana,

Not yet. I need to put out a new version of the software for Macs, but I don’t own a Mac and so need to borrow one to create and test the release. I hope to have access to one soon.

Charles

Hi,

Thank you very much for posting this add in, it is really helpfull. Right now I am trying to make a cluster analysis that I see that you have it in your add in, but I get no results. The function “clust” that you use to assign a cluster to a data point gives no value at all. I was searching some explanation to this analysis in the website but couldn´t find anything at all. Please let me know if you can help me with this issue. Thank you very much,

Santiago

Santiago,

The cluster analysis capabilities haven’t been officially released yet. I expect to release it tomorrow (Friday). Stay tuned.

Charles

Santiago,

Are you able to use the cluster analysis capabilities in the latest release of the software?

Charles

Just came to share my thanks for this free addition to excel! It’s excellent to do quick normality test (Shapiro-Wilk) without all the complex manipulation. Congrats!

And yes, your website is amazing, very well explained.

Thanks again!

Thank you for you answer!

In encoding continuous variables like income, consumption etc. Is it better to use brackets and choosing their medians, brackets with respective codes like 1= <40000, or exact amount?

Usually I prefer to use the exact amount, but answer also depends on the type of test I need to perform.

Charles

have you ever created an excel spreadsheet that will predict the outcome of a game in any sport such as college or pro football, baseball or basketball.

If so can you post it

Sam,

I haven’t, but perhaps someone in the community has.

Charles

has anyone in the community put together an excel spreadsheet to predict the outcome of sports events that they would be willing to share

thanks sam

Excellently done, Charles!

Never seen such practical AND comprehensive statistics’ basics! And a cherry on top – Excel implementation.

Thanks a lot!

sir, i have to find, any correlation between ‘time of income’ (like daily, weekly etc.) and food supply from PDS (public distribution system) i have following coding for this.

time of income is if daily-1, weakly-2, fortnight-3, monthly-4, Binnual-5, triannual-6, annual-7

and

Do you get sufficient food grains from PDS if Yes-1, No-2

sir i have 1 question for you which statistical tool is useful for

X 1 2 3 4 2 1 2 3 1 4 2 3

Y 1 2 1 2 2 1 2 1 2 2 1 2

this kind of data. it’s actually a coding data of ‘Yes’ ‘No’ type question. please guide me sir.

You haven’t provided me with enough information to answer your question. What is it that you are trying to demonstrate or test?

Charles

Hi Charles, THANK YOU so much for this website! 🙂

It really helped me.

Hi,

I am curious if there is way in hypothesis testing that allows me to establish a relationship, such that the scores of the independent variables are measured on a likert scale and values of the dependent variable is dichotomous (e.g. Yes and No, True and False). The sample does not have equal variance (ie sample size for YES do not equal to sample size of NO).

I tried doing a t-test but it does not make sense that the dependent variable is dichotomous. Is there any method that can be used to solve this issue?

Thanks in advance and hope to hear from you soon

This depends on what you are trying to test. In any case, what you can try is using a t test of sample 1 vs. sample 2. Here sample 1 = the Likert scores that have dependent variable value YES and sample 2 = the Likert scores that have dependent value NO. You can also do a correlation test (the results should be the same as explained on the webpage http://www.real-statistics.com/correlation/dichotomous-variables-t-test/.

Charles

Hi Charles

Real Statistics Using Excel is a very informative, clear, easy to follow and accurate description and analysis of various statistical techniques. It is useful for personal use and classroom teaching. Thank you for setting it up.

In Monte Carlo Simulation – I like to extend it using Markov Chains. Do you have any materails on this using Excel?

Does the Bin and Histogram are accurate in Excel because I have noticesd that it doe snot always count it correctly.

Thanks and Regards

Farhad

Farhad,

I am very pleased that you find the website useful. It is good to hear this.

I plan to add Markov Chains to the website. What specific things would you like me to cover?

I haven’t noticed any inaccuracies with the bins/histogram capabilities built into Excel. Can you give me an example?

Charles

Let me congratulate you for the work done! Amazing website, absolutely helpful and clearly explained. I’ve recommended already to many colleagues and I will keep doing this. Thank you for also make the code open to everybody.

G’day Charles,

I like the look of this package and hope that it will be very helpful, and I thank you for your efforts to date.

One question I have is about which Anova to use. I have data with 6 pigs on 5 diets, and 7 different blood measures, which are dependent to the pig from which they came.

I believe I should use the one-way anova (only one factor – diet) with repeated measures (7 blood measurements all taken at the same time). Can you confirm this?

Also, do the factor inputs need to use numbers (1 to 5 for the 5 diets) or can they use categories (AS, BS, CS, DS and ES)?

Many thanks in advance.

Johnno,

One factor is the Diet (5 levels), presumably the diets are independent of each other since they are for different pigs

Another factor is Blood Measure (7 levels), these are not independent of each other since they relate to the same pig

This means that you have a two factor model with one repeated measures factor

The factor names don’t have to be numbers; they can be alphanumeric. The data, which go in the cells, (which are probably blood measurements) need to be numeric.

Charles

G’day Charles.

Thanks for the reply.

I think I may not have explained it correctly. The only treatment being applied are the diets, being 1 control diet and 4 treatment diets (called D1,D2,D3,D4,D5). At the end of the experiment a single vial of blood is taken and 7 measurements made, ie total iron, ESR, pH, transferrin, calcium, cholesterol and TG’s.

When I tried to run either single or two way on my data there is no way to input the blood results separate from the factor (which was coded as D1, D2 etc).

I have 8 columns labelled Diets, Fe, ESR, pH, Trans, Ca, Cho and TG. Am I setting up the data correctly?

Johnno,

Assuming that you have assigned the pigs randomly to the 5 diets, you have a Diet factor which is a fixed factor (not repeated measures). You can now run 7 separate one-factor ANOVA’s, one for each pf the 7 blood measurements made. This would not take into account any correlations between the blood measurements. Alternatively you can run one two-factor ANOVA where the diet factor is fixed and the blood measurement factor is repeated measures. Another alternative is to view the blood measurements as a dependent variable and use one factor MANOVA where the diet is the independent variable and blood measurements are the dependent variables.

The result would be different if all the pigs are getting all 5 diets (presumably at different points in time.

Charles

How could I perform Probit (say brine shrimp lethality assay) using the add-in? I am a complete novice.

I have not yet implemented Probit in the Real Statistics Resource Pack. Logistic Regression (Logit) can be done using the add-in. For most analyses this is an alternative technique, and is more commonly used.

Charles

How then Sir, could I use Logit in lieu of Probit using the add-in?

See the webpage http://www.real-statistics.com/logistic-regression/, especially the options for Finding Logistic Regression Coefficients using Newton’s Method, which leads to the following webpage:

http://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/

Charles

Hello Charles,

Thank you for making this program available. I just installed in on my MacBook Pro running Yosemite. While I was able to install the program I get a run time 424 error message when I try and use the t-test function. The other statistical functions seem to work fine. Any suggestions?

Thank you.

Wayne

Wayne,

Unfortunately, I don’t own a Mac and so I need to borrow one to support the software. I plan to do so on Friday at which time I will try to identify the problem. I also plan to bring out a new release for Mac fairly soon.

Charles

Hi Charles,

Good morning! You have not mentioned any methodology for doing forecasting, please add forecasting method too. It will really help many of us to go further analysis in time series and so on. while doing forecasting in excel it takes count of the variable like

Year: 1990 1991 1992 1993 1994 1995

Sales: 500 550 501 502 510 500

It taking the count of 1996 for sales, is it right?

Please help me on this, am really confused.

Jerom,

The Real Statistics website and software address multiple linear regression (as well as logistic regression) and how to use these for forecasting. They also take the first step in addressing forecasting for timeseries by addressing autocorrelation. See the webpages

Linear Regression

Multiple Regression

Autocorrelation

I plan to include more materials on time-series in a month or two.

Charles

Hello Charles,

Happy New Year! I am very thankful for your hard work for providing simple, detailed and very resourceful information on your site. I have bookmarked your website. Thanks so much!

Jenkins

Dear Charles,

Thanks for sharing this very useful post. It was indeed very helpful.

Why am i getting a p-value=0 whne I use the SWTEST(). Is it a truncation or roundoff error.

Please advise.

Thanks in advance.

Anu

Anu,

If you send me a spreadsheet with your data I will try to answer your question.

Charles

I found the “Real statistics” extension of Excel by chance, while searching in the web. For curiosity, I tested it’s Principal component analysis results from a dataset of mine already analyzed with a popular software.

The results are exactly the same, obtained almost instantly… and for free !

It seems to be an incredible programming job done by an amazing guy certainly (sorry for my bad english expression).

Bernard Quebec city

Bernard,

Good to hear that the PCA worked as expected. I hope that you will continue to get value from the software in the future.

Charles

Dear Charles,

This is a great work. Thank you very much for these tools.

I want to perform trend analysis of climatic variables. I also want to do statistical downscaling of climate data. Please, let me know, how can I do that using excel.

Tapos,

Excel’s charting capability provides a feature called

Trendline. This can be used for trend analysis. My website describes some of these. You can also consult http://www.wikihow.com/Do-Trend-Analysis-in-Excel.There are also Exponential Smoothing and Moving Average data analysis tools in Excel.

All the regression capabilities described in the website can be used as well.

Charles

Hello Mr. Charles,

I must say I’m impressed by this great work. I have a problem with the software, I’ve downloaded it but each time I press the ctrl-m keys I’m asked to enter a password. Could you please help me.

Hello Emmanuel,

It sounds like the Real Statistics Resource Pack was not installed properly. To find out for sure, enter the formula =VER() in any cell in an Excel worksheet. You should see the version number of the Real Statistics Resource Pack that you downloaded (e.g. 2.16.2). If you don’t see this, then the software is not recognized as an Excel add-in (which is what the installation instructions are intended to do). See the webpage http://www.real-statistics.com/free-download/real-statistics-resource-pack/ for more details. The key step is described in Figure 2 on that webpage.

Be assured that you don’t need a password to use the software. You just need to ensure that Excel recognizes the Real Statistics Resource Pack software as an “add-in”. This is what the installation instructions are intended to do.

Charles

Hola! I’ve been following your weblog for some time now and finally

got the bravery to go ahead and give you a shout out from

Dallas Tx! Just wanted to say keep up the great job!

Hi,

No question. Just wanted to thank you for the incredible effort you’ve put into this site and the excel addin. Keep up the great work.

seen it to very useful in social research data analysis.

seen it to very useful in assisting my students during e-learning-teaching programmes.

Hi Charles,

I generally don’t comment on posts, but I thought I should this time, to tell that the work is simply wow!!

Keep it up!

Muzz

Hey,

I want to perform the independent t-test on over 200 pairs of ranges. It is impossible for me to go through the traditional method. So I couldnt get around a way to use this Add-in the macro.

Can you help me out with the same?

Thanks

Sure. You can use the Excel function

T.TEST(orTTESTfor versions of Excel prior to Excel 2010) or the Excel data analysis toolt-Test: Two-Sample Assuming Equal Variancesort-Test: Two-Sample Assuming Equal Variances. You can get more information about these via the webpage http://www.real-statistics.com/students-t-distribution/.Charles

Chirag,

One more thing. You can call the Real Statistics functions from within a macro that you write using the approach described on the webpage http://www.real-statistics.com/excel-capabilities/calling-real-statistics-functions-in-vba/.

Charles

Hello Sir,

This is great website! Learning lot out of it! Definitely one of the best websites for learning I’ve come across! Thanking you so much for sharing your knowledge here! Those tools are great piece of hard work and dedication! Thank you! 🙂 And ya looking forward for neural networks Add-Ins 😉

Ankit Bhatt

Hello Ankit,

I appreciate your comments and I am very happy that you are learning a lot from the website.

I plan to add neural networks capabilities, but I have a few other capabilities that I plan to add first. Stay tuned.

Charles

Very nice! Thank you, I have been looking for something like this ever since migrating to Mac from PC.

I noticed that I am unable to select the data cells with a cursor. But typing them in manually works. Is this a known issue?

Thanks!

Paul,

I am pleased that you like the site.

I was unable to get the approach that I used for selecting a range in the Windows versions of Excel to work in the Mac version of Excel. This seems to be related to something called RefEdit.

Charles

Sir

Could you please teach us “noncentral chi-square” and “noncentral F” distribution?

Colin

Colin,

What is your motivation for learning about these distributions? Are you interested in power?

Charles

Hi Colin,

I have just added new webpages on the noncentral t distribution, noncentral chi-square distribution and noncentral F distribution. In addition these distributions are used to calculate power and sample size requirements. There are also new functions in the Real Statistics Resource Pack that support these new capabilities.

Charles

Sir

When will your statistics & excel book be published? Can we get that book before July?

Colin

Colin,

I have a project to finish up by the end of May. I will then turn my attention back to the book. I hope to finish by July, but it depends on what else comes up.

Charles

Sir

Thank you for your reply and I am looking forward to your book. I leaned a lot of new things about statistics and Excel from your website. BTW could you please teach us something about discriminant function?

Colin

Colin,

I will eventually support discriminant function analysis, but I don’t plan to provide it in the next release.

Charles

Dear Charles:

I have been using Real Statistics since earlier versions were launched (is a very powerful statistical tool), but I have had a problem since then (that has not been solved in release 2.12): when I create an Excel file in a computer that uses Real Statistics formulas (for example, RANKSUM), and the file “RealStats.xlam” is installed in a certain Windows location (for example: “C:\Users\UserA-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam), if I open the Excel file in another computer, where the file “RealStats.xlam” is installed in another Windows location (for example: “C:\Users\UserB-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam), Excel ask for an update of the RealStats location. Although it is not a very huge problem, is not confortable to update the RealStats route when the Excel file is used between two (or more) computers. Is it possible to create an Add-in where the function calls be independent of the location of the RealStats file?

Another little problem is: I use “Ctrl-m” to access Real Statistics. It’s not possible to link an icon in the Quick Access Toolbar or in the Ribbon, as reported by some Real Statistics users (in spite of following the steps you suggest in your blog, January 28th 2014). As I suggested in a prior message (November 2nd, 2013), perhaps is better to use the Custom UI Editor Tool with the purpose of solving that trouble:

http://msdn.microsoft.com/en-us/library/ee691832(v=office.14).aspx

http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2009/08/06/7293.aspx

Thank you.

William Agurto

Dear William,

Thank you for your question. You point out a couple of difficulties that others are having as well (including me when I use the software).

To address the first issue (as I understand it) is quite straightforward. Simply select

Home > Editing|Find & Selectand choose the Replace option. Then insert “C:\Users\UserA-PC\AppData\Roaming\Microsoft\Complementos\RealStats.xlam in theFind Whatfield and leave theReplace Withfield blank. Click on theOptionsbutton and make sure thatWithinis set toBookand notSheetand then finally click on theReplace Allbutton. I could try to create a macro for this, but the problem is that the path used for the Real Statistics add-in file is dependent on each person’s computer and where they locate the add-in, and so it may not be so easy to implement. In any case the above approach is pretty easy to use.The second is a longstanding problem. Since Ctrl-m works for everyone I haven’t yet allocated the time and energy to solving it and instead have continued to put the time and energy into updating the statistical capabilities. I have looked into the approach that you suggested, but it doesn’t seem so easy to implement (at least for me). I will eventually get around to fixing this problem, but my apologies for not doing it yet.

Charles

Hi guys, I don’t even know where to start, but without any doubt I can say that your website is outstanding. With so much crap all around, finding you was a bless. I was looking for eigenvalue and eigenvector in excel to use in econometrics applications and I found you. Thanks a lot for all the effort that you put in the examples, explanations and other things. I usually use stata, eviews, minitab and shazam for statistics and excel only for quick things, but now I will use it more often. I will visit you once a week for now on.

Dear Charles

thanks for a very useful set of techniques.

Happen to develop more complex analyses, like General Linear mODel, or Mixed Linear Models?

Best, Alessio

Dear Alessio,

I plan to add these techniques in future releases.

Charles

Thank you so much for these tools!

If I may make one suggestion, it would be to have the output result default to a new worksheet, as it was not immediately intuitive to leave the field blank. Even a checkbox or something that simply cleared the field would be really helpful when I share these tools with others.

Also, have you, or would you be willing to look at developing tools related to the evaluation of risk or loss (i.e. loss triangles via chain-ladder methods, Benford’s law, etc….)?

Thanks again for the great toolset though, I find it so much easier and faster than other tools I used for quick analysis (R, mathmatica).

Thomas,

Thanks for your support and suggestions. I’ll look into the proposed changes and enhancements.

Charles

Thomas,

The latest release of the Real Statistics Resource Pack (Rel 2.11, issued today) enables you to output to a new worksheet by just pressing a button (see Blog for a description). Thanks for your suggestion. I plan to cover Benford’s law shortly. I’m still trying to decide what to do about the other topics you suggested.

Charles

Just curious as to if there has been any updates to this. Been a consistent user for about a year now and still love it!

Thomas,

There have been lots of updates to the software, website and examples file. See Blog for more details. I am very pleased that you have gotten value from the website for the past year.

Charles

Hi Charles,

Is there any chance that you could develop a neural network capability for excel?

that would be awesome!!

By the way thanks for your hard work. I really appreciate it.

Hi Hamed,

Thanks for your suggestion. I will add this to my list of possible upgrades. I developed a neural network capability in a different context many years ago.

Charles

Excellent site and exposition! Clear, concise, and exquisitely presented. This is a fine resource for statistical analysts using Excel.

HI,

I cannot find correlation in the ctrl m menu

Hi Brian,

After entering Ctrl-m choose the Matrix Operations option. When a dialog box appears, select the Correlation option.

You can also use the CORR(R1) supplemental function.

Charles

Hi Charles,

Happened to encounter an issue when using the add-in for Logistic regression. It only asks for input range – so should I presume that it will implicitly assume the first column data as the dependent variable and the rest of the columns as the explanatory ones. Also, I got an error when running the logistic regression citing “Type mismatch”. Any pointers to move beyond will be much helpful. As always, thanks in advance!

Rgds,

Arvind.

Arvind,

The Logistic Regression data analysis tool accepts two formats: raw data and summarised data. When the data is in raw format, the last column is for the dependent variable. It can only take the values 0 or 1. The other columns are for the independent variables. When the data is in summarised format, the last two columns are for summaries of the dependent variable. The last column specifies the total number of failures and the second to last column specifies the total number of successes. The other columns specify the values for the independent variables. See the Logistic Regression web pages in the website for more details.

The likely reason for the error message is that you used a value that was not 0 or 1 in the last column when using the raw data format. I have changed the software recently to detect this problem and so hopefully you would get a warning when filling in the dialog box rather than an error message during execution.

Charles

Hi Charles,

Really appreciate your effort in developing and upgrading (constantly!) the resource pack – I’m a huge fan of it! I particularly like the ease of use and sincerely thank you for your efforts.

Just as a side note, I thought if there can be an option to put the outputs in a separate worksheet (if not a workbook), it would be even more awesome. I’m not too sure of the complexity involved in adding this feature but just thought of sharing it with you.

Nevertheless, the add-in in its current form is itself wonderful.

Cheers,

Arvind.

Hi Arvind,

Thank you very much for your comments. You can already display output from a Real Statistics data analysis tool on a separate worksheet. Just leave the Output Range empty (by erasing the default range). This will display the output on a new worksheet.

Charles

Indeed had missed out on using that feature of getting the outputs in a separate worksheet by blanking out the output range. Appreciate the quick response Charles!

Rgds,

Arvind.

This website has been such an excellent resource in terms of building a statistical model. I don’t think I can thank you enough for the amount of detail, concision and information you have so graciously taken copious amounts of time and effort to compile.

I am looking for some basic SPC style charts X bar & R, Run, etc. I also want to plot batch data in a histogram with the statistical distribution super imposed over it. The data is typically normal. Can your software help me with that?

Richard,

The software does not do this at present. I will look into this for a possible future release.

Charles

Dear Charles,

thank you so much for putting so much effort into this website and – most of all – for making it available for download together with your really thorough explanations.

I implemented a few of your examples and they work perfectly, but I am having a hard time figuring out how to run the “One between subjects factor and one within subjects factor” example with unequal group sizes for the between factor (all groups are of size 7). I can see that the number of subjects for each level of the within factor would always be the same (since it’s repeated measures), but for the between factor, it is a very common case that we have more participants in one group than in another. From my textbook knowledge of ANOVAs, I know there should be a straightforward way to deal with this. I hope you can help me with this.

Thank you in advance.

Best,

Tim

Tim,

I am in the process of rolling out multivariate statistics capabilities. Once I finish with this later this month I will look into supporting unbalanced ANOVA with repeated measures. I agree that it would be useful.

Charles

Thanks for the quick reply. Looking forward to hearing from you 🙂

Tim

I just want to thank you for your effort to publish all this intelligence here.

Though I am using OpenOffice instead of Excel I found your explanations completely sufficient to build my own tables for the task I had to do (Mann-Whitney test)

I am sure to come back anytime I have to use this very helpful resource.

Thanks.

Norbert

Dear Dr. Aust,

Thank you very much for your support. I appreciate that you find the site a helpful resource. I am continually updating the site and hope that you find the new information helpful as well.

Charles

Hi Charles,

Does the Real Statistics resource pack work for Macintosh versions of Excel?

Thanks for your response,

Tim

Sorry Tim, but I have not yet created a version for the Mac version of Excel. I plan to create a Mac version, but I don’t yet a definitive release date established.

Charles

Wow! I think this is the best resource for statistics I have ever seen! The only issue I have had is using the function WTEST – it doesn’t work. Any thoughts?

Hi Teresa,

Thank you very much for your very positive opinion of the website. I have tried to make the site a resource for learning how to do statistics and a free set of tools to perform useful statistical analysis.

I have just retested the WTEST function and found that it works, at least on my computer. Can you send me the example that didn’t work? Please note that WTEST takes two arguments (while the function WTEST1 only takes one argument).

Charles

What great work in integrating educational information and the means to effect solutions using software I already own and use with free supplemental features.

Was browsing thru the site not too long ago and thought I saw mention of upcoming version release of the resource pack and/or mention of a new pack for multivariate study. Couldn’t find these posts or references today, nor any citation as to the date of the downloads to see if there is a newer version available. Are there upcoming releases any of the downloads planned for the near-term; or, what are the dates of the current downloads?

Thanks

Rich,

I have put a new release up earlier today (release 1.3), which provides some bug fixes (especially fixing the QQ Plot data tool). I expect to put another release out in a couple of weeks which provides improved versions of some of the existing functions and tools. I am testing it now. As far as the multivariate release (release 2.0), I hope to have this out before the end of the summer.

Charles

Hi,

thanks so much for putting effort in providing a free downloadable collection of important statistical functions in Excel. While I do use R for my own needs, collaborators more familiar with Excel will surely benefit from this.

However, I came across these AddIn because I was looking for functions performing Shapiro and Levene Tests and wonder why the results of your version (regarding the P-values) is quite different from other implementations.

I am not sure if this Reply-section is the best way to ask you about this but didn’t find another contact possibility. Below you will find an example.

All the best [jan]

Example:

2 Groups containing the following values

g1 = c(0.925,0.925,0.935,0.95,0.95)

g2 = c(0.69,0.71,0.85,0.89,0.955)

LEVENE() will result in a P-value of 0.0026

while R using the function: car::leveneTest(y=lm(c(0.925,0.925,0.935,0.95,0.95,0.69,0.71,0.85,0.89,0.955)~gl(2,5)))$Pr[1])

and this implementation:

http://www.stat.ufl.edu/~winner/computing/excel/levene.xls

both result in P=0.0313

Hi Jan,

Sorry for the delay in responding to your comment. The Levene test is based on the absolute values of the residuals. There are three versions of Levene’s test:

• Residuals based on the mean (as in the LEVENE function provided in the Real Statistics Resource Pack)

• Residuals based on the median (where “mean” is replaced by “median” in the calculation)

• Residuals based on the 10% trimmed mean (where “mean” is replaced by “10% trimmed mean” in the calculation)

LEVENE() uses the “mean” version and the other tools you mention use the “median” version. You can find an example of how to calculate the “median” version of the Levene’s test in the Workbook Examples, which contains all the examples described in the website plus some additional examples. You can download these examples by going to the page

http://www.real-statistics.com/free-download/real-statistics-examples-workbook/

and following the instructions for downloading the Examples Workbook and linking it to the Real Statistics Resource Pack.

In the next release of the Resource Pack I will add functions which calculate the other two versions of the Levene’s test.

Also please see the Homogeneity of Variance page to see where the mean version of the test is the better choice and where the median version is the better choice.

Charles

Update(20 July 2013): The new release of the Real Statistics Resource Pack (R1.4) modifies the LEVENE function so that you can specify which of the above three types of Levene tests you want to perform.