Essentially, **Analysis of Variance** (**ANOVA**) extends the two-sample hypothesis test for comparing means (when variances are unknown) to more than two samples. In this part of the website we deal with the simplest case, namely **One-way ANOVA**.

Topics:

- Basic Concepts
- Confidence interval
- Experiment-wise error rate
- Planned comparisons
- Unplanned comparisons
- Assumptions
- Homogeneity of variances
- Outliers
- Effect Size
- Power and Sample Size
- Confidence intervals for ANOVA effect size and power
- Kruskal-Wallis Test
- Welch’s Test
- Brown-Forsythe F* Test
- Mood’s Median Test
- Resampling for ANOVA

You have simply the best material for teaching statistics. Thanks a lot for producing this!

Hi Charles:

Thank you very much for this website. I have benefited from your website on a number of occasions.

I have a question about the ANOVA test:

1. Is it necessary to have the whole population to do ANOVA, or can we use the average values of the population instead?

2. I have several p-values from a number of ANOVA tests. Is it possible to combine all these p-values into one p-value? Is there any way of averaging the p-values for one system?

Thanks very much.

Subhu,

1. There is no point in running an ANOVA if you have access to the whole population’s data. You can just look at descriptive statistics on the population. If by population, you mean sample, then I am not sure what average values you are referring to. Perhaps a more concrete example would be helpful in understanding what you are trying to accomplish.

2. I can’t see any benefit in averaging p-values. What is it that you are trying to accomplish?

Charles

Thank you for your reply Charles.

I will explain in a bit more detail what I am trying to do:

1. I have a reactor and I am recording velocity data at different cross-sections.

2. I am taking these data at three heights.

3. I am measuring the velocity using different instruments. I want to compare the difference between these instruments based on the time-averaged velocity values.

So, now here are my questions:

1. Should I use the instantaneous values or the time-averaged velocity values for the ANOVA?

2. By doing ANOVA at three different heights, I will have 3 p-values. Is there a way to combine these 3 p-values into just one that represents the whole system?

I hope that I conveyed the problem to you. Thanks very much in advance.

Subhu

Subhu,

As always, the answer to your questions depends on what you are trying to prove/test. It sounds like you have three factors (i.e. independent variables): cross-section, height and instrument. Velocity is the dependent variable. If you want to understand the interactions between these factors, then you probably should use a three-factor ANOVA (instead of a one-factor ANOVA). You may also have a fourth factor, namely time, although this may be equivalent to the cross-section factor.

Question 1: Assuming time and cross-section are equivalent, and you don’t care about differences at the cross-section level, then you could use time-averaged velocity; otherwise you would need to use velocity at each cross-section. It is really up to you and to what you are trying to study. Generally it is best to keep all the detail, but at some point (certainly at the fourth factor level) too much data makes any analysis too complicated.

Question 2: If you make height a factor, then only one p-value is created for the height factor instead of 3 p-values.

Charles

I appreciate, cause I discovered exactly what I was looking for.

You’ve ended my 4 day lengthy hunt! God Bless you man. Have a nice day.

Bye

Dear Sir,

Is it mandatory to do a post hoc test after ANOVA? I need to prove that the variable ‘annual income’ influences error count. Can I simply do an ANOVA and leave it at that?

Meera,

If ANOVA gives you sufficient information for the test you are trying to make, then you can leave it at that. In particular, if you get a non-significant result, then you will typically want to leave it at that. If, however, you get a significant result, then usually you will want to better pinpoint what is causing the significant result, which is where the post-hoc tests come into play.

Charles

Dear Sir,

I didn’t quite get this part “If, however, you get a significant result, then usually you will want to better pinpoint what is causing the significant result, which is where the post-hoc tests come into play.”

I got significant results at p<0.01.

Meera.

Meera,

If, for example, you had four groups, the significant result from the ANOVA test tells you that there are significant differences among the means of the four groups, but it doesn’t tell you which group(s) have different means. If you want to better understand this, then you need to conduct some follow-up tests. See the website for more details and examples about this.

Charles
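To make the point above concrete, here is a minimal Python sketch of the one-way ANOVA F statistic. The four groups and their values are hypothetical (they do not come from Meera's data), and the helper name `one_way_anova_f` is invented for this illustration. Three groups share the same mean and only one differs, yet the overall F is large.

```python
from statistics import mean

def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of samples (hypothetical helper)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Between-groups sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-groups sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: groups A, B and C have mean 10, group D has mean 15
groups = {
    "A": [10, 11, 9, 10],
    "B": [10, 9, 11, 10],
    "C": [10, 11, 10, 9],
    "D": [15, 16, 14, 15],
}
group_means = {name: mean(values) for name, values in groups.items()}
f_stat, df_b, df_w = one_way_anova_f(list(groups.values()))
print(group_means)
print(f_stat, df_b, df_w)
```

Here F works out to 37.5 on (3, 12) degrees of freedom, well above the tabulated 1% critical value of about 5.95. The F statistic alone does not say that D is the group responsible; that is the job of the follow-up tests.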

I will Sir. Thank you for helping me out.

Meera

Dear Sir,

I tried doing the post-hoc test but got no significant result. I’m confused. My ANOVA result was significant at p <0.01

What to do?

Meera

Meera,

Which post-hoc test did you perform? Most likely the problem is that you need to fill in the highlighted range with contrast values as described on the webpages Planned Comparisons and Unplanned Comparisons.

Charles

Dear Sir,

I went for Scheffe’s. Not using the Resource Pack though. When I tried it in the Resource Pack it said “Compile error in hidden module: frmAnova1”

Meera

Meera,

That is not good. What version of Windows and Excel are you using? Are you able to use other Real Statistics Resource Pack capabilities?

Depending on what you are trying to demonstrate, Scheffe’s is usually not the best post-hoc test to use. Usually Tukey HSD gives better results.

Charles

Researchers desire a reliable predictor for Juvenile Idiopathic Arthritis flare-ups. It is suspected that levels of the phagocyte activation marker myeloid-related proteins 8 and 14 heterocomplex (MRP8/14) are good markers. The known mean MRP8/14 in patients with JRA is 500 ng/ml with a standard deviation of 200 ng/ml. A difference of 100 ng/ml is considered clinically significant. What sample size (per group) of patients with JRA is necessary to compare MRP8/14 measures between groups with and without flare-ups if the t test is to be used? Assume MRP8/14 is normally distributed in the population. (Foell D, et al, Methotrexate withdrawal at 6 vs 12 months in Juvenile Idiopathic Arthritis in Remission. … Could u break this down?

Jiro,

Sample size for t tests: Please see the webpage Sample size requirements for t test. Also see Real Statistics Power Data Analysis Tool.

“Could u break this down?”: Sorry, but I don’t understand your question.

Charles
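As a rough illustration of the sample size question above, the following Python sketch uses the common normal-approximation formula n = 2((z_{1-α/2} + z_{power})σ/δ)² per group. The standard deviation (200 ng/ml) and the difference (100 ng/ml) come from the question; the two-sided significance level of 0.05 and the power of 0.80 are assumptions, since the question does not state them. The function name `n_per_group` is made up for this example. This is not the calculation performed by the Real Statistics tools, which may give a slightly different answer.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means.

    Uses the normal approximation; alpha and power defaults are assumptions.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

n = n_per_group(sigma=200, delta=100)
print(n)
```

Under these assumptions the approximation gives 63 patients per group. A calculation based on the t distribution rather than the normal distribution typically requires slightly more.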

Kindly send me how to calculate a Likert scale average in Excel.

Sorry, but I don’t understand what you mean by a “Likert scale average”. There is some risk in taking the mean of Likert scale data, since you don’t necessarily know the distance between the scale elements. If you can determine what these distances are, then you would take the weighted average. Most people simply take the simple average and ignore this issue.

Charles
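The difference between the two averages can be sketched in a few lines of Python. Both the responses and the scale positions below are hypothetical; in practice the positions would have to come from some knowledge of how far apart the scale elements really are.

```python
from statistics import mean

# Hypothetical responses on a 5-point Likert scale
responses = [1, 2, 2, 3, 4, 4, 5]

# Simple average: treats the scale points as equally spaced
simple_average = mean(responses)

# Hypothetical positions of the scale points when the distances are NOT equal
positions = {1: 0.0, 2: 1.0, 3: 1.5, 4: 3.0, 5: 5.0}

# Weighted average: each response contributes its assumed position on the scale
weighted_average = mean(positions[r] for r in responses)

print(simple_average, weighted_average)
```

With these made-up values the simple average is 3, while the weighted average is about 2.07, so the choice of distances can change the result noticeably.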

Dear Sir,

I followed your suggestions in doing ANOVA and simply wish to clarify a doubt. My ANOVA result is significant at p<0.01 but a couple of pairs in post-hoc are insignificant. This won't challenge my finding that the variable influences the error count, will it?

Meera.

Meera,

The significant result from ANOVA means that at least two of the groups have unequal population means. It does not mean that all pairs of groups have significantly different means. Thus, the fact that some of the pairs in the post-hoc test are not significant does not challenge your finding that the group means are not all equal.

Charles

Sir, I am using Excel for my analysis. I do not have a clear idea of how to select the variables for one-way ANOVA. Suppose we do an ANOVA between the respondent's frequency of visits and the respondent's income per month. Please reply, sir. I am awaiting your answer.

The example you gave seems more like a regression problem than an ANOVA problem.

Charles

Thank you very much, sir, but could you please provide me with some examples of one-way ANOVA? How should the 2 variables be selected? What kind of variables should be used for the analysis?

There are some examples on the referenced webpage and others throughout the website. One-way ANOVA has one independent variable (called a factor), which takes categorical values, and one dependent variable, which takes continuous values. E.g. for Example 1 on the webpage http://www.real-statistics.com/one-way-analysis-of-variance-anova/basic-concepts-anova/, the independent variable takes values for the three flavors and the dependent variable takes the score values 13, 12, 7, etc.

Charles
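The data layout described above can be sketched in Python as follows. The flavor labels and scores are hypothetical placeholders and are not the data from Example 1 on the referenced webpage. Each observation pairs one categorical factor level with one numeric score, and the observations are then collected into one sample per level, which is the form that a one-way ANOVA compares.

```python
from collections import defaultdict

# Hypothetical observations: (level of the factor, value of the dependent variable)
records = [
    ("flavor_a", 14), ("flavor_b", 11), ("flavor_c", 8),
    ("flavor_a", 16), ("flavor_b", 9), ("flavor_c", 10),
]

# Collect the scores into one sample per factor level
samples = defaultdict(list)
for level, score in records:
    samples[level].append(score)

for level, scores in samples.items():
    print(level, scores)
```

If the grouping variable is itself continuous (for example, monthly income in currency units rather than income brackets), there are no natural levels to group by, which is why such a problem looks more like regression.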

Thank you very much, sir. I now have an idea about the selection of variables. Could you please tell me whether my selection of variables for the analysis is correct? I chose age and level of satisfaction with the schemes, since that is what my study is about.

Sorry, but I am unable to answer your question based on the limited amount of information that you have supplied.

Charles