**Analysis of Variance** (**ANOVA**) is essentially an extension of two-sample hypothesis testing for comparing means (when variances are unknown) to more than two samples. In this part of the website we deal with the simplest case, namely **One-way ANOVA**.

Topics:

- Basic Concepts
- Confidence interval
- Experiment-wise error rate
- Planned comparisons
- Unplanned comparisons
- Assumptions
- Homogeneity of variances
- Outliers
- Effect Size
- Power and Sample Size
- Confidence intervals for ANOVA effect size and power
- Kruskal-Wallis Test
- Welch’s Test
- Brown-Forsythe F* Test
- Mood’s Median Test
- Resampling for ANOVA

Researchers desire a reliable predictor for Juvenile Idiopathic Arthritis flare-ups. It is suspected that levels of the phagocyte activation marker myeloid-related proteins 8 and 14 heterocomplex (MRP8/14) are good markers. The known mean MRP8/14 in patients with JRA is 500 ng/ml with a standard deviation of 200 ng/ml. A difference of 100 ng/ml is considered clinically significant. What sample size (per group) of patients with JRA is necessary to compare MRP8/14 measures between groups with and without flare-ups if the t test is to be used? Assume MRP8/14 is normally distributed in the population. (Foell D, et al, Methotrexate withdrawal at 6 vs 12 months in Juvenile Idiopathic Arthritis in Remission. … Could u break this down?

Jiro,

Sample size for t tests: Please see the webpage Sample size requirements for t test. Also see Real Statistics Power Data Analysis Tool.

“Could u break this down?”: Sorry, but I don’t understand your question.

Charles
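
For readers who want a rough number for the question above, the following is a minimal sketch using the normal approximation to the two-sample t test. The significance level (0.05, two-sided) and power (0.80) are assumptions, since the question does not state them, and the function name `n_per_group` is purely illustrative. A t-based calculation, such as the one on the sample size webpage, will usually give a slightly larger answer.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes)."""
    d = delta / sd                        # standardized effect size
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the test
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Values from the question: difference of 100 ng/ml, SD of 200 ng/ml.
# alpha and power are assumed defaults, not given in the question.
print(n_per_group(delta=100, sd=200))
```

With these assumptions the effect size is 100/200 = 0.5. Raising the target power or lowering alpha increases the required sample size.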

Dear Sir,

I tried doing the post-hoc test but got no significant result. I’m confused. My ANOVA result was significant at p < 0.01.

What to do?

Meera

Meera,

Which post-hoc test did you perform? Most likely the problem is that you need to fill in the highlighted range with contrast values as described on the webpages Planned Comparisons and Unplanned Comparisons.

Charles

Dear Sir,

I went for Scheffe’s, though not using the Resource Pack. When I tried it in the Resource Pack it said “Compile error in hidden module: frmAnova1”

Meera

Meera,

That is not good. What version of Windows and Excel are you using? Are you able to use other Real Statistics Resource Pack capabilities?

Depending on what you are trying to demonstrate, Scheffe’s is usually not the best post-hoc test to use. Usually Tukey HSD gives better results.

Charles

I will Sir. Thank you for helping me out.

Meera

Dear Sir,

I didn’t quite get this part: “If, however, you get a significant result, then usually you will want to better pinpoint what is causing the significant result, which is where the post-hoc tests come into play.”

I got significant results at p<0.01.

Meera.

Meera,

If, for example, you had four groups, a significant result from the ANOVA test tells you that there are significant differences among the means of the four groups, but it doesn’t tell you which group(s) have different means. If you want to better understand this, then you need to conduct a follow-up test. See the website for more details and examples about this.

Charles
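
The point above can be illustrated with a minimal sketch, assuming made-up data for four groups in which only the last group differs. The group names and values are hypothetical. The F statistic summarizes all four means at once, so it is large here, but only the pairwise differences show which group is responsible. A proper post-hoc test would also attach adjusted p-values to those differences, which this sketch does not do.

```python
from statistics import mean

# Hypothetical data: four groups, only group D has a different mean.
groups = {
    "A": [10, 12, 11, 13],
    "B": [11, 12, 10, 13],
    "C": [10, 11, 12, 13],
    "D": [18, 20, 19, 21],
}

def one_way_f(groups):
    """F statistic for one-way ANOVA: between-group MS over within-group MS."""
    values = [x for g in groups.values() for x in g]
    grand = mean(values)
    k, n_total = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pairwise_mean_differences(groups):
    """Raw difference in means for every pair of groups (no significance test)."""
    names = list(groups)
    return {
        (a, b): mean(groups[a]) - mean(groups[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

print(f"F = {one_way_f(groups):.1f}")
for pair, diff in pairwise_mean_differences(groups).items():
    print(pair, diff)
```

Every pair involving D shows a difference of 8 units, while A, B and C do not differ from one another.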

Dear Sir,

Is it mandatory to do a post hoc test after ANOVA? I need to prove that the variable ‘annual income’ influences error count. Can I simply do an ANOVA and leave it at that?

Meera,

If ANOVA gives you sufficient information for the test you are trying to make, then you can leave it at that. In particular, if you get a non-significant result, then you will typically want to leave it at that. If, however, you get a significant result, then usually you will want to better pinpoint what is causing the significant result, which is where the post-hoc tests come into play.

Charles

I appreciate it, because I found exactly what I was looking for.

You’ve ended my four-day-long hunt! God bless you, man. Have a nice day.

Bye

Hi Charles:

Thank you very much for this website. I have benefited from your website on a number of occasions.

I have a question about the ANOVA test:

1. Is it necessary to have the whole population to do ANOVA, or can we just as well use the average values of the population?

2. I have several p-values from a number of ANOVA tests. Is it possible to combine all these p-values to come up with one p-value? Is there any way of averaging the p-values for one system?

Thanks very much.

Subhu,

1. There is no point in running an ANOVA if you have access to the whole population’s data. You can just look at descriptive statistics on the population. If by population, you mean sample, then I am not sure what average values you are referring to. Perhaps a more concrete example would be helpful in understanding what you are trying to accomplish.

2. I can’t see any benefit in averaging p-values. What is it that you are trying to accomplish?

Charles

Thank you for your reply Charles.

I will explain in a bit more detail what I am trying to do:

1. I have a reactor and I am recording velocity data at different cross-sections.

2. I have three heights at which I am taking these data.

3. I am measuring the velocity using different instruments. I want to compare the difference between these instruments based on the time-averaged velocity values.

So, now here are my questions:

1. Should I take the instantaneous values or the time-averaged velocity values for doing ANOVA?

2. By doing ANOVA at three different heights, I will have 3 p-values. Is there a way to combine these 3 p-values to get just one, which will represent the whole system?

I hope that I conveyed the problem to you. Thanks very much in advance.

Subhu

Subhu,

As always, the answers to your questions depend on what you are trying to prove/test. It sounds like you have three factors (i.e. independent variables): cross-section, height, instrument. Velocity is the dependent variable. If you want to understand the interactions between these factors, then you probably should use a three-factor ANOVA (instead of a one-factor ANOVA). You may also have a fourth factor, namely time, although this may be equivalent to the cross-section factor.

Question 1: Assuming time and cross-section are equivalent, and you don’t care about differences at the cross-section level, then you could use time-averaged velocity; otherwise you would need to use velocity at each cross-section. It is really up to you and to what you are trying to study. Generally it is best to keep all the detail, but at some point (certainly at the fourth factor level) too much data makes any analysis too complicated.

Question 2: If you make height a factor, then only one p-value is created for the height factor instead of 3 p-values.

Charles
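
As a rough illustration of the answer to Question 2, here is a minimal sketch of a balanced two-factor ANOVA (instrument by height). It is a simplification of the three-factor design suggested above: cross-section is ignored, and the velocities, instrument names and replicate counts are all hypothetical. The point is only that a single analysis yields one F statistic (and hence one p-value) for height, one for instrument and one for their interaction, rather than a separate result at each height.

```python
from statistics import mean

# Hypothetical time-averaged velocities (m/s):
# 2 instruments x 3 heights x 2 replicates per cell.
data = {
    ("probe", "low"):  [1.0, 1.2],
    ("probe", "mid"):  [1.5, 1.7],
    ("probe", "high"): [2.0, 2.2],
    ("laser", "low"):  [1.3, 1.5],
    ("laser", "mid"):  [1.8, 2.0],
    ("laser", "high"): [2.3, 2.5],
}

def two_way_anova(data):
    """F statistics for a balanced two-factor ANOVA with replication."""
    a_levels = sorted({a for a, _ in data})   # instruments
    b_levels = sorted({b for _, b in data})   # heights
    n = len(next(iter(data.values())))        # replicates per cell (assumed equal)
    grand = mean([v for cell in data.values() for v in cell])
    cell = {k: mean(v) for k, v in data.items()}
    a_mean = {a: mean([cell[(a, b)] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([cell[(a, b)] for a in a_levels]) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((v - cell[k]) ** 2 for k, vals in data.items() for v in vals)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "instrument": (ss_a / df_a) / ms_err,
        "height": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / (df_a * df_b)) / ms_err,
    }

for effect, f_stat in two_way_anova(data).items():
    print(f"{effect}: F = {f_stat:.2f}")
```

Each F statistic can be converted to a p-value with the F distribution, for example via `scipy.stats.f.sf` if SciPy is available, using the degrees of freedom of the effect and of the error term.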

You have simply the best material for teaching statistics. Thanks a lot for producing this!