To use the ANOVA test we make the following assumptions:

- Each group sample is drawn from a normally distributed population
- All populations have a common variance
- All samples are drawn independently of each other
- Within each sample, the observations are sampled randomly and independently of each other
- Factor effects are additive

The presence of outliers can also cause problems. In addition, we need to make sure that the *F* statistic is well behaved. In particular, the *F* statistic is relatively robust to violations of normality provided:

- The populations are symmetrical and uni-modal.
- The sample sizes for the groups are equal and greater than 10

In general, as long as the sample sizes are equal (called a **balanced model**) and sufficiently large, the normality assumption can be violated provided the samples are symmetrical or at least similar in shape (e.g. all are negatively skewed).

The *F* statistic is not so robust to violations of homogeneity of variances. A rule of thumb for balanced models is that if the ratio of the largest variance to smallest variance is less than 3 or 4, the F-test will be valid. If the sample sizes are unequal then smaller differences in variances can invalidate the F-test. Much more attention needs to be paid to unequal variances than to non-normality of data.
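The rule of thumb above is easy to check by hand. Here is a minimal sketch using only the Python standard library; the data and the helper name `variance_ratio` are made up for illustration.

```python
from statistics import variance


def variance_ratio(groups):
    """Return the largest sample variance divided by the smallest."""
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    return max(variances) / min(variances)


# Hypothetical data: a balanced model with three groups of five observations.
groups = [
    [23, 25, 21, 27, 24],
    [30, 28, 35, 26, 31],
    [22, 20, 25, 23, 20],
]

ratio = variance_ratio(groups)
print(f"largest / smallest variance = {ratio:.2f}")
```

For these made-up numbers the ratio is about 2.56, which is under the threshold of 3 or 4, so by the rule of thumb the F-test should be acceptable for this balanced model.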

We now look at how to test for violations of these assumptions and how to deal with any violations when they occur.

- Testing that the population is normally distributed (see Testing for Normality and Symmetry)
- Testing for homogeneity of variances and dealing with violations (see Homogeneity of Variances)
- Testing for and dealing with outliers (see Outliers in ANOVA)
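As a rough illustration of the first two checks, the sketch below runs a Shapiro-Wilk test on each group and Levene's test across groups using SciPy. The data are hypothetical, and the choice of these two particular tests (and of `center="median"` for Levene's test) is an assumption for the example; the referenced pages describe the options in more detail.

```python
from scipy import stats

# Hypothetical data: three groups of eight observations each.
groups = {
    "A": [12.1, 13.4, 11.8, 14.2, 12.9, 13.1, 12.5, 13.8],
    "B": [15.2, 14.8, 16.1, 15.5, 14.9, 16.4, 15.0, 15.7],
    "C": [11.0, 12.7, 10.4, 13.9, 11.6, 12.2, 10.9, 13.1],
}

# Normality: one Shapiro-Wilk test per group.
shapiro_p = {}
for name, values in groups.items():
    w_stat, p = stats.shapiro(values)
    shapiro_p[name] = p
    print(f"group {name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p:.3f}")

# Homogeneity of variances: Levene's test centred on the group medians.
levene_stat, levene_p = stats.levene(*groups.values(), center="median")
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may be violated, but with small samples these tests have little power, so a box plot of each group is a useful complement.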

Hi Charles,

I work with bacteria in soil and water. I am running statistics (or trying to) on my data, which constantly violate normality and equal variances. I know that Welch’s ANOVA is recommended for unequal variances and that the Kruskal-Wallis test is for non-normality. What can be done if both of these ANOVA assumptions are violated at the same time? I understand transformations are useful in combating variance issues but I’d like to keep that as a last resort. Do you have any recommendations?

-Matt

Matt,

See the following:

scholarscompass.vcu.edu/cgi/viewcontent.cgi?article=5026&context=etd

Charles

Hi Charles,

I have a convenience sample of size 60. My IV is major of study (three levels) and my DV is hours of study a week. I would like to run an ANOVA to determine differences in the means of these three groups. What are the consequences (both theoretical and practical) of the fact that my sample is not random? Will it “just” limit my ability to generalize my results? Or will it prevent me from using the test altogether? What do you suggest in these cases?

Another, related question: some colleagues of mine use both ANOVA and the t-test with non-random samples (which can vary in size from 20 to 100) but, and this is what puzzles me, they say that they do so without any inferential goal in mind… Basically they told me that all they want to do by using these tests is check whether the means are different among the groups in their sample. BUT, and this is my question, why run these tests if you do not have any inferential goal in mind? By inferential I mean saying something about the population from your sample (even if non-random). In my understanding, these tests are made for inferential statistics. What do you think about it? Is there something I am missing here?

I very much appreciated your website, and will greatly benefit from your advice. Thanks so much in advance for taking the time to answer my questions!

Serena,

The whole point of using ANOVA is to generalize your results from the random samples to the corresponding populations. If you are only interested in using the results for the given sample, then, as you have said, there is no point in doing any inferential analysis. You can simply compare the sample means and draw no conclusions about the population means.

You say that the samples are not random, but how specifically were they drawn? Very often samples that are called random are not really random. E.g. in a lot of university research samples are drawn from the student body, based on students volunteering to participate. This is not really a random sample, but lots of research papers are written based on such samples.

Charles

Thank you very much! I believe it is a convenience sample: I am asking people in my class (they come from different majors) how many hours a week they study using an online survey. I agree with you that sometimes we think we are collecting a random sample but we really aren’t. I guess my population can be my class in this case, as it is a very large class and I am only collecting a sample of 60 students in there?

Serena,

If you are sharing the results with other people, the important thing is to describe accurately the limitations of your sampling technique even if you use the standard analysis tool assuming random samples.

Charles

I have another quick question for another stats assignment. Thanks in advance for your help!

I am working on a stats assignment for which I am required to design a little study, collect my data, and run an ANOVA. In my study, my IV would be “social media platform used”, with three levels being: Snapchat, Twitter, Facebook. The DV is the number of posts posted per day.

The three categories of my IV are not mutually exclusive: should they be in order to run an ANOVA? If this is a potential issue, what is the best way to deal with it? Do I have to ask people to self-classify in one group to begin with? Or could I ask let’s say subject1 to provide an answer for each of the three groups, and then subsequently put subject1 in one of the three groups based on the highest score (for ex if subject 1 says Snapchat 2, Twitter 4, Facebook 6, then I would assign the subject to the Facebook group)? Is this theoretically correct? If I do so, would it be a within group research design (with the same subject measured three time)?

My DV should measure the hours spent on each of these platforms (by the same subject) or the hours spent in general on social media?

I wonder if I might violate the assumption of independency of the samples.

Thanks so much!

Serena,

I am reluctant to answer someone’s homework assignment, but the approach to use really depends on what the objective of the study is. One of your approaches might work for some situations, but not for most others. Another approach is to view this as a repeated measures ANOVA where you allow multiple types of measurements per subject.

Charles

I have a general question on your article – which I found very useful.

A question that constantly comes up is related to the nature of the treatments in an experiment, and whether or not ANOVA and means separation is acceptable, or regression analysis should be performed. Following is the question:

If treatment means are not independent of each other, is it still acceptable to do ANOVA and means separation, or is regression analysis the proper approach? For example, if treatments represent a continuum of concentration, such as 0X, 0.5X, 1.0X, 1.5X and 2.0X, to me the treatments are not independent and the samples are therefore not independent of each other. Am I reading your article correctly?

I am sorry about my terrible grammar in the previous post. I failed to review before submitting. My bad!!

Thank you so much for your time and trouble.

John,

In your example, you have 5 treatment groups. If you use a sample of 50 then as long as you assign 10 elements from this sample to each group at random, then you have independent group samples.

Charles

Dear Dr. Charles,

I have to perform a set of unpaired t-tests on independent samples for a large number of endpoint variables (that is, I have to compare several male vs. female population characteristics).

For some variables, the normality assumption is violated; for others, homogeneity of variances is violated; for still others, neither assumption is met.

The two samples are almost equal size (n about 50).

What is the best non-parametric test to use for such cases?

Do you think that, for a better consistency of all results, I should use the same method for testing all endpoint variables, independently from which assumptions are violated (if any) for each single variable?

Thank you very much for your valuable help.

Best Regards

Piero

Piero,

You can use the t test even if the variances are unequal, provided you use the version of the test that does not assume equal variances. The test is pretty robust even if normality is violated provided that the data is reasonably symmetric.

If you meet the assumptions, the Mann-Whitney test is usually the best nonparametric test to use.

Charles
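For readers who want to try the options mentioned in this reply, here is a minimal sketch with SciPy that runs the unequal-variance (Welch) t test, the pooled-variance t test and the Mann-Whitney test on the same two samples. The numbers and the variable names `males` and `females` are hypothetical and are not Piero's data.

```python
from scipy import stats

# Hypothetical measurements for two independent groups with unequal spread.
males = [5.1, 4.8, 5.6, 5.0, 5.3, 4.9, 5.2, 5.4]
females = [4.2, 6.9, 3.1, 7.4, 5.8, 2.7, 6.3, 4.9, 7.0]

# t test that does not assume equal variances (Welch's version).
welch = stats.ttest_ind(males, females, equal_var=False)

# Classical pooled-variance t test, shown for comparison only.
pooled = stats.ttest_ind(males, females, equal_var=True)

# Mann-Whitney test as a nonparametric alternative.
mw = stats.mannwhitneyu(males, females, alternative="two-sided")

print(f"Welch t:      t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Pooled t:     t = {pooled.statistic:.3f}, p = {pooled.pvalue:.3f}")
print(f"Mann-Whitney: U = {mw.statistic:.1f}, p = {mw.pvalue:.3f}")
```

When the group variances differ noticeably, the Welch and pooled results can diverge, which is why the unequal-variance version is the safer default in that situation.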

Dear Mister Zaiontz,

I would like to observe how physical performance outcomes are influenced by 2 categorical variables (1- physical activity level (low vs high) and 2- presence of a disease (0-1)). I would like to use a 2-way ANOVA where y = physical outcome, x1 = presence of the disease, x2 = sedentary/exercising.

The problem: The sample size is then not the same in each subgroup. So the model is unbalanced. One of the assumptions for the use of this type of ANOVA is therefore not met.

I read somewhere that we can get round this assumption by using type I sum of squares (sequential) instead of the usual type III SS (unique). Is it true? Can we draw the same conclusions about the significance of the effects of the 2 variables and their interaction? I guess that it would be too easy and there must be some tricky considerations?

What is your opinion? Should I rather reduce/match the sample size in order to get equal groups?

It is not the first time that I find useful and clear answers to my questions on your website and I’m very grateful. I hope that my junk language was understandable for stat expert. Thx a lot.

Demonceau,

You can perform two factor ANOVA with unbalanced data by using regression. This is explained on the following webpage:

Unbalanced Factorial ANOVA

I am pleased to read that my website has been helpful to you.

Charles

thank you!

Hi, you state that one of the assumptions is “Factor effects are additive” – is this an assumption that needs to be tested? How can I do that? Can you explain what this means a little?

Additive just means that you can use the usual ANOVA equations to model what is going on (as described on the website). I don’t test for this assumption explicitly. I probably should drop this assumption from the list since it is confusing.

Charles

For the ANOVA test, can’t we assume normality by the central limit theorem if the sample size is large enough?

André,

This is likely to be true, but you should check for normality just in case. As long as the data is not too far from normality you should be ok.

Charles

Hi

I am conducting a one-way within-subjects ANOVA, but the sphericity assumption is violated. Mauchly’s test was violated and I got a value of .702, so I guess I have to use the Greenhouse-Geisser correction. How would I report that in the results? Do I mention that Mauchly’s test was violated and report the Greenhouse-Geisser correction instead?

How you report the results really depends on the requirements for publications in your discipline, but in general I would report that Mauchly’s test was violated and report the Greenhouse-Geisser correction (and even the Huynh-Feldt correction).

Charles

Thank you so much for your reply.

I am writing my thesis and I have been reading around it and wasn’t sure what to do. Will I still be okay to carry on with my ANOVA analysis even if I have this violation?

Tofi,

Generally yes, provided you use a correction factor.

Charles

what are the effects of violating the factor effects since they are additive

Nyasha,

Sorry, but I don’t understand what you mean by “the factor effects”.

Charles

Outline the assumptions which underlie the analysis of variance (ANOVA) and the possible methods for their detection and remedy?

I understand the first part of the question … the assumptions that underlie ANOVA but what are their possible methods of detection and remedy ?

Thanks

Don,

The brief answer is as follows:

Normality – ANOVA is quite robust to violations of normality, especially if each group is reasonably symmetric. If this assumption is strongly violated then you can use an alternative test (e.g. Kruskal-Wallis or Brown-Forsythe) or a transformation could be employed.

Outliers – If some data are outliers, then you should check to make sure that there wasn’t some error in measurement or in copying the data. If that is not the case, then you can use a rank-oriented test instead (e.g. Kruskal-Wallis), use a transformation or go ahead and perform the ANOVA twice, once with the outlier and once with the outlier removed.

Homogeneity of Variances – This is covered on the website. See

http://www.real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/dealing-with-heterogeneous-variances/

Charles

thank u sir

Hello Charles,

May I ask some questions about ANOVA and two sample t-test?

1) For experiment 1, there are three experimental groups. Two of the group data sets passed the normality test and one failed (P=0.045). I used Kruskal-Wallis One Way Analysis of Variance on Ranks to compare the three groups. Is this the right choice? Or should I use ANOVA, since the p-value is close to 0.05? What if the P value for the third group is 0.014?

2) For experiment 2, I have two sets of data. Group A: 1.12, 1.07, 1.12, normality test P<0.001. Group B: 0.05, 0.12, 0.35, normality test P=0.430. Because group A failed the normality test, I used the Mann-Whitney Rank Sum test to compare the two groups, with P=0.077. However, if you look at the raw data, group A values are much bigger than group B values. It does not make sense that there is no significant difference between these two groups. Just out of curiosity, I also ran a t-test to compare these two groups, with P=0.000542. In this situation (two data sets, only one of which passes the normality test), is the nonparametric test the correct test to use to compare these two groups?

Thank you very much!

Hello Xia,

1) For data that is so close to normality (p = .045), generally I would just use ANOVA provided the homogeneity of variances assumption is met. ANOVA is much more sensitive to violations of this assumption and is pretty robust to violations of normality. Even if one group has p = .014 when testing normality probably ANOVA is the right way to go provided the data is relatively symmetrical and there aren’t problems with outliers. You can use a box plot to see whether the data is relatively symmetric.

2) For two samples, if each group is relatively symmetric, I would use the t test. Without seeing your data I can’t say why the results from the MW test are so different from those of the t test; generally they would be similar if the data is symmetric.

Charles

Hello Charles,

Thank you very much for your reply!

1) For experiment 1, both data sets that failed the normality test (p=0.045 and p=0.014) are not symmetric, according to the box plot. Therefore, a nonparametric test should be used for the analysis, right?

2) For experiment 2, there are two experimental groups. I only have three values for each group. The data for group A are: 1.12, 1.07, 1.12 (normality test P<0.001). The data for group B are: 0.05, 0.12, 0.35 (normality test P=0.430). The results from t-test (p=0.000542) and Mann-Whitney Rank sum test (p=0.077) are very different.

Thank you!

Xia,

1) Yes you would normally use a nonparametric test.

2) With only three data points in each group, I would not expect too much from either statistical test. Given that the first group is symmetric (at least from what you can see from the box plot) and the second group is normal, I would use the t test result. Also just looking at the data indicates that the population means are likely to be different. Again, with such small samples I would be very cautious about any conclusions.

Charles
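The gap between the two p-values in Xia's second experiment can be reproduced directly. The sketch below reruns both tests on the six values quoted above using SciPy. Which software and settings Xia actually used is not stated, so the choice of a pooled-variance t test and of the `exact` and `asymptotic` Mann-Whitney methods is an assumption.

```python
from math import comb

from scipy import stats

# The six observations quoted in the question.
group_a = [1.12, 1.07, 1.12]
group_b = [0.05, 0.12, 0.35]

# Two-sample t test with pooled variance (SciPy's default).
t_res = stats.ttest_ind(group_a, group_b)

# Mann-Whitney test: exact distribution versus normal approximation.
mw_exact = stats.mannwhitneyu(
    group_a, group_b, alternative="two-sided", method="exact"
)
mw_asym = stats.mannwhitneyu(
    group_a, group_b, alternative="two-sided", method="asymptotic"
)

# Smallest two-sided p-value an exact rank test can give with 3 + 3 values:
# only 2 of the comb(6, 3) = 20 ways to split the ranks are as extreme as
# complete separation of the two groups.
min_exact_p = 2 / comb(6, 3)

print(f"t test:                 p = {t_res.pvalue:.6f}")
print(f"Mann-Whitney (exact):   p = {mw_exact.pvalue:.3f}")
print(f"Mann-Whitney (approx.): p = {mw_asym.pvalue:.3f}")
print(f"minimum exact p:        {min_exact_p:.3f}")
```

With three observations per group, an exact two-sided rank test cannot give a p-value below 2/20 = 0.1, however far apart the groups are. The reported 0.077 is consistent with a normal approximation, which is not reliable for samples this small. This is one reason the rank test and the t test disagree so strongly here.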

Thank you very much Sir for your effort : I have 2 questions:

1. How can I judge the factor effect and how could I judge is it additive or not …?

2. if the assumption of homogeneity of variance is not met .. i.e. significant Levene test … what do you recommend to use Welch ANOVA or Brown-Forsythe test..?

thank you very much again.

1. I don’t understand what you mean by judging the factor effect.

2. Usually Welch ANOVA.

Charles
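For reference, Welch's ANOVA is short enough to compute by hand. The sketch below follows the standard textbook formulas for Welch's F statistic and its adjusted denominator degrees of freedom; the function name `welch_anova` and the data are hypothetical, and this is an illustration rather than the implementation used on this website.

```python
import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's one-way ANOVA; returns (F, df1, df2, p-value)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])

    w = n / variances                       # weights n_i / s_i^2
    w_sum = w.sum()
    grand_mean = np.sum(w * means) / w_sum  # variance-weighted grand mean

    lam = np.sum((1 - w / w_sum) ** 2 / (n - 1))
    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)          # adjusted denominator df
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value


# Hypothetical data: three groups with clearly different spreads.
groups = [
    [4.1, 5.0, 6.2, 5.5, 4.8],
    [7.9, 12.3, 3.1, 9.8, 15.0, 6.4],
    [5.9, 6.1, 6.0, 6.3, 5.8, 6.2, 6.0],
]

f_stat, df1, df2, p_value = welch_anova(*groups)
print(f"Welch F({df1}, {df2:.2f}) = {f_stat:.3f}, p = {p_value:.4f}")
```

A useful sanity check is that with only two groups the statistic reduces to the square of the unequal-variance t statistic, with the same p-value.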

I mean the last assumption “Factor effects are additive” .. I could not understand it .. and how to test for it.

This assumption is based on the fact that ANOVA is essentially a type of linear regression. See Regression Model for ANOVA.

I wouldn’t explicitly worry about this assumption. The usual ANOVA tests will essentially show whether this assumption has been met.

Charles
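To make the link with regression concrete, the sketch below fits the additive model (an overall intercept plus one effect per group, coded with dummy variables) by least squares and compares the resulting F statistic with the one from a standard one-way ANOVA. The data are hypothetical and the dummy coding, with the first group as baseline, is just one possible choice.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of unequal size.
groups = [
    np.array([23.0, 25.0, 21.0, 27.0, 24.0]),
    np.array([30.0, 28.0, 35.0, 26.0, 31.0, 29.0]),
    np.array([22.0, 20.0, 25.0, 23.0]),
]

y = np.concatenate(groups)
labels = np.concatenate([np.full(len(g), i) for i, g in enumerate(groups)])
k, n = len(groups), len(y)

# Design matrix: intercept plus a dummy column for each group except the first.
X = np.column_stack(
    [np.ones(n)] + [(labels == i).astype(float) for i in range(1, k)]
)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

ss_reg = np.sum((fitted - y.mean()) ** 2)  # explained (between-group) SS
ss_res = np.sum((y - fitted) ** 2)         # residual (within-group) SS
f_reg = (ss_reg / (k - 1)) / (ss_res / (n - k))

f_anova, p_anova = stats.f_oneway(*groups)
print(f"F from regression: {f_reg:.4f}")
print(f"F from ANOVA:      {f_anova:.4f}")
```

The two F values agree, which is the sense in which one-way ANOVA is a special case of linear regression.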

Can ANOVA still be used if most of the data sets show normality but not all of them? Out of 21 data sets, 3 don’t show normality according to the Shapiro-Wilk and Kolmogorov-Smirnov tests.

Rae,

ANOVA is quite robust for violations of normality. It should be valid provided the data in these three groups are not too skewed.

Charles

what is the validity of anova?

Raheem,

A test is valid if it measures what it claims to measure. I don’t think that ANOVA is the type of test this definition is intended to apply to, but if I do apply it to ANOVA, I guess I have to conclude that when the assumptions of ANOVA are met then ANOVA does measure what it is supposed to subject to the type I and type II error rates.

Charles

Do the samples have to be random or can you use this test on data collected from random samples?

Also same question for chi-square and t-tests?

Thanks

L

Lola,

The data is collected from random samples, but the data is not random for any of these tests.

Charles

So it’s not a rule that you can only use these tests if the sample was collected in a random manner?

Lola,

It depends on what you mean by random. For most tests, samples should indeed be collected in a random manner. That doesn’t make the numbers random. These values must be collected randomly from the population that we are studying.

Charles

I mean for example if we use survey data should the sample of respondents be for example stratified or systematic taken from a complete sampling frame whereby all members of the population stand an equal chance of selection. As opposed to say a convenience based or self select survey?

Lola,

Obviously a random sample is better. Many studies are conducted with self-selected participants because it is easier to get a sample in this way. Although statistical analyses can be made, the results may not be reliable since the sample is not random.

Charles

Before now, my major concern was the “assumptions of the ANOVA”, but your analysis has been of great help. I visited the site and was able to surprise my lecturer during the ST 525 lecture, and I did well even in exams. I’ve also developed an interest in design of experiments. I REMAIN GRATEFUL.

Lionel,

Thank you for sharing this. I am always pleased to see that the site is helping people.

Charles

sir

Are these assumptions used after fitting the model?

You should make sure the assumptions hold before you spend a lot of energy building and analyzing the model. ANOVA is pretty robust to violations of normality, but not so robust to violations of homogeneity of variances. Thus if the variances are very different, the results of ANOVA can be completely inaccurate.

Charles

I am trying to use ANOVA for a statistical analysis of flood hazards. How do I use rainfall data and flood events to do my analysis, since there is no flood data in Africa?

urgent attention sir

Thanks

Sorry but your question is not very clear to me.

Charles

How do you analyze rainfall data from January to December over 20 years to determine whether it can lead to flooding, or to calculate the trends in rainfall?

what are the necessary conditions for orthogonal contrast?

Chris,

Contrasts are orthogonal when the sum of the products of their coefficients is zero. This is explained on the following webpage:

Planned Comparisons.

Charles
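The condition in this reply is easy to check numerically. Here is a minimal sketch in plain Python, assuming equal group sizes (with unequal group sizes each product of coefficients would be divided by that group's sample size before summing). The function names and the example coefficients are made up for illustration.

```python
def is_contrast(coeffs, tol=1e-12):
    """A contrast's coefficients must sum to zero."""
    return abs(sum(coeffs)) < tol


def are_orthogonal(c1, c2, tol=1e-12):
    """Two contrasts are orthogonal (equal group sizes assumed) when the
    sum of the products of their coefficients is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol


# Hypothetical contrasts for three groups.
c1 = [1, -1, 0]       # group 1 vs group 2
c2 = [0.5, 0.5, -1]   # average of groups 1 and 2 vs group 3
c3 = [1, 0, -1]       # group 1 vs group 3

print(are_orthogonal(c1, c2))  # 1*0.5 + (-1)*0.5 + 0*(-1) = 0
print(are_orthogonal(c1, c3))  # 1*1 + (-1)*0 + 0*(-1) = 1
```

Here `c1` and `c2` are orthogonal, while `c1` and `c3` are not, since they both involve group 1 in an overlapping comparison.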

Could you please enlighten me on this: anxiety is assessed with the use of three groups of participants utilising differing amounts of resources (no resources, two resources and five resources). The groups are also assessed in terms of gender differences between male and female. Is this a one-way ANOVA, a factorial ANOVA or something else? If something else, what would it be?

Steve,

In your description you have defined two factors. Factor A = resources (3 levels) and B = gender (2 levels). This fits the description of factorial anova. Of course, if you don’t care about the influence of gender, then you won’t include factor B in the model, which results in a one-way anova.

Charles

Hello

You say

“All populations have a common variance”

but you are calculating the variance SSB.

Maybe it’s because the assumption is about the population variance and you calculate the sample variance?

As usual, inferential statistics makes inferences about populations based on the observed sample. Since there is some chance that what we conclude based on the sample is not indeed true of the population, the results are probabilistic in nature.

Charles

There is this question I was asked to solve. It goes thus: “what are the assumptions required in the use of ANOVA for a regression analysis”. Does this mean the ANOVA assumptions or the regression assumptions? I need an urgent answer.

It sounds like they want the assumptions for regression, but I don’t know the intention of someone else when they ask such a question.

Charles

If my whole population is being used (so I don’t have a sample), can I use ANOVA for that? Could someone please let me know their thoughts, and also any reference regarding this?

Mike,

If you have the entire population, then you don’t need to use inferential statistics (since you have 100% of the information); you only need to use descriptive statistics. E.g. suppose you have three groups with means 23.4, 26.1 and 26.2; then you know that the three groups have different means since you can see that the values are different. You could use the effect sizes to get an idea of how big the differences between the means are.

Charles

help me and provide answers to the following questions: (1) assumptions of analysis of variance? (2) implications of non parametrics?

Jerome,

The referenced webpage describes the assumptions for the analysis of variance. For the implications of using non-parametric tests, see the webpage Nonparametric Testing.

Charles

Do you have any formal reference for the statement that the F statistic is relatively robust to violations of normality provided the two listed conditions hold? This reference would be valuable for a text I need to write for school.

Thank you.

Josue

There are a number of references regarding robustness to violations to normality, with slight differences from one to the other. Here is one such reference:

Zar, J. H. (2010) Biostatistical Analysis, 5th Ed. Pearson

Charles

Thank you!

Is there any data you provide to check the ANOVA assumptions, their violations and the effect on results? Kindly show these things on real data.

Yes. See the Testing for Normality webpage regarding the normality assumption.

See the Homogeneity of Variances webpage regarding the homogeneity of variances assumption.

Charles

nice summary

Does a one-way ANOVA require that the responses be linear with group?

Also is a one-way ANOVA applicable for analysis of data such as response to increasing doses of a drug?

1. There is no linearity assumption for ANOVA.

2. Yes, ANOVA can be used to compare responses to different drug doses.

Charles

Sir

You wrote: “the F statistic is relatively robust to violations of normality provided: The cell sizes are equal and greater than 10”. Does the cell size mean sample size?

Colin

Colin,

It means the group sample sizes. I have just revised the webpage to try to make this clearer.

Charles

Helpful