Essentially, **Analysis of Variance** (**ANOVA**) is an extension of two-sample hypothesis testing for comparing means (when variances are unknown) to more than two samples. In this part of the website we deal with the simplest case, namely **One-way ANOVA**.

Topics:

- Basic Concepts
- Real Statistics analysis tool and confidence intervals
- Experiment-wise error rate
- Planned comparisons
- Unplanned comparisons
- Assumptions
- Homogeneity of variances
- Outliers
- Effect Size
- Power and Sample Size
- Confidence intervals for ANOVA effect size and power
- Kruskal-Wallis Test
- Welch’s Test
- Brown-Forsythe F* Test
- Mood’s Median Test
- Homogeneity of the Coefficients of Variation
- Resampling for ANOVA

Can we do ANOVA or two-way ANOVA on count data? In other words, if we have count data, can we use it to run an ANOVA?

Ahmed,

If by count data you mean ordinal data (e.g. Likert scale data), then you can use ANOVA provided the distances between the scale values can be considered to be equal (e.g. a value of 4 is the same amount more than 3 as 5 is more than 4, etc.) and the other assumptions for ANOVA are met. With such data you may find that the assumptions are not met though.

Charles

Thanks. I have three concentrations of insecticide, with three replicates under each concentration. I count the number of insects that die under each concentration in each replicate, so I have nine readings. Can I use one-way ANOVA to test whether the differences among the three concentrations are significant, and then use LSD if they are?

Ahmed,

One-way ANOVA can be used for this purpose (assuming the assumptions are met). I would then use Tukey HSD instead of LSD as the follow up test. Keep in mind that your results will be limited since your sample size is so small.

Charles

So, what is the best statistical test for my experiment?

Ahmed,

It really depends on which hypotheses you want to test, but one-way ANOVA may be appropriate.

Charles

Charles,

I am working through another scenario and have a question:

Dr. Brown is a pediatric nurse practitioner. She would like to know the effectiveness of different HIV prevention programs on adolescents’ sexual practices. She conducted a randomized clinical trial to see the differences between 3 different groups of high school students (attending one sexual education class, attending one sexual education plus one skills training class, and attending regular class with no sexual education class provided). She used a safer sexual practice (SSP) questionnaire to measure the outcome of her study. A higher score on the SSP questionnaire indicates safer sexual practices. The students were randomly assigned to the 3 groups. The questionnaires were given to the participants of the study 4 weeks after attending the classes.

I’m not sure if I need to use a one way anova or two way anova. Part of my issue with understanding which test I need is being able to properly identify my dependent and independent variables. I start to work through each assumption for the one way anova and my variables are mixed up. The dependent variable I know is continuous and that my independent variable consists of 3 or more categorical and unrelated groups.

I am writing my hypotheses like this:

Ho: There is no significant difference in means of the interest variable among the three different groups.

Ha: There is a significant difference in means of the interest variable among the three different groups.

I lean more towards the one way anova, but I’m not really sure. Any help would be appreciated, you have been so great.

Misty,

It depends on what “the interest variable” is.

You can use one-way ANOVA where the variable of interest (i.e. the dependent variable) is the score on the test. The group is the independent variable (categorical with three possible values). You would then compare the means of the three groups.

If you tested each subject prior to the educational classes and then after, you could use two-way ANOVA with one fixed factor (Groups) and one repeated measures factor (Time).

Charles
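As a rough illustration of the one-way setup described in the reply above (one categorical factor with three groups, one continuous score), the F statistic can be computed by hand. This is only a sketch: the scores are hypothetical, the function name `one_way_anova_f` is made up for this example, and turning F into a p-value would additionally require the F distribution, which is not shown here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples (one list per group)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical questionnaire scores, three subjects per group
scores = {
    "regular class": [1, 2, 3],
    "education": [2, 3, 4],
    "education + skills": [6, 7, 8],
}
f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f_stat, df_b, df_w)
```

The dictionary keys play the role of the independent variable (Group) and the lists hold the dependent variable (the score).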

Hi.

I have 23 variables collected at each of my 20 sites.

I want to see if there is a significant difference of each variable between the 20 different sites.

Can I do this in a one-way anova or do I need to do another test.

Cheers

Holly,

If you are not interested in the interactions between the different variables, then you can use one-way Anova (provided the assumptions are met).

Re other tests, please see all the different parts of the following webpage:

http://www.real-statistics.com/anova/

Charles

Hi Charles,

I would like to ask for your help regarding our experiment. We have 3 different treatments or groups, namely color, size and shape. What statistical tool is appropriate for this? Thank you.

Nica,

You need to provide more information before

Hello Charles,

I have two groups and I want to compare three means for three parts of a website page (i.e. three AOIs). Can I use one-way ANOVA?

Your quick answer is highly appreciated.

Thanks

Khaild,

Sorry, but I don’t understand your question.

Charles

Is there a difference in time spent on the website between customers of 3 regions? I have to apply a multiple-sample test here. Please help me with some ideas.

Shivali,

I don’t see any reference to customers of 3 regions on the referenced webpage. Please explain your question more clearly.

Charles

Dear Zaiontz

I have data from 10 individuals over 24 points (i.e. every week for six months). As the data is not normally distributed, I need to use Friedman’s ANOVA. My question is whether 24 points is too many to run a one-way repeated measures analysis. I would really appreciate it if you could reply to me.

Yours sincerely,

Sam

Sam,

I suggest that you try running the Friedman’s test on your data to see whether you get a p-value. See

Friedman Test

Charles
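For readers who want to see what the Friedman test computes, here is a minimal sketch of the test statistic. It assumes no ties within a subject's row, uses hypothetical data, and the function name `friedman_statistic` is made up for this example; it is not the Real Statistics or SPSS implementation, and the p-value (from the chi-square distribution) is not computed.

```python
def friedman_statistic(rows):
    """rows: one list per subject, one value per time point (no ties assumed).

    Returns (Q, degrees_of_freedom).
    """
    n = len(rows)       # number of subjects
    k = len(rows[0])    # number of time points / treatments
    rank_sums = [0.0] * k
    for row in rows:
        # Rank the values within this subject, 1 = smallest
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, k - 1

# Hypothetical data: 3 subjects measured at 3 time points
data = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
]
q, df = friedman_statistic(data)
print(q, df)
```

The degrees of freedom are the number of time points minus one, so 24 time points would give 23 degrees of freedom.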

Hi Charles,

Thank you for the prompt reply!

I used SPSS to run the Friedman’s test and the p-value came out significant, Chi-Square(23)=44.39, p=.005. Now, I think you would usually run the Wilcoxon test with a Bonferroni correction (in this case a p-value of .002?), but there are 24 points… What would you do?

Kind regards,

Sam
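The arithmetic behind a Bonferroni threshold like the one mentioned above can be sketched as follows. The overall significance level of 0.05 is an assumption (it is not stated in the thread). Dividing by 24 tests gives roughly .002, whereas comparing every pair of the 24 time points would mean many more tests and a much stricter threshold.

```python
from math import comb

alpha = 0.05        # assumed overall significance level
time_points = 24

# Threshold if only 24 tests are run
alpha_24_tests = alpha / time_points

# Threshold if every pair of time points is compared
pairwise_tests = comb(time_points, 2)
alpha_all_pairs = alpha / pairwise_tests

print(round(alpha_24_tests, 4), pairwise_tests, round(alpha_all_pairs, 6))
```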

I have a question maybe you could help.

I’ve conducted an outcome study using four groups, and want to analyze the data using a one-way ANOVA. I note that the variances for my four groups are: 5.4, 2.3, 1.5 and 2.1. Based on a quick check, have I violated any assumptions?

the answer is No

I need to figure out how to explain why it has not violated any assumptions

Thank you for your time, I hope you can help me, I have spent so much time trying to understand the concept. I stumbled on this site and thought I would ask.

Sean,

Clearly the variances are not equal, but they are not wildly different either. You are probably at the borderline of acceptance of the homogeneity of variances assumption. You should conduct Levene’s test to make sure. See the following webpage for more information about testing this assumption:

Homogeneity of Variances

You haven’t provided any information about the other ANOVA assumptions, and so I can’t comment on these. For more information see:

ANOVA Assumptions

Charles
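A "quick check" of the kind asked about above is often done by comparing the largest and smallest group variances. The sketch below uses the four variances from the question; the cutoff of 4 is an assumption here (textbooks differ on the exact rule of thumb), and this check is not a substitute for Levene's test.

```python
variances = [5.4, 2.3, 1.5, 2.1]   # values given in the question

# Ratio of the largest to the smallest group variance
ratio = max(variances) / min(variances)

RULE_OF_THUMB_LIMIT = 4.0          # assumed cutoff; sources vary
within_rule = ratio < RULE_OF_THUMB_LIMIT

print(round(ratio, 2), within_rule)
```

A ratio of about 3.6 sits close to the assumed cutoff, which is consistent with describing the case as borderline.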

Thanks for the intellectual exposure. I am currently working on five metallic samples, where I want to determine the effect of heat on the properties of the samples. With one of the samples being the control, can I use ANOVA to compare the values of the properties of the metals, and how?

ANOVA might be suitable, but it is hard to say for sure without a more complete description of the scenario. It sounds like you have multiple properties of the metal and so MANOVA might be a better choice.

Charles

Hello Charles,

Thanks for your helpful program.

However, I have a problem when I want to do a test such as ANOVA with the Levene option, or any other test: it says that there is a compile error in a hidden file.

Do you have a solution for this ?

Noz,

The usual reason for this is that the add-in wasn’t installed properly. The installation instructions are listed on the following webpage:

Download Resource Pack (or if using Excel 2007 or Excel for the Mac, click on the appropriate page).

In particular, you need to make sure that Excel’s Solver is installed before you try to install Real Statistics Resource Pack.

To see which add-ins are installed, press Alt-TI and check which ones have a check mark next to them.

Charles

Charles,

Thanks for your answer.

I checked and Excel Solver is installed. I’ve tried with another computer, where it works without any problems.

On my computer, when I try to install it as you said, it closes Excel and says that there is a serious problem if it is installed in appdata. When I put it in Microsoft/office12/library/analyses I can open the tests window, but it reports an error in a hidden module.

Do you know why ?

Noz

Sorry, but I don’t know why this is happening. If you look at the comments to the following webpage, you see that various ideas have been proposed. One is the need to update to SP3 when running Excel 2007.

Real Statistics Resource Pack for Excel 2007

Charles

Good Morning Sir,

quick question: How Do I choose my c-value during Games-Howell post-hoc?

I’ve got 4 Groups of n=5 (1=17, 2=17, 3=10, 4=10).

Do I just look it up in “the Critical Values of Studentized Range Distribution(q)”-Chart? And would it be the same for every row in your RealStats Excel-Sheet?

Cheers,

Martin

Martin,

It is much simpler than that. Just insert +1 and -1 (the contrasts) in the c column corresponding to the two groups you want to compare. You can compare any two groups. You can also change the rows where you put the +1 and -1 if you want to make multiple pairwise comparisons.

Charles
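The +1/-1 contrast idea in the reply above amounts to taking a weighted sum of the group means. The sketch below assumes the values 17, 17, 10, 10 from the question are group means (the question does not say so explicitly), and the function name `contrast_estimate` is made up for this example. It only shows the point estimate; the standard error and critical value that Games-Howell also needs are not shown.

```python
def contrast_estimate(means, coefficients):
    """Weighted sum of group means; the coefficients must sum to zero."""
    if len(means) != len(coefficients):
        raise ValueError("need one coefficient per group")
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coefficients, means))

group_means = [17, 17, 10, 10]     # assumed to be the four group means

# Compare group 1 with group 3: +1 and -1 in those positions, 0 elsewhere
estimate = contrast_estimate(group_means, [1, 0, -1, 0])
print(estimate)
```

Moving the +1 and -1 to other positions gives the other pairwise comparisons.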

Sir, when we apply the ANOVA tool, how many questionnaires should be filled out by the respondents for measuring brand equity?

See ANOVA Sample Size webpage.

Charles

Hello Charles:

Is there a nonparametric test equivalent to two-way ANOVA? Is it a powerful tool?

Felix

Felix,

See the following webpage

Scheier-Ray-Hare

The test is not very powerful.

Charles

Thank you very much, sir. I now have an idea about the selection of variables. Could you please tell me whether my selection of variables for the analysis is correct? I chose age and level of satisfaction with the schemes, as that is what my study is about.

Sorry, but I am unable to answer your question based on the limited amount of information that you have supplied.

Charles

Sir, I am using Excel for my analysis. I do not have a clear idea of how to select the variables for one-way ANOVA. Suppose we do an ANOVA between the respondent’s frequency of visits and the respondent’s monthly income. Please do reply, sir. I am awaiting your answer.

The example you gave seems more like a regression problem than an ANOVA problem.

Charles

Thank you very much, sir. Could you please provide me with some examples of one-way ANOVA? How do I select the 2 variables? What kind of variables should be used for the analysis?

There are some examples on the referenced webpage and others throughout the website. One-way ANOVA has one independent variable (called a factor), which takes categorical values, and one dependent variable, which takes continuous values. E.g. for Example 1 on the webpage http://www.real-statistics.com/one-way-analysis-of-variance-anova/basic-concepts-anova/, the independent variable takes values for the three flavors and the dependent variable takes the score values 13, 12, 7, etc.

Charles

Dear Sir,

I followed your suggestions in doing ANOVA and simply wish to clarify a doubt. My ANOVA result is significant at p<0.01 but a couple of pairs in post-hoc are insignificant. This won't challenge my finding that the variable influences the error count, will it?

Meera.

Meera,

The significant result from ANOVA means that at least two of the groups have unequal population means. It does not mean that all the pairs of groups have significantly different means. Thus, the fact that some of the pairs in the post-hoc tests are not significant does not challenge your finding that the group means are not all equal.

Charles

Kindly send instructions on how to calculate a Likert scale average in Excel.

Sorry, but I don’t understand what you mean by a “Likert scale average”. There is some risk in taking the mean of Likert scale data, since you don’t necessarily know the distance between the scale elements. If you can determine what these distances are, then you would take the weighted average. Most people simply take the simple average and ignore this issue.

Charles
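The difference between the simple average and an average that accounts for unequal distances between scale values can be sketched as follows. Everything here is hypothetical: the responses, and especially the `scale_positions` mapping, which stands in for distances that would have to be determined for a real scale. Re-spacing the scale points is one possible reading of the "weighted average" mentioned above.

```python
from statistics import mean

# Hypothetical responses on a 1-5 Likert scale
responses = [1, 2, 2, 3, 4, 4, 4, 5]

# Simple average: treats the scale points as equally spaced
simple_mean = mean(responses)

# Hypothetical positions reflecting unequal distances between scale points
scale_positions = {1: 0.0, 2: 1.0, 3: 2.0, 4: 2.5, 5: 4.0}
rescaled_mean = mean(scale_positions[r] for r in responses)

print(simple_mean, rescaled_mean)
```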

Researchers desire a reliable predictor for Juvenile Idiopathic Arthritis flare ups. It is suspected that levels of phagocyte activation marker myeloid related proteins 8 and 14 heterocomplex (MRP8/14) are good markers. Known mean MRP8/14 in patients with JRA is 500 ng/ml with standard deviation of 200 ng/ml. A difference of 100 ng/ml is considered clinically significant. What sample size (per group) of patients with JRA is necessary to compare MRP8/14 measures between groups with and without flare ups if the t test is to be used? Assume MRP8/14 is normally distributed in the population. (Foell D, et al, Methotrexate withdrawal at 6 vs 12 months in Juvenile Idiopathic Arthritis in Remission. … Could u break this down?

Jiro,

Sample size for t tests: Please see the webpage Sample size requirements for t test. Also see Real Statistics Power Data Analysis Tool.

“Could u break this down?”: Sorry, but I don’t understand your question.

Charles
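A back-of-the-envelope version of the sample size calculation asked about above can be sketched with the usual normal approximation for comparing two means. The standard deviation (200 ng/ml) and difference (100 ng/ml) come from the question; the two-sided significance level of 0.05 and power of 0.80 are assumptions, since the question does not state them. The function name `n_per_group` is made up for this example, and a calculation based on the t distribution would typically give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for the two-sided test
    z_beta = z.inv_cdf(power)              # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

n = n_per_group(sigma=200, delta=100)
print(n)
```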

Dear Sir,

I tried doing the post-hoc test but got no significant result. I’m confused. My ANOVA result was significant at p <0.01

What to do?

Meera

Meera,

Which post-hoc did you perform? Most likely the problem is that you need to fill in the highlighted range with contrast values as described on the webpages

Planned Comparisons and Unplanned Comparisons.

Charles

Dear Sir,

I went for Scheffe’s. Not using the Resource Pack though. When I tried it in the Resource Pack it said “Compile error in hidden module: frmAnova1”

Meera

Meera,

That is not good. What version of Windows and Excel are you using? Are you able to use other Real Statistics Resource Pack capabilities?

Depending on what you are trying to demonstrate, Scheffe’s is usually not the best post-hoc test to use. Usually Tukey HSD gives better results.

Charles

I will Sir. Thank you for helping me out.

Meera

Dear Sir,

I didn’t quite get this part “If, however, you get a significant result, then usually you will want to better pinpoint what is causing the non-significant result, which is where the post-hoc tests come in to play.”

I got significant results at p<0.01.

Meera.

Meera,

If, for example, you had four groups, the significant result from the ANOVA test tells you that there is a significant difference among the means of the four groups, but it doesn’t tell you which group(s) have different means. If you want to better understand this, then you need to conduct some follow-up test. See the website for more details and examples about this.

Charles

Dear Sir,

Is it mandatory to do a post hoc test after ANOVA? I need to prove that the variable ‘annual income’ influences error count. Can I simply do an ANOVA and leave it at that?

Meera,

If ANOVA gives you sufficient information for the test you are trying to make then you can leave it at that. Particularly if you get a non-significant result then you will typically want to leave it at that. If, however, you get a significant result, then usually you will want to better pinpoint what is causing the non-significant result, which is where the post-hoc tests come in to play.

Charles

I appreciate, cause I discovered exactly what I was looking for.

You’ve ended my 4 day lengthy hunt! God Bless you man. Have a nice day.

Bye

Hi Charles:

Thank you very much for this website. I have benefited from your website on a number of occasions.

I have a question about the ANOVA test:

1. Is it necessary to have the whole population to do ANOVA, or can we use the average values of the population instead?

2. I have several p-values from a number of ANOVA tests. Is it possible to combine all these p-values into one p-value? Is there any way of averaging the p-values for one system?

Thanks very much.

Subhu,

1. There is no point in running an ANOVA if you have access to the whole population’s data. You can just look at descriptive statistics on the population. If by population, you mean sample, then I am not sure what average values you are referring to. Perhaps a more concrete example would be helpful in understanding what you are trying to accomplish.

2. I can’t see any benefit in averaging p-values. What is it that you are trying to accomplish?

Charles

Thank you for your reply Charles.

I will explain in a bit more detail what I am trying to do:

1. I have a reactor and I am recording velocity data at different cross-sections.

2. I have three heights where I am taking these data.

3. I am measuring the velocity using different instruments. I want to compare the difference between these instruments based on the time-averaged velocity values.

So, now here are my questions:

1. Should I take the instantaneous values or the time-averaged velocity values for doing ANOVA?

2. By doing ANOVA at three different heights, I will have 3 p-values. Is there a way to combine these 3 p-values to get just one, which will represent the whole system?

I hope that I conveyed the problem to you. Thanks very much in advance.

Subhu

Subhu,

As always, the answer to your questions depends on what you are trying to prove/test. It sounds like you have three factors (i.e. independent variables): cross-section, height, instrument. Velocity is the dependent variable. If you want to understand the interactions between these factors, then you probably should use a 3 factor ANOVA (instead of a one-factor ANOVA). You may also have a fourth factor, namely time, although this may be equivalent to the cross-section factor.

Question 1: Assuming time and cross-section are equivalent, and you don’t care about differences at the cross-section level, then you could use time-averaged velocity; otherwise you would need to use velocity at each cross-section. It is really up to you and to what you are trying to study. Generally it is best to keep all the detail, but at some point (certainly at the fourth factor level) too much data makes any analysis too complicated.

Question 2: If you make height a factor, then only one p-value is created for the height factor instead of 3 p-values.

Charles

You have simply the best material for teaching statistics. Thanks a lot for producing this!