Two Factor ANOVA with Replication

In Two Factor ANOVA without Replication there was only one sample item for each combination of factor A levels and factor B levels. Here we consider the case where each combination has more than one sample item.

We will restrict ourselves to the case where all the samples are equal in size (balanced model). In Unbalanced Factorial ANOVA we show how to perform the analysis where the samples are not equal (unbalanced model) via regression. As usual, we start with an example.

Example 1: Repeat the analysis from Example 1 of Two Factor ANOVA without Replication, but this time with the data shown in Figure 1 where each combination of blend and crop has a sample of size 5.

Figure 1 – Data for Example 1

Definition 1: We extend the structural model of Definition 1 of Two Factor ANOVA without Replication as follows.

In Definition 1 of Two Factor ANOVA without Replication the r × c table contains the entries {xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}. We extend these tables to contain entries {Xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c},  where Xij is a sample for level i of factor A and level j of factor B. Here Xij = {xijk: 1 ≤ k ≤ nij}. For now we assume the nij are all equal of size m.

We use terms such as x̄i (or x̄i.) as an abbreviation for the mean of {xijk: 1 ≤ j ≤ c, 1 ≤ k ≤ m}. We also use terms such as x̄j (or x̄.j) as an abbreviation for the mean of {xijk: 1 ≤ i ≤ r, 1 ≤ k ≤ m}.

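As in Definition 1 of Two Factor ANOVA without Replication, we define the effects αi and βj where

αi = μi – μ        βj = μj – μ

Similarly, we define ai and bj where

ai = x̄i – x̄        bj = x̄j – x̄

We use δij for the effect of level i of factor A with level j of factor B, i.e. the interaction of level i of factor A and level j of factor B. Thus, δij = μij – μi – μj + μ. Similarly, we have

dij = x̄ij – x̄i – x̄j + x̄

It is easy to show that

Σi αi = Σj βj = 0,   Σi δij = 0 for each j,   Σj δij = 0 for each i

and that the same identities hold for ai, bj and dij.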

Finally, we can represent each element in the sample as

xijk = μ + αi + βj + δij + εijk

where εijk denotes the error (or unexplained) amount. As before we have the sample version

xijk = x̄ + ai + bj + dij + eijk

where eijk is the counterpart to εijk in the sample. Note that

μ + αi + βj + δij = μij        x̄ + ai + bj + dij = x̄ij

And so

εijk = xijk – μij        eijk = xijk – x̄ij


As in Definition 1 of Two Factor ANOVA without Replication, the null hypotheses for the main effects are:

H0:  μ1. = μ2. = … = μr. (Factor A)

H0:  μ.1 = μ.2 = … = μ.c (Factor B)

These are equivalent to:

H0: αi = 0 for all i (Factor A)

H0: βj = 0 for all j (Factor B)

In addition there is a null hypothesis for the effects due to interaction between factors A and B.

H0: δij = 0 for all i, j
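The sample quantities of Definition 1 (x̄, ai, bj, dij and eijk) can be computed directly from a balanced data set. The following is a minimal Python sketch, not part of the original worksheet; the function name sample_effects and the toy data are hypothetical, and the shortcut of averaging cell means is valid only because every cell has the same size m.

```python
from statistics import fmean

def sample_effects(data):
    """Decompose a balanced two-factor sample into its sample effects.

    data[i][j] is the list of m observations for level i of factor A and
    level j of factor B (every cell is assumed to have the same size m).
    Returns (grand mean, a, b, d, e) in the notation of Definition 1.
    """
    r, c = len(data), len(data[0])
    cell = [[fmean(data[i][j]) for j in range(c)] for i in range(r)]   # cell means
    # In a balanced design a level mean is the mean of its cell means.
    row = [fmean(cell[i]) for i in range(r)]                           # factor A level means
    col = [fmean([cell[i][j] for i in range(r)]) for j in range(c)]    # factor B level means
    grand = fmean(row)                                                 # grand mean
    a = [row[i] - grand for i in range(r)]
    b = [col[j] - grand for j in range(c)]
    d = [[cell[i][j] - row[i] - col[j] + grand for j in range(c)]
         for i in range(r)]
    e = [[[x - cell[i][j] for x in data[i][j]] for j in range(c)]
         for i in range(r)]
    return grand, a, b, d, e

# Hypothetical data: 2 levels of factor A, 3 levels of factor B, m = 2
toy = [[[10, 12], [14, 18], [9, 11]],
       [[20, 22], [15, 13], [30, 26]]]
grand, a, b, d, e = sample_effects(toy)

# Each observation is recovered as grand + a[i] + b[j] + d[i][j] + e[i][j][k]
rebuilt = [[[grand + a[i] + b[j] + d[i][j] + e[i][j][k] for k in range(2)]
            for j in range(3)] for i in range(2)]
```

The effects sum to zero across their levels, which is what makes the decomposition unique in the balanced case.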

Definition 2: Using the terminology of Definition 1, define

ANOVA with replication formulas
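The formula table itself is not reproduced in this text. As a sketch only, the standard definitions for the balanced model, written in the notation of Definition 1 and assuming rcm observations in total, would be as follows (the dot subscripts mark which index has been averaged out):

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{m}\left(x_{ijk}-\bar{x}\right)^2 & df_T &= rcm-1 & MS_T &= SS_T/df_T \\
SS_A &= cm\sum_{i=1}^{r}\left(\bar{x}_{i\cdot}-\bar{x}\right)^2 & df_A &= r-1 & MS_A &= SS_A/df_A \\
SS_B &= rm\sum_{j=1}^{c}\left(\bar{x}_{\cdot j}-\bar{x}\right)^2 & df_B &= c-1 & MS_B &= SS_B/df_B \\
SS_{AB} &= m\sum_{i=1}^{r}\sum_{j=1}^{c}\left(\bar{x}_{ij}-\bar{x}_{i\cdot}-\bar{x}_{\cdot j}+\bar{x}\right)^2 & df_{AB} &= (r-1)(c-1) & MS_{AB} &= SS_{AB}/df_{AB} \\
SS_W &= \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{m}\left(x_{ijk}-\bar{x}_{ij}\right)^2 & df_W &= rc(m-1) & MS_W &= SS_W/df_W
\end{aligned}
```

The multipliers cm, rm and m count how many observations share each mean, which is why they appear only in the balanced case.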

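We can also define the following entities:

SSBet = m Σi Σj (x̄ij – x̄)²        dfBet = rc – 1        MSBet = SSBet / dfBet

SSW = Σi Σj Σk (xijk – x̄ij)²        dfW = rc(m – 1)        MSW = SSW / dfW

Since the within groups terms are used as the error terms in our model, we also use the following symbols:

SSE = SSW        dfE = dfW        MSE = MSW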

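Property 1:

SST = SSA + SSB + SSAB + SSW

dfT = dfA + dfB + dfAB + dfW

Proof: Clearly

xijk – x̄ = (x̄i – x̄) + (x̄j – x̄) + (x̄ij – x̄i – x̄j + x̄) + (xijk – x̄ij)

If we square both sides of the equation, sum over i, j and k and then simplify (with various terms equal to zero as in the proof of Property 2 of Basic Concepts for ANOVA), we get the first result. For the second,

dfA + dfB + dfAB + dfW = (r – 1) + (c – 1) + (r – 1)(c – 1) + rc(m – 1) = rcm – 1 = dfT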

Property 2: Note that the between group terms are as for the one-way ANOVA, namely

SST = SSBet + SSW        dfT = dfBet + dfW

The proof is similar to the proof of Property 1. It also follows that

SSBet = SSA + SSB + SSAB        dfBet = dfA + dfB + dfAB
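Properties 1 and 2 are easy to check numerically. The sketch below is an illustration on made-up data, not the worksheet from the examples; the function name sums_of_squares is hypothetical, and the formulas used are the standard balanced-design definitions, which are assumed to match Definition 2.

```python
from statistics import fmean

def sums_of_squares(data):
    """Return (ss, df) dictionaries for a balanced two-factor layout.

    data[i][j] is the list of m observations for level i of factor A and
    level j of factor B; all cells are assumed to have the same size m.
    """
    r, c, m = len(data), len(data[0]), len(data[0][0])
    cell = [[fmean(obs) for obs in row] for row in data]
    row_mean = [fmean(cell[i]) for i in range(r)]
    col_mean = [fmean([cell[i][j] for i in range(r)]) for j in range(c)]
    grand = fmean(row_mean)
    ss = {
        "A": c * m * sum((x - grand) ** 2 for x in row_mean),
        "B": r * m * sum((x - grand) ** 2 for x in col_mean),
        "AB": m * sum((cell[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
                      for i in range(r) for j in range(c)),
        "Bet": m * sum((cell[i][j] - grand) ** 2
                       for i in range(r) for j in range(c)),
        "W": sum((x - cell[i][j]) ** 2
                 for i in range(r) for j in range(c) for x in data[i][j]),
        "T": sum((x - grand) ** 2 for row in data for obs in row for x in obs),
    }
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1),
          "Bet": r * c - 1, "W": r * c * (m - 1), "T": r * c * m - 1}
    return ss, df

# Hypothetical data: r = 2, c = 3, m = 2
toy = [[[10, 12], [14, 18], [9, 11]],
       [[20, 22], [15, 13], [30, 26]]]
ss, df = sums_of_squares(toy)

# Property 1: SST = SSA + SSB + SSAB + SSW
gap_property_1 = ss["T"] - (ss["A"] + ss["B"] + ss["AB"] + ss["W"])
# Property 2: SST = SSBet + SSW, and SSBet = SSA + SSB + SSAB
gap_property_2 = ss["T"] - (ss["Bet"] + ss["W"])
gap_between = ss["Bet"] - (ss["A"] + ss["B"] + ss["AB"])
```

All three gaps should be zero up to floating-point rounding, and the degrees of freedom add up in the same way.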

Property 3: If a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed and with all σj² (or σi²) equal, then

image1401 image1402

Proof: The proof is similar to that of Property 1 of Basic Concepts for ANOVA.

Theorem 1: Suppose a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed.

If all μi are equal and all σi² are equal then

MSA / MSW ~ F(dfA, dfW)

If all μj are equal and all σj² are equal then

MSB / MSW ~ F(dfB, dfW)

Also, under certain circumstances,

MSAB / MSW ~ F(dfAB, dfW)

Proof: The result follows from Property 3 and Theorem 1 of F Distribution.

Property 4:

image1405 image1406

Observation: We use the following tests:

ANOVA with replication tests
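Each of these tests compares a mean square with MSW and takes the right tail of the F distribution. In Excel this tail is available as F.DIST.RT; the sketch below computes the same quantity in plain Python through the regularized incomplete beta function. The function names and the mean squares in the usage lines are hypothetical, and in practice a statistics library would normally be used instead.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

# Hypothetical mean squares for an interaction test (not the values of Example 1)
ms_ab, df_ab = 900.0, 6
ms_w, df_w = 400.0, 48
p_ab = f_upper_tail(ms_ab / ms_w, df_ab, df_w)
reject_h0 = p_ab < 0.05
```

The same call with MSA or MSB in place of MSAB, and dfA or dfB in place of dfAB, gives the two main-effect tests.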

Recall that the assumptions for using these tests are:

  • All samples are drawn from normally distributed populations
  • All populations have a common variance
  • All samples are drawn independently from each other
  • Within each sample, the observations are sampled randomly and independently of each other

We now return to Example 1 and show how to conduct the required analysis using Excel’s Anova: Two-factor With Replication data analysis tool.

Example 1 (continued): The summary output from the data analysis tool is given on the right side of Figure 2, with the sample data repeated on the left side of the figure.

Figure 2 – Summary output of ANOVA data analysis for Example 1

The top part of Figure 3 contains the rest of the output from the data analysis tool. We’ll explain the bottom part momentarily.

Figure 3 – ANOVA analysis for Example 1

We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so we have no evidence at the 5% significance level of differences in the effectiveness of the fertilizer across the different crops.

Since the p-value (blends) = .00025 < .05 = α, we reject the Factor A null hypothesis, and conclude that the blends are statistically different.

We also see that the p-value (interactions) = .0456 < .05 = α, and so conclude that there is a significant interaction between crop and blend. We can look more carefully at the interactions by plotting the cell means for each combination of levels of the two factors (see Figure 4). Lines that are roughly parallel indicate a lack of interaction, while lines that are not roughly parallel indicate interaction.

From the first chart we can see that Blend Y has quite a different pattern from the other blends, especially since the line for Blend Y is trending up towards Soy and down towards Rice, exactly the opposite of Blends X and Z. We also see that Blend X is trending up towards Soy much more abruptly than Blend Z.

Figure 4 – Interaction plots for Example 1
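The "roughly parallel" reading of an interaction plot can also be checked numerically: two lines are parallel exactly when the gap between them is the same at every level of the other factor. The following sketch uses made-up cell data and hypothetical function names; it illustrates the idea and is not a substitute for the F test on the interaction.

```python
from statistics import fmean

def cell_means(data):
    """data maps (blend, crop) to a list of observations; returns the cell means."""
    return {key: fmean(obs) for key, obs in data.items()}

def profile_gaps(means, blend_a, blend_b, crops):
    """Gap between two blends' cell means at each crop.

    A roughly constant gap corresponds to roughly parallel lines in the
    interaction plot; a gap that changes a lot suggests interaction.
    """
    return [means[(blend_a, crop)] - means[(blend_b, crop)] for crop in crops]

crops = ["c1", "c2", "c3"]

# Hypothetical data in which blend Q is always 5 units above blend P
parallel = {("P", "c1"): [10, 12], ("P", "c2"): [20, 22], ("P", "c3"): [15, 17],
            ("Q", "c1"): [15, 17], ("Q", "c2"): [25, 27], ("Q", "c3"): [20, 22]}
# Hypothetical data in which the ordering of the blends flips at c2
crossing = {("P", "c1"): [10, 12], ("P", "c2"): [30, 32], ("P", "c3"): [15, 17],
            ("Q", "c1"): [15, 17], ("Q", "c2"): [20, 22], ("Q", "c3"): [20, 22]}

gaps_parallel = profile_gaps(cell_means(parallel), "Q", "P", crops)
gaps_crossing = profile_gaps(cell_means(crossing), "Q", "P", crops)
spread_parallel = max(gaps_parallel) - min(gaps_parallel)
spread_crossing = max(gaps_crossing) - min(gaps_crossing)
```

A spread near zero matches parallel lines, while a large spread matches lines that diverge or cross. How large is "large" depends on the within-cell variability, which is what MSW measures.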

Observation: Although the analysis in Figures 2 and 3 was produced automatically by Excel’s data analysis tool, the same result can be produced using Excel formulas, just as we were able to do for Example 1 of Two Factor ANOVA without Replication. In fact, all the entries in the ANOVA table in Figure 3 can be calculated using the tables constructed in the bottom part of Figure 3 in exactly the same way as was done in Example 1 of Two Factor ANOVA without Replication.

The only new element is the calculation of the error term SSW. To calculate it we must first construct the table of the squared deviations of the observations in each cell from their cell mean. This table appears in cells J38:N41 of Figure 3. For example, the entry for Wheat and Blend X (in cell K39) is =DEVSQ(B5:B9). SSW is then calculated as the sum of all the terms in the table, namely =SUM(K39:N41).

Alternatively we can use Property 2 to calculate SSBet and then use the fact that SSW = SST – SSBet. To calculate SSBet we first construct the table of the cell means for each combination of levels of factors A and B (range J43:N46 of Figure 3). SSBet is now calculated using the formula =DEVSQ(K44:N46)*H5. For Example 1, SSBet = 18420.5, and so SSW = SST – SSBet = 39640.9 – 18420.5 = 21220.4.
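The two routes to SSW can be compared side by side outside Excel. This is a minimal sketch with made-up data: devsq is a hypothetical helper that mirrors what Excel's DEVSQ returns, and the crop and blend labels are placeholders rather than the values in Figure 2. Multiplying DEVSQ of the cell means by m gives SSBet only because the design is balanced.

```python
from statistics import fmean

def devsq(values):
    """Sum of squared deviations from the mean, as Excel's DEVSQ computes."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical balanced data: 2 crops x 2 blends, m = 3 observations per cell
cells = {
    ("Wheat", "X"): [120, 131, 125],
    ("Wheat", "Y"): [140, 152, 147],
    ("Corn", "X"): [110, 118, 105],
    ("Corn", "Y"): [160, 149, 155],
}
m = 3

all_obs = [x for obs in cells.values() for x in obs]
ss_t = devsq(all_obs)

# Route 1: add up the within-cell squared deviations (the =SUM of DEVSQ cells)
ss_w_direct = sum(devsq(obs) for obs in cells.values())

# Route 2: SSBet = m * DEVSQ(cell means), then SSW = SST - SSBet
ss_bet = m * devsq([fmean(obs) for obs in cells.values()])
ss_w_indirect = ss_t - ss_bet

# The arithmetic quoted for Example 1
ss_w_example_1 = 39640.9 - 18420.5
```

Both routes give the same SSW up to rounding, so the choice between them is a matter of which table is more convenient to build.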

Example 2: Repeat the analysis for the data in Example 1 by using the presentation of the data given in the table on the left of Figure 5.

Figure 5 – Alternative presentation of data in Example 1

Excel’s ANOVA data analysis tools don’t support data in this format, and so we must create the ANOVA table (i.e. the output found in Figure 3) using formulas. This is straightforward, although tedious, with the result presented in Figure 6. As usual, the hardest part is the calculation of the SS terms, which are as indicated on the right side of the worksheet in Figure 6.

Figure 6 – ANOVA output for Example 2

Observation: In general when the assumptions are violated, transformations and non-parametric (rank) tests are not very useful for two-way ANOVA. We can instead abandon the omnibus test and apply the various planned and unplanned tests described in Planned Comparisons for ANOVA and Unplanned Comparisons for ANOVA by treating the two-way ANOVA as a one-way ANOVA.

In particular, when the variances are not equal we can apply Welch’s correction for contrasts. We can also apply a modified version of the Brown & Forsythe F* test.

106 Responses to Two Factor ANOVA with Replication

  1. Colin says:

    You wrote: “Also, under certain circumstances, MSab/MSw ~ F(dFab, dFw).”
    Could you tell me under what circumstances?

  2. Roman says:


    In this example we have three independent factors (Blends X, Y and Z) and four dependent continuous variables (rice, soy, wheat, corn) which we analyze with ANOVA. Will analyzing this data with MANOVA make any sense? If yes, what will be the difference?

  3. heather says:

    when m=1, “with replication” reduces to “without replication”. The SS_AB term becomes identical to SS_E, and SS_W goes to zero, as it should be.

    Conceptually, shouldn’t the interaction between A and B exists regardless the value of m? Why does one call SS_AB an interaction, with the SS_W as an error term, when m>1; but SS_E (reduced from SS_AB when m=1) is called an error term? Should the interaction between A and B be considered an error only when m=1?

    If the error term represents the “unexplained amount”, is the interaction term “explained” when m>1, but becomes “unexplained” when m=1?

    Thank you so much!

    • Charles says:

      The SS terms measure variability. In the without replication case, since there is only one data element in the intersection between A and B levels there is no variability and so SS_AB = 0. The error term is SS_W = SS_T – (SS_A + SS_B), which turns out to have the same formula as the SS_AB term in ANOVA with replication, but as I mentioned above since SS_W is not 0 it is not the SS of the interaction between A and B in the without replication case.

      • heather says:

        Thank you Charles. I was confused by the x_ij_bar notation in the SS_E term for the “w/o replication” case in the table under definition 2. It should be just x_ij, without the bar. It was obvious in the proof below.

        The Excel ANOVA table seems to label the terms arbitrarily, your website helps a lot in clarifying it. Thank you!


        • Charles says:

          Thanks for catching this error. I have made a note on the webpage that this needs to be corrected.

  4. sahar says:

    I was wondering if you could help. I’m looking to run a 2×2 anova … time (pre/post) x group (intervention/control) for 4 variables (various dependent variables. Is this considered a 2×2 anova , 2-factor w/out replication? i want to see if my variables changed between the group over pre/post-intervention. thank you!

    • Charles says:

      If you only had one dependent variable, then this sounds like 2 x 2 ANOVA with one fixed factor (group) and one repeated measures factor (time).
      If there is no (or little) correlation between the dependent variables, you can run four separate ANOVAs. The more likely situation is that there is correlation, in which case you will likely want to use MANOVA.

  5. Mike Daly says:

    I have a trial comparing 7 fertiliser treatments on a Maize crop with 4 replicates (randomised block). I have 2 components to analyze. Total cob wt, and mean cob wt.

    Which anova programme do I use?

    • Charles says:

      If for example the replicates are say four varieties of maize, then you have a two fixed factor ANOVA with a fixed fertilizer factor and a fixed maize variety factor design. This is based on the fact that the mean cob wt can be calculated from the total cob wt. If not then you would have two dependent variables and so you would need to use two factor MANOVA (note that Real Statistics software only supports one factor MANOVA at present).

      Also this assumes that there are only 4 varieties of maize under study; if these 4 varieties were randomly chosen say from 100 possible varieties, then this would be a random factor and not a fixed factor.


      • Mike Daly says:

        Thanks for that.
        Then maize crop is the same variety across the trial.
        So what I have is, 7 fertilizer treatments replicated in 4 blocks.

        So a randomised block design with 7 fertiliser treatments.
        7*4=28plots, 7 plots per replicate
        So what programme do I use?
        cheers Mike

        • Charles says:

          If I understand correctly, you have 28 plots of land and 7 fixed fertilizer treatments. For each fertilizer you randomly select four plots of land and apply that fertilizer (each plot of land gets one fertilizer). If all you are interested in is the fertilizer, then you can simply use one-way ANOVA to compare the fertilizers. This assumes that all the plots of land are interchangeable (i.e. have similar characteristics). Please let me know whether this is your situation before I make any further suggestions.

          • Mike Daly says:

            Yes thats correct.
            The only thing I would add to that, is the replicates are in blocks, So its a ramdomised block design.
            So Block 1 has 7 fertilizers trts randomised (all rep 1). Then block 2 has the next 7 trts (rep 2)…etc., etc

            Does that make any difference ? Still use a one way anova?

          • Charles says:

            With randomized block design you should use two factor ANOVA without replication. This is precisely the data analysis tool supplied by Excel.

  6. Joe says:

    I will be conducting feeding trials on village chickens using a locally formulated layer diet and commercial layer feed as the control. the trial is to compare egg production from the two diets. I intend to use ANOVA statistical analysis to analyse the data. there will be 4 replications. What is the appropriate design for such trials and what analysis method using would be the correct one?

    • Charles says:

      What are the replications? By 4 replications do you mean 4 days, 4 chickens or something else?

  7. Opanin says:

    i neva knew excel was that useful…i only used it in entering data and always said in mind that i will have to get a better software for my analysis….uv really made sense out of it for me i will not be underrating excel from now onwards….

  8. Ishaq says:

    Hi, I want to investigate sex differences and education level on test anxiety among students. My variables are as follows :
    Independent variable 1 – Sex difference (male or female)
    Independent variable 2- education level (grade 2 or grade 3 students)
    Dependent variable – Test anxiety reported by the students.
    Is this suitable for a 2 way ANOVA? If yes, when putting in the data, should I input the score of each student on the Test Anxiety Inventory (TAI)? Or the sum of the students who reported test anxiety? I got confused on how I can key in each datum of upto 261 students that participated. Thanks.

    • Charles says:

      If test anxiety takes on a continuous set of values (or can be approximated by such values, e.g. 1, 2, 3, 4, 5, 6, 7), then this is indeed a 2 x 2 ANOVA, where you insert the test anxiety values in the four cells. Since 261 is not divisible by 4 this will be an unbalanced model. This can be solved by regression (see Unbalanced ANOVA).

      You insert the scores of each student in the cells. A possible source of confusion is with the chi-square test where you insert sums.


  9. satish jadhav says:

    i am studying impact of emotional intelligence on teaching with the variables like sex ,education , age, and managemaent type
    so how can i use it for co-realating E.I to teaching

  10. Mohammad says:

    Dear Dr Charles
    I am studying the effect of treatment, say X. I have three samples in each group and in each
    sample I obtained three readings before and after treatment X, my question is: Will ANOVA with replication will be the technique to seek answers by? if yes, then I should follow instructions in this page! below is data similar to what I have and mean.

    no treatment
    smaple 1 sample 2 sample 3
    55 54 65 33 44 43 22 33 43

    with treatment
    smaple 1 sample 2 sample 3
    56 52 61 33 45 41 33 34 41

    • Charles says:

      Dear Mohammad,

      Based on the data that you have presented, I understand the following about Sample 1. Please let me know whether this is correct.

      There are three subjects in Sample 1. Subject 1 got a score of 55 before treatment and a score of 56 after treatment. Subject 2 got a score of 54 before treatment and a score of 52 after treatment. Subject 3 got a score of 65 before treatment and a score of 61 after treatment.

      If this is the correct way to interpret the data (and presumably the interpretations for the other two samples is similar), then a two factor ANOVA with replication is not the correct test. Instead you need a two factor mixed ANOVA where one of the factors is repeated measures. This is described on the following webpage:


  11. Mohammad says:

    Thanks Dr Chalers. Yes, your interpretation is correct.

  12. Rolf says:

    Dear Dr Charles

    I study if a new method estimates the same score than the old method. I have 60 participants that get tested with both methods at 5 time points. Can I use Two Factor ANOVA with replication to determine if the methods get different results?

    Thank you
    Best regards

    • Charles says:

      No, you need to use ANOVA with repeated measures. In fact you need the mixed version of the test – one between and one within factor. This is described on the webpage


  13. Rob says:

    Hi Dr Charles, can I ask,
    I write a small manual for students but I have trouble with this ANOVA particularly.
    Beacuse ist for students of education, I will present example from this science. Can you please tell me if I understand it right or wrong.

    (fictitious example) I have two measurements: A) Before course and after course in:
    knowledge of: A) social B) ontogenetic and C) clinical psychology.

    From ANOVA with replication I should find out:
    p1: differences in knowledge before and afrer course (rows)
    p2: differences in knowledge between subjetcs (columns)
    p3: interaction? This is the second part of my misapprehension.

    Is this example right for this test? What interaction tells me about this factors?

    Thank you very much!

    • Charles says:

      Based on my understanding of the scenario, you probably want to use ANOVA with repeated measures and not ANOVA with replication. It looks like a mixed model with factor A repeated measures and factor B not repeated measures. You can learn more about this type of model at the webpage
      Mixed Repeated Measures ANOVA.

  14. aren says:

    I am a student trying to complete my thesis. I am stuck with which method i should use. i have 4 different treatments: Treatment1(T1) : culm cutting in raised nursery bed
    T2: branch cutting in raised bed
    T3: Culm cutting in flat bed
    T4: branch cutting in flat bed
    Could you please recommend? Thank you

    • Charles says:

      It really depends on what you are trying to test. If you are trying to determine whether the treatments yield the same or different results then you can use one-way ANOVA with the 4 treatments listed. If you also want to study the interaction effects, then use two factor ANOVA where one factor is the cutting type (branch vs. culm) and the other factor is bed type (raised vs. flat).

  15. Olarewaju Blessing says:

    Please help will like you to exhaustively differentiation between Two way Anova with and without replication. Thanks in anticipation.

    • Charles says:

      In Two-way ANOVA there are two factors, which I will call factors A and B. Suppose factor A has m levels (also called groups or treatments) and factor B has n levels. Thus there are m x n combinations of levels from the two factors. These are the interactions between the two factors.
      In Two-way ANOVA without replication, the sample for each of the m x n consists of just one element.
      In Two-way ANOVA with replication, the sample for each of the m x n consists of two or more elements.

  16. Elizabeth says:

    Hi Charles,
    I am looking for some statistical assistance. I have three groups (2 gene knockdowns and 1 negative control),where the assumption is there is no difference among them. Each time I have run the experiment, 30 technical replicates have been used for each group. I have run the experiment three times, giving me three biological replicates. I am wondering whether I should run an ANOVA with a two-tailed post-hoc Dunnett test against the negative control with repeated measures or replication?
    Thank you so much!

    • Charles says:

      If the three trials are based on the same 30 subjects per group, then it looks like you should use repeated measures (this will also be with replications). If the trials are on different subjects then depending on other details of the experiment you can simply run one-way ANOVA with 90 replicates per group.

      • Elizabeth says:

        Hi Charles,

        Thank you for your response. The trials for each are on a different set of 30 subjects per group. So I assume I should be using the one-way ANOVE with 90 replicates as you mentioned, but I am wondering whether that will give the data too much power and overestimate statistical significance?
        Thank you!


  17. arslan says:

    I am applying real stat add in on my data. I have two factors and two replications. One factor has four levels and other has two. When i apply two way anove, i get columns and rows but i did not have the interactions. Please help me out in this regard

  18. Shoaib Alam says:

    sir, i am an M.Sc Hons student, i analyzed my data while using two factorial design (two way ANOVA). i have 5 fertilizers, 5 species of sorghum and two replicates. one of my senior told me that i never use less than three replicates in two factorial design. sir please reply me what should i do?

    • Charles says:

      I don’t know of any such rule that you need at least 3 replicates. With such a small sample, the statistical power of your test will be very low.

  19. fatma says:

    Hi i’m working with one parameter with is protein content of 100 wheat genotypes cultivated in three growing seasons (season 1 (70 genotypes); season 2 (15 genotypes) and season 3 (15 genotypes)) ( with similar 14 genotypes between the 3 seasons) ; i did a combined analysis with genotype as fixed factor and crop year as random factor, results showed that genotype had the major impact , than G*CY interaction and finally crop year; what you think??

    • Charles says:

      It sounds like a reasonable approach, although I don’t have enough information to give you a definitive answer.

  20. Jhon77 says:

    Dear Dr Charles,
    I was greatly helped by the real stat,
    may I ask…
    based on Figure 4 – Interaction plots for Example 1;
    “From the first chart we can see that Brand X has a quite a different pattern from the other brands (especially regarding Soy). Although less dramatic, Brand Y is also different from Brand Z (especially since the line for Brand Y is trending up towards Soy, but trending down towards Rice, exactly the opposite of Brand Z).”

    Maybe you mean is “Blend” not “Brand”?
    and in these words: “Brand Y is trending up towards Soy, but trending down towards Rice”.
    It’s looks like mistyping to me. These: “trending down” become “trending up”, and vice versa.

    thank you

    • Charles says:

      Dear Jhon77,
      Thanks for catching these errors. I have just reworded the paragraph in error on the website. I really appreciate your finding these problems and your help in making the website better for the growing community of people who are using and depending on the site.

  21. Srikant Potluri says:

    Hi Charles,
    I am looking for some statistical assistance. I have three factors (NI, MOL & CO), Each factor contains 3 levels(2.5 WT%, 5 WT% AND 7.5 WT%). I am conducting experiments using L27 orthogonal array. What type of ANOVA I can use for finding the influence of each factor and also the influence of combination(NI*MOL, NI*CO, MOL*CO and NI*MOL*CO)
    Thank you so much!

  22. Breno says:

    Dear Charles,

    I’m stuck with my MA and would like your help, if possible. First a quick description of my study:

    There are 3 treatments (n= 13, n=13, n=12 participants) (each carrying out different vocabulary activities). Each treatment is tested twice on knowledge of the target words: immediately after the treatment and one week later. The tests for every treatment and both immediately and delayed are the same: 3 tests.

    I am planning to conduct a 3 x 2 ANOVA, 3 treatments (Tasks) being the between-subjects factor and time (immediate and delayed) being the within subjects factor. Since there are 3 DVs, 3 tests, I intend to conduct one 3 x 2 ANOVA for each DV.

    I don’t know which ANOVA to chose in your software. Aftre ctrl + M and choosing Anova: two factor, I don’t know if I should choose fixed or mixed, or even random. It’s also possible to choose repeated measures: one factor or repeated measures: random. I’m really at a loss for what to do. Please help

    My research questions basically ask whether the treatments will result in similar vocabulary learning and retention in each of the three tests. It does not want to investigate the differences between the DVs.

    Even so, I was suggested to carry out a 3 x 2 MANOVA (for the 3 DVs), but I’m really lost as to how to do it. Help?


    • Breno says:

      PS: there’s also two factor ANOVA via regression. I’m lost.


      • Charles says:

        You would generally use the regression approach to Two Factor ANOVA when the various groups are not equal in size.

        • Breno says:

          Thank you,

          Maybe I’m organising my data wrongly. In my case, the rows should be the between subjects factor (treatments 1 two and 3) and the columns should be the time factor (Immediate and delayed), right? The Greenhouse and Geisser and Huyhn and Feldt corrections in the repeated measures refer to the columns (and interaction) which should be the within-subjects factor. Is this correct?


          • Charles says:

            If you send me an Excel file containing your data, I can check whether it is organized correctly for a repeated measures ANOVA. You can get my email address by clicking on Contact Us.

    • Charles says:


      Because you are testing the same subject at different points in time you need a repeated measures test. You can find more information about this sort of test at
      Repeated Measures ANOVA

      Since you have multiple dependent variables then you should consider MANOVA instead of 3 separate ANOVA tests. Use of MANOVA for repeated measures tests can be found at
      Multivariate Repeated Measures Tests


      • Breno says:

        Thank you, Charles.

        In the example provided in the Multivariate Repeated Measures link, the two factors are age (Young, Middle, Old) and day (day1-5). It is exactly the same example as the two factor Anova (one between-subject and one between-subject factors). However, back to the multivariate repeated measures, below Figure 4, you describe Day 1 and Day 2 as two DVs. Why has it changed?

        In my experiment, treatments 1, 2 and 3 would be used instead of yours (young, middle, old) and immediate and delayed posttest would be used instead of the days. Is this correct? Where do I put the results from each of the three tests (DVs)?


        • Charles says:


          Paragraph 2: Figure 4 refers to the data for Example 2. This example is different from Example 1 since it only has two days. The goal of Example 2 is to show how to perform the analysis if the data is in stacked form.

          Paragraph 3: The mapping of your example into mine seems correct. I am not sure what you mean by “Where do I put the results from each of the three tests (DVs)?” You only have one test, not three.


      • Breno says:


        Thank you for your assistance. I’ve been trying to reply to your message but I keep getting info that it’s a duplicate, that I’ve already said that. Have you received my follow up question?


  23. Lois says:

    I’m just starting to learn stats.
    I need to prove that resolution affects time. what method/test will I use? thanks!

  24. jomar sambrano says:

    can you help me in solving statistical analysis?

  25. Ashley says:

    Hi, could you please help me on the sum of squares part, I did the steps as you have above but I’m not getting the right answer for my question. Also could you please explain how to get the p-value


    • Charles says:


      I suggest that first you make sure that you understand how to calculate the sum of squares and p-value in the one-way ANOVA case. The process is similar, but a little easier to understand. See
      One-way ANOVA

      You can also go to the Examples Workbook Part 2 to look at the formulas on the spreadsheets used in calculating the sum of squares and p-values. See
      Examples Workbooks


  26. Chris says:

    Dear Dr Charles,

    I have a scenario where in which I have a spreadsheet with 8 columns, across these 8 columns are 7 independent variables including discrete variables (for example I have Sale Week “Yes/No”) and continuous variables (such as temperature which is unique for each week at each store). The last column is a “Sales” column which shows the total sales for a specific store (1 of 6) on a specific week (1 of 6). I am tasked with finding the factors that effect sales. Obviously there are multiple factors that could effect it (such as temperature… whether it is a sale week… whether it is the store size etc.) so I need to test this, although can I use multiple ANOVA tests? Would this be at risk to a type 1 error?

    Please let me know if you need more information regarding the actual dataset, I tried to summarise the data briefly. However I should note that I have been specifically asked to use ANOVA and/or t-tests to analyse the data.



    • Charles says:

      Why don’t you use regression instead?

      • Chris says:

        I agree that regression would be more suitable, however, for the task I have been specifically asked to use ANOVA (or t-tests) to detect which factors affect sales.

        When I did a one-way ANOVA on temperature (I split the continuous data into low/med/high temperature), whilst there were significant differences on average sales between the groups of temperatures, it wouldn’t technically mean temperature had an effect on sales (because there are other independent factors in the data), would it? I’d have to find out if temperature had an interaction effect with another variable, but I’m not sure how to approach that?

        Thanks for your help.


        • Charles says:


          You said that you have 8 columns, which I understood represents 7 independent variables and the dependent variable Sales. You seem to have data for these variables for different stores in different weeks.

          For argument sake, suppose you want to look at the interaction between the temperature (low/medium/high) and some other variable, say training level (high/low). Further suppose that your sample consists of 60 stores and for each of the 6 combination of temperature and training there were exactly 10 stores. You could use a two-way ANOVA model (with replication) with temperature and training factors to model the interaction between temperature and training.

          If you have data for 4 weeks, you can perform the above analysis for any of the four weeks or the average of the four weeks.

          If in the above scenario the number of elements in each interaction cell is not equal (in the above, 6 cells of 10 stores each, 10 x 6 = 60), you would need to use an unbalanced ANOVA model.

          I hope this helps you.
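The balanced 3 x 2 design described above (temperature by training, 10 stores per cell) can be sketched in plain Python as follows. The sales figures are randomly generated placeholders, not real data, and the function name is hypothetical. The sketch stops at the F statistics; p-values would come from the F distribution with (df effect, df W) degrees of freedom.

```python
import random
from statistics import mean


def two_way_anova_balanced(cells):
    """cells[i][j] holds the m observations for level i of factor A and level j of factor B."""
    r, c, m = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != c or any(len(cell) != m for cell in row) for row in cells):
        raise ValueError("balanced design required: every cell needs m observations")

    grand = mean(x for row in cells for cell in row for x in cell)
    row_means = [mean(x for cell in row for x in cell) for row in cells]
    col_means = [mean(x for row in cells for x in row[j]) for j in range(c)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "A": c * m * sum((a - grand) ** 2 for a in row_means),
        "B": r * m * sum((b - grand) ** 2 for b in col_means),
        "AB": m * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(r) for j in range(c)
        ),
        "W": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(r) for j in range(c) for x in cells[i][j]
        ),
    }
    df = {"A": r - 1, "B": c - 1, "AB": (r - 1) * (c - 1), "W": r * c * (m - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["W"] for k in ("A", "B", "AB")}
    return ss, df, f


# Made-up sales: 3 temperature levels (rows) x 2 training levels (columns) x 10 stores.
rng = random.Random(0)
sales = [
    [[100 + 5 * i + 3 * j + 4 * i * j + rng.gauss(0, 10) for _ in range(10)]
     for j in range(2)]
    for i in range(3)
]

ss, df, f = two_way_anova_balanced(sales)
print(df)  # degrees of freedom for A (temperature), B (training), AB, and within
print({k: round(v, 2) for k, v in f.items()})  # F statistics
```

The function raises an error on unequal cell sizes on purpose, since the unbalanced case calls for a different model.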


  27. Alex says:

    What would this problem look like if it were done on a 5 step hypothesis?

    • Charles says:

      Sorry Alex, but I don’t know which problem you are referring to nor what 5 step hypothesis you are referring to.

  28. Roxane says:

    Dear Charles,

    I’m preparing for my Business Statistics exam coming up next week, and one of the practice questions was:
    Explain why, when a test is being done to check whether there is a significant interaction between two treatments, replications are needed.

    I don’t really understand this question, because the way I see it, replication means that we have more than one observation in each cell, and you can still check for significant interaction without replication… Plus this question is only worth 4 points out of 50, so I don’t think they expect a very detailed answer.
    Anyways, it would be very kind of you if you could help me out with this!
    Have a great day!
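One way to approach the practice question above is through the degrees of freedom. The following is a sketch in the article's notation (r levels of factor A, c levels of factor B, m observations per cell), where the subscript W denotes the within-cell (error) term:

```latex
\begin{aligned}
df_T &= rcm - 1 \\
df_A &= r - 1, \qquad df_B = c - 1, \qquad df_{AB} = (r-1)(c-1) \\
df_W &= df_T - df_A - df_B - df_{AB} = rc(m-1) \\
m = 1 &\implies df_W = 0 \quad\text{and}\quad
SS_W = \sum_{i,j,k} (x_{ijk} - \bar{x}_{ij})^2 = 0
\end{aligned}
```

With a single observation per cell, each cell mean equals that observation, so there is no within-cell variation left to estimate the error, and the ratio MS_AB / MS_W is undefined. The model without replication therefore has to use the interaction term as its error term, which means the interaction itself cannot be tested.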

  29. Estrella says:

    Dear Dr Charles
    I am studying the difference of X in 5 different nuclei of the brain (a1, a2, a3, a4, a5) at different times (control/pre/post). I have some animals in each group (3 control, 3 pre and 3 post). I know that I have to do a two-way ANOVA, but if I repeat the same experiment in the same animal, the measurement is really different in almost all the nuclei, and I don’t trust taking the mean. So I wonder whether there is something I can do to avoid taking the mean.

    Thank you for your time.

    • Estrella says:

      Moreover, I would like to know how to avoid taking the mean across the controls, pre or post, because I want to compare them.

    • Charles says:

      Since you have pre and post times, you need to use Repeated Measures ANOVA. I can’t tell from your description whether you actually need a Repeated Measures Mixed ANOVA.

  30. Ash says:

    Hi Charles.
    Great website and thanks for answering queries here.
    My question is whether or not this type of ANOVA would be appropriate for a randomised complete block trial?
    The standard for a RCBT seems to be very similar to your example above but also includes degrees of freedom in the replication.

    Thanks, Ash.

    • Charles says:

      The approach does indeed use a randomized complete block design taking sphericity into account.
      I didn’t understand your comment about “degrees of freedom in the replication”.

      • Ash says:

        I have some raw data from an RCBD trial and have been asked to check the results of a third party who ran an analysis on it.
        The trial had three replications, which were run concurrently, testing two products at 4 different rates of application to see whether their effects were statistically different.
        Their method of analysis seems to have considered the degrees of freedom in replication, R.
        The table below shows the form that their results were presented in. I followed your method and did not consider degrees of freedom for R which yielded different results, notably DFerror = 14 below and 16 in your method.
        Am I applying an incorrect method?

        Source   DF   SS   MS   F   P(F)   LSD
        Total    23
        R         2
        A         1
        B         3
        AB        3
        ERROR    14
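The two error degrees of freedom mentioned above (14 versus 16) can be reproduced with a little arithmetic. This sketch assumes that the third party treated the replication R as a block term, which the comment does not confirm; the function names are illustrative.

```python
def df_two_factor_with_replication(a: int, b: int, n: int) -> dict:
    """Degrees of freedom when the n replicates are not modelled as blocks."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
        "Total": a * b * n - 1,
    }


def df_factorial_in_blocks(a: int, b: int, blocks: int) -> dict:
    """Degrees of freedom when each replicate is a block (assumed RCBD layout)."""
    return {
        "R": blocks - 1,
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": (blocks - 1) * (a * b - 1),
        "Total": a * b * blocks - 1,
    }


# Two products (A) at 4 rates (B) with 3 replications, as in the comment above.
plain = df_two_factor_with_replication(2, 4, 3)
blocked = df_factorial_in_blocks(2, 4, 3)
print(plain["Error"], blocked["Error"])  # prints 16 14
print(plain["Total"], blocked["Total"])  # prints 23 23
```

Under that assumption, the 2 degrees of freedom assigned to R are taken out of the error term, which accounts for the drop from 16 to 14 and changes the F statistics and p-values.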

        • Charles says:

          I am not able to comment without additional information. If you send me an Excel file with your data and the results you obtained from R (please indicate which R capability you are using) and Excel, I will try to figure out what is going on. You can send this information to my email address listed at Contact Us.

  31. Douglas Baker says:

    Hello Charles, Thank you for this great site.
    I have a question about how best to analyze my data (ANOVA) as a whole experiment instead of as independent data sets. Below is an example of what my data may look like.

    Crop X, Product application: Treatment 1     Crop X, Product application: Treatment 2
    Plant #  Leaf 1  Leaf 2  Leaf 3              Plant #  Leaf 1  Leaf 2  Leaf 3
    1        70      85      50                  7        65      75      60
    2        71      86      51                  8        66      76      61
    3        72      87      52                  9        67      77      62
    4        73      88      53                  10       68      78      63
    5        74      89      54                  11       69      79      64
    6        75      90      55                  12       70      80      65

    The data for each leaf is taken at different time points. For example, Leaf 1 data may be taken at day 18 only and Leaf 2 data at day 27 only, because at the time of the single application the leaves are at different developmental stages and therefore need time to grow. Also, I can’t take an average of a single plant’s measurements over all the leaves, because they can vary greatly between the leaves of a single plant, though not between plants (this has to do with developmental stages). I am currently comparing the means (one-way ANOVA) of Treatment 1 and Treatment 2 for Leaf 1, and likewise for Leaf 2 and Leaf 3, giving three independent data sets. I would like to analyze the experiment as a whole to see the effect on the plant as a whole, but I am not sure what the best way to do that would be.

    • Charles says:

      Sorry, but I don’t understand your scenario well enough to give any advice.
      Perhaps you can use Two Factor ANOVA or Split-plot ANOVA. Both are described on the website and are included in the Real Statistics software.

  32. Melvin says:

    Hi Dr. Charles, I’m now trying to analyze my thesis results. My study is about the control of diseases of eggplant grown in the open field and in a greenhouse; these two types of cultivation are my main plot. My subplot includes six treatments including the control, with four replications, arranged in an RCB layout. I looked into similar theses with the same experimental design as mine. It is similar to your ANOVA in Figure 6 except that it has one more source of variation, the replication. I’m now confused about which ANOVA to use. Would it be best to use split-plot ANOVA or RCB-layout ANOVA? Or is my study a special case that needs a different analysis?

  33. Andrew says:

    Hi Charles, this is a great teaching tool. I just switched to Excel from SPSS for teaching my stats classes because of your add-in, and so far it’s great. I have noticed a peculiar behavior in one of the factorial calculations, and I was wondering whether you prefer this kind of question posted here or sent privately?

    • Charles says:

      Glad to see that you are using the Real Statistics add-in for teaching purposes. This was one of my goals when developing the software.
      Generally, it is best to ask questions here (as a comment). If you need to include a spreadsheet, you can send it via an email at the address shown on Contact Us.

  34. Susan Luo says:

    Dear Dr. Charles,

    Thank you so much for providing such great resources here!

    I have some problems with my data analysis. Could you please help me?

    Three pathologists reviewed 80 slides via 3 different systems, and the time taken for each review was recorded in seconds. The slides they reviewed are the same. In other words, each slide was reviewed by each pathologist via each system, so there are 9 results for each slide. Now I need to know whether there are any differences in time taken between the systems, that is, whether any system takes significantly less time to complete a review.

    I tried Two Factor ANOVA with Replication according to your above instructions, and got 3 p-values (for pathologists, systems and interaction) much less than 0.05. Now what I am wondering is as follows:
    1. Did I choose the appropriate analysis for my data?
    2. If I still need to know which two systems are different, what should I do next?
    3. How do I explain the interaction? I plotted two line charts, but still don’t know how to interpret them.

    Thank you very much!

    Best regards


  35. Abdelkader says:

    Hi Dr,
    Please explain what m and c mean in the formula for SSA, and what r means in the formula for SSB.
    Please give these values using the example you used.

  36. manoj kumar says:

    I have used a factorial RBD with one factor at 3 levels and a second factor at 5 levels, for 15 treatments in total with 3 replications. Please send the analysis process in Excel.

    • Charles says:

      This process can be accomplished using Excel’s Two Factor Anova with Replication data analysis tool as described on the website. You can also use Real Statistics Two Factor Anova data analysis tool.

  37. NIDDAH CHISALE says:

    I have an experiment with two factors and five levels, and I am told that it should be replicated four times. How can I go about it?

  38. MANOJ B S says:

    Sir, which type of model is the two-way ANOVA with replication example shown above (fertilizer vs. crop): LSD, split type model, etc.?
    Can you also suggest a reference for the above example?

    • Charles says:

      I created the example. It is not based on a real study. The numbers are made up by me. The purpose of the example is to show you how to perform two factor Anova with replication.

  39. Vijay Rathod says:

    Dear Sir,
    I am confused by following statements below Figure 3:
    “Figure 3 – ANOVA analysis for Example 1

    We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor A null hypothesis, and so conclude (with 95% confidence) that there are no significant differences between the effectiveness of the fertilizer for the different crops.

    Since the p-value (blends) = .00025 .05 = α, we can’t reject the Factor A null hypothesis,” I believe, instead of Factor A, it should be Factor B.
    Similarly, in sentence “Since the p-value (blends) = .00025 < .05 = α, we reject the Factor B null hypothesis" I believe, instead of Factor B, it should be Factor A.

    Whatever I believe is incorrect, please explain the conclusions.

    Thank You,
    Vijay Rathod

    • Charles says:

      Thanks for bringing this error to my attention. I have just corrected the webpage by interchanging A with B.
      I appreciate your help in making the website more accurate.

      • Vijay Rathod says:

        Dear Sir,
        In my mail of July 25, 2017, the last sentence should have begun with “If”. The sentence should have been “If whatever I believe is incorrect, please explain the conclusions.” I am sorry for any inconvenience it may have caused. I was wondering whether the mail had become meaningless. It is heartening to see that you understood it as it was intended. I am learning statistics with the help of your website.
        Vijay Rathod.

  40. Trang says:

    Could you discuss the credibility of the interpretations and conclusions after using two-way ANOVA? Is there anything we should be concerned about, for example, violation of the normality assumption?

    • Charles says:

      1. This is discussed on the referenced webpage (see the examples).
      2. The assumptions are described on the following webpage:

      • Trang says:

        Now I have a case that I need to solve:
        Suppose that a local chapter of sales professionals in the greater San Francisco area conducted a survey of its membership to study the relationship, if any, between the years of experience and salary for individuals employed in inside and outside sales positions. On the survey, respondents were asked to specify one of three levels of years of experience: low (1-10 years), medium (11-20 years), and high (21 or more years). The objective of this study is to test for any significant interaction between Position and Experience and to test for any significant differences in salary due to position and years of experience.
        I wonder about the null hypotheses.
        There are 3 sets of hypotheses, aren’t there?
        H01: There are no differences in the mean salaries of salespeople at different levels of years of experience.
        H02: There are no differences in the mean salaries of salespeople in different positions.
        H03: There is no significant interaction between position and experience.
