Two Factor ANOVA with Replication

Introduction

In Two Factor ANOVA without Replication, we consider the analysis where there is only one sample item for each combination of factor A and B levels. On this webpage, we extend this analysis to the case where there are multiple observations for each such combination. Thus, in addition to the main effects corresponding to A and B, we can now study the interaction between A and B, which is the main reason for performing this type of analysis.

We will restrict ourselves to the case where all the samples are equal in size (balanced model). In Unbalanced Factorial ANOVA we show how to perform the analysis where the samples are not equal (unbalanced model) via regression. 

You should not confuse ANOVA with replication with ANOVA with repeated measures as described in ANOVA with Repeated Measures.

Example introduced

As usual, we start with an example. We then provide some background information and then complete the analysis for the example.

Example 1: Repeat the analysis from Example 1 of Two Factor ANOVA without Replication, but this time with the data shown in Figure 1 where each combination of blend and crop has a sample of size 5.

Figure 1 – Data for Example 1

Structural Model

Definition 1: We extend the structural model of Definition 1 of Two Factor ANOVA without Replication as follows.

In Definition 1 of Two Factor ANOVA without Replication the r × c table contains the entries {xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}. We extend this table to contain entries {Xij: 1 ≤ i ≤ r, 1 ≤ j ≤ c}, where Xij is a sample for level i of factor A and level j of factor B. Here Xij = {xijk: 1 ≤ k ≤ nij}. For now, we assume that the nij are all equal to a common sample size m.

We use terms such as x̄i (or x̄i.) as an abbreviation for the mean of {xijk: 1 ≤ j ≤ c, 1 ≤ k ≤ m}. We also use terms such as x̄j (or x̄.j) as an abbreviation for the mean of {xijk: 1 ≤ i ≤ r, 1 ≤ k ≤ m}.

As in Definition 1 of Two Factor ANOVA without Replication, we define the effects αi and βj where

αi = μi – μ and βj = μj – μ

Similarly, we define ai and bj where

ai = x̄i – x̄ and bj = x̄j – x̄

We use δij for the effect of level i of factor A with level j of factor B, i.e. the interaction of level i of factor A and level j of factor B. Thus, δij = μij – μi – μj + μ. Similarly, we have

dij = x̄ij – x̄i – x̄j + x̄

It is easy to show that
image1367

Finally, we can represent each element in the sample as

xijk = μ + αi + βj + δij + εijk

where εijk denotes the error (or unexplained) amount. As before we have the sample version

xijk = x̄ + ai + bj + dij + eijk

where eijk is the counterpart to εijk in the sample. Note that

image1373

and so
image1374

Also,
image1375
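The sample decomposition can be checked numerically. The following is a minimal Python sketch rather than part of the original analysis: the data values are invented, and ai, bj and dij are assumed to be defined from the sample means in the same way that αi, βj and δij are defined from the population means (e.g. δij = μij – μi – μj + μ).

```python
# Hypothetical balanced data (values invented for illustration):
# cells[i][j] holds the m observations for level i of factor A
# and level j of factor B.
cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]
r, c, m = len(cells), len(cells[0]), len(cells[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand = mean(x for row in cells for cell in row for x in cell)
row_mean = [mean(x for cell in cells[i] for x in cell) for i in range(r)]
col_mean = [mean(x for i in range(r) for x in cells[i][j]) for j in range(c)]
cell_mean = [[mean(cells[i][j]) for j in range(c)] for i in range(r)]

# Sample effects: main effects a, b and interaction effects d
a = [row_mean[i] - grand for i in range(r)]
b = [col_mean[j] - grand for j in range(c)]
d = [[cell_mean[i][j] - row_mean[i] - col_mean[j] + grand for j in range(c)]
     for i in range(r)]

# Residuals: each observation minus the mean of its own cell
e = [[[x - cell_mean[i][j] for x in cells[i][j]] for j in range(c)]
     for i in range(r)]

print("a =", a)
print("b =", b)
print("d =", d)
```

In this sketch every observation equals grand + a[i] + b[j] + d[i][j] + e[i][j][k], and the effects a and b each sum to zero.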

Null Hypotheses

As in Definition 1 of Two Factor ANOVA without Replication, the null hypotheses for the main effects are:

H0:  μ1. = μ2. = … = μr. (Factor A)

H0:  μ.1 = μ.2 = … = μ.c (Factor B)

These are equivalent to:

H0: αi = 0 for all i (Factor A)

H0: βj = 0 for all j (Factor B)

In addition, there is a null hypothesis for the effects due to the interaction between factors A and B.

H0: δij = 0 for all i, j

More about the structural model

Definition 2: Using the terminology of Definition 1, define

ANOVA with replication formulas

We can also define the following entities:

image5062

Since the within groups terms are used as the error terms in our model, we also use the following symbols:

image1391

Properties

Property 1:

image1392

image1393

Proof: Clearly

image1394

If we square both sides of the equation, sum over i, j, and k, and then simplify (with various terms equal to zero as in the proof of Property 2 of Basic Concepts for ANOVA), we get the first result. For the second,

image1396

Property 2: Note that the between-group terms are as for the one-way ANOVA, namely

image1397

The proof is similar to the proof of Property 1. It also follows that

image1398

image1399
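Properties 1 and 2 can be illustrated numerically. The sketch below is an assumption-laden illustration, not the original derivation: the data are invented, and the sums of squares use the standard balanced-design formulas, which are assumed to match Definition 2. It checks that SST = SSA + SSB + SSAB + SSW, that SSBet = SSA + SSB + SSAB, and that the degrees of freedom add up in the same way.

```python
# Hypothetical balanced data (values invented for illustration):
# cells[i][j] = observations for level i of factor A, level j of factor B.
cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]
r, c, m = len(cells), len(cells[0]), len(cells[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def devsq(values):
    """Sum of squared deviations from the mean (like Excel's DEVSQ)."""
    values = list(values)
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values)


all_values = [x for row in cells for cell in row for x in cell]
grand = mean(all_values)
row_mean = [mean(x for cell in cells[i] for x in cell) for i in range(r)]
col_mean = [mean(x for i in range(r) for x in cells[i][j]) for j in range(c)]
cell_mean = [[mean(cells[i][j]) for j in range(c)] for i in range(r)]

ss_t = devsq(all_values)
ss_a = c * m * sum((rm - grand) ** 2 for rm in row_mean)
ss_b = r * m * sum((cm - grand) ** 2 for cm in col_mean)
ss_ab = m * sum(
    (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
    for i in range(r) for j in range(c)
)
ss_w = sum(devsq(cells[i][j]) for i in range(r) for j in range(c))
ss_bet = m * sum((cell_mean[i][j] - grand) ** 2
                 for i in range(r) for j in range(c))

# Degrees of freedom for the balanced model
df_t = r * c * m - 1
df_a, df_b, df_ab = r - 1, c - 1, (r - 1) * (c - 1)
df_w = r * c * (m - 1)

print(ss_t, ss_a, ss_b, ss_ab, ss_w, ss_bet)
```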

Property 3: If a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed and with all σj² (or σi²) equal, then

image1401 image1402

Proof: The proof is similar to that of Property 1 of Basic Concepts for ANOVA.

Theorem 1: Suppose a sample is made as described in Definitions 1 and 2, with the xijk independently and normally distributed.

If all μi are equal and all σi² are equal then

image1403

If all μj are equal and all σj² are equal then

image1072

Also, under certain circumstances,

image1404

Proof: The result follows from Property 3 and Theorem 1 of F Distribution.

Property 4:

image1405 image1406

Statistical Tests

We use the following tests:

ANOVA with replication tests
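As a rough illustration of how these tests are formed, the following Python sketch turns sums of squares into mean squares and F statistics. It assumes the usual construction in which each effect's mean square is divided by the within-groups mean square MSW; the SS values and the design sizes r, c and m are hypothetical.

```python
# Hypothetical design: r levels of factor A, c levels of factor B,
# m observations per combination (all values invented for illustration).
r, c, m = 2, 2, 2
ss = {"A": 18.0, "B": 72.0, "AB": 8.0, "W": 8.0}

# Degrees of freedom for the balanced two-factor model with replication
df = {
    "A": r - 1,
    "B": c - 1,
    "AB": (r - 1) * (c - 1),
    "W": r * c * (m - 1),
}

# Mean squares, then F = MS(effect) / MSW for each effect
ms = {name: ss[name] / df[name] for name in ss}
f_stat = {name: ms[name] / ms["W"] for name in ("A", "B", "AB")}

for name in ("A", "B", "AB"):
    print(f"{name}: df = ({df[name]}, {df['W']}), F = {f_stat[name]:.2f}")
```

The p-value for each effect is the right tail of the F distribution with (df of the effect, df of W) degrees of freedom. In Excel this can be obtained with F.DIST.RT, and in Python with scipy.stats.f.sf if SciPy is available.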

Assumptions

The assumptions for Two Factor ANOVA are similar to those for One Factor ANOVA, namely

  • All samples are drawn from normally distributed populations
  • The samples are drawn from populations that have a common variance
  • All samples are drawn independently from each other
  • Within each sample, the observations are sampled randomly and independently of each other

By sample, here we mean each combination of levels from the two factors.  We also want to make sure there are no outliers that can distort the results of the test. See ANOVA Assumptions for how we check these assumptions using the Real Statistics Resource Pack.
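To make the notion of "sample" concrete, the sketch below computes the sample variance of each combination of levels and compares the largest with the smallest. This is only a crude screen for the common-variance assumption under invented data and hypothetical level names; it is not the procedure used by the Real Statistics Resource Pack and it does not replace a formal test.

```python
from statistics import variance

# Hypothetical data: one list of observations per (A level, B level) cell.
cells = {
    ("A1", "B1"): [1.0, 2.0, 3.0],
    ("A1", "B2"): [4.0, 6.0, 8.0],
    ("A2", "B1"): [2.0, 4.0, 6.0],
    ("A2", "B2"): [10.0, 11.0, 12.0],
}

# Sample variance (n - 1 in the denominator) of each cell
cell_var = {key: variance(values) for key, values in cells.items()}

# Ratio of the largest to the smallest cell variance
ratio = max(cell_var.values()) / min(cell_var.values())

for key, var in cell_var.items():
    print(key, var)
print("largest / smallest variance:", ratio)
```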

Example continued

We now return to Example 1 and show how to conduct the required analysis using Excel’s Anova: Two-factor With Replication data analysis tool.

Example 1 (continued): The summary output from the data analysis tool is given on the right side of Figure 2, with the sample data repeated on the left side of the figure.

Figure 2 – Summary output of ANOVA data analysis for Example 1

The top part of Figure 3 contains the rest of the output from the data analysis tool. We’ll explain the bottom part momentarily.

Figure 3 – ANOVA analysis for Example 1

We now draw some conclusions from the ANOVA table in Figure 3. Since the p-value (crops) = .0649 > .05 = α, we can’t reject the Factor B null hypothesis, and so we have no significant evidence (at the 5% significance level) of differences between the effectiveness of the fertilizer for the different crops.

Since the p-value (blends) = .00025 < .05 = α, we reject the Factor A null hypothesis and conclude that the blends are significantly different.

Interaction Plots

We also see that the p-value (interactions) = .0456 < .05 = α, and so conclude that there is a significant interaction between crop and blend. We can look more carefully at the interaction by plotting the mean of each combination of levels of the two factors (see Figure 4). Lines that are roughly parallel indicate a lack of interaction, while lines that are not roughly parallel indicate interaction.

From the first chart, we can see that Blend Y has quite a different pattern from the other blends: the line for Blend Y is trending down towards Soy and up towards Rice, exactly the opposite of Blends X and Z. We also see that Blend X is trending up towards Soy much more abruptly than Blend Z.

Figure 4 – Interaction plots for Example 1
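The values plotted in an interaction chart are simply the cell means. The following Python sketch, using invented data and hypothetical level names, computes these means and the change in mean between adjacent levels of factor B for each level of factor A. Identical changes correspond to parallel lines; with real data the lines are never exactly parallel, so the interaction F test remains the formal criterion.

```python
# Hypothetical balanced data (values and level names invented):
# cells[i][j] = observations for level i of factor A, level j of factor B.
a_levels = ["A1", "A2"]
b_levels = ["B1", "B2"]
cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]

# Cell means: one line per level of A, one point per level of B
cell_mean = [[sum(cell) / len(cell) for cell in row] for row in cells]

# Change in mean between adjacent levels of B, for each level of A
steps = [[row[j + 1] - row[j] for j in range(len(row) - 1)]
         for row in cell_mean]

# For each pair of adjacent B levels, how much the changes differ across
# the levels of A; zero everywhere would mean exactly parallel lines
spread = [max(col) - min(col) for col in zip(*steps)]

for name, means, step in zip(a_levels, cell_mean, steps):
    print(name, dict(zip(b_levels, means)), "steps:", step)
print("spread of steps:", spread)
```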

Worksheet Functions

Although the analysis in Figures 2 and 3 was produced automatically by Excel’s data analysis tool, the same result can be produced using Excel formulas, just as we were able to do for Example 1 of Two Factor ANOVA without Replication. In fact, all the entries in the ANOVA table in Figure 3 can be calculated using the tables constructed in the bottom part of Figure 3 in exactly the same way as was done in Example 1 of Two Factor ANOVA without Replication.

The only new element is the calculation of the error term SSW. To calculate it, we must first construct the table of the squared deviations of the observations in each combination of levels from the mean of that combination. This table appears in cells J38:N41 of Figure 3. E.g. the entry for SSWheat,BrandX (in cell K39) is =DEVSQ(B5:B9). SSW is then calculated as the sum of all the terms in the table, namely =SUM(K39:N41).

Alternatively, we can use Property 2 to calculate SSBet and then use the fact that SSW = SST – SSBet. To calculate SSBet we first construct the table of the means of the various combinations of levels of factors A and B (range J43:N46 of Figure 3), as described below. SSBet is now calculated using the formula =DEVSQ(K44:N46)*H5. For Example 1, SSBet = 18420.5, and so SSW = SST – SSBet = 39640.9 – 18420.5 = 21220.4.
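The two routes to SSW can be mirrored outside Excel. In the Python sketch below, devsq is a hypothetical helper that imitates Excel's DEVSQ, and the data are invented. Only the last line uses figures from Example 1, to recheck the subtraction SST – SSBet.

```python
def devsq(values):
    """Sum of squared deviations from the mean, imitating Excel's DEVSQ."""
    values = list(values)
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)


# Hypothetical balanced data (values invented for illustration):
# cells[i][j] = observations for level i of factor A, level j of factor B.
cells = [
    [[1, 3], [5, 7]],
    [[2, 4], [10, 12]],
]
m = len(cells[0][0])  # common number of observations per cell

# Route 1: sum of DEVSQ over each cell, like =SUM(...) over the DEVSQ table
ss_w_direct = sum(devsq(cell) for row in cells for cell in row)

# Route 2: SSBet = DEVSQ(cell means) * m, then SSW = SST - SSBet
cell_means = [sum(cell) / len(cell) for row in cells for cell in row]
ss_bet = devsq(cell_means) * m
ss_t = devsq(x for row in cells for cell in row for x in cell)
ss_w_alt = ss_t - ss_bet

print(ss_w_direct, ss_w_alt)

# Recheck of the subtraction quoted for Example 1
ss_w_example = round(39640.9 - 18420.5, 1)
print(ss_w_example)
```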

Example using row formatting

Example 2: Repeat the analysis for the data in Example 1 by using the presentation of the data given in the table on the left of Figure 5.

Figure 5 – Alternative presentation of data in Example 1

Excel’s ANOVA data analysis tools don’t support data in this format, and so we must proceed to create the ANOVA table (i.e. the output found in Figure 3) using the formulas. This is straightforward, although tedious, with the result presented in Figure 6. As usual, the hardest part is the calculations for the SS terms, which are shown on the right side of the worksheet in Figure 6.

Figure 6 – ANOVA output for Example 2
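When the data are stored one observation per row, the first step is to group the rows by combination of factor levels. The Python sketch below assumes that each row holds a blend, a crop and a yield value; this layout is an assumption about the row format, and the values are invented, not taken from Figure 5. It then checks that the design is balanced and computes the cell means and SSW.

```python
from collections import defaultdict

# Hypothetical row-format data: (blend, crop, yield); values invented.
rows = [
    ("Blend X", "Soy", 1.0),
    ("Blend Y", "Rice", 10.0),
    ("Blend X", "Rice", 5.0),
    ("Blend Y", "Soy", 2.0),
    ("Blend X", "Soy", 3.0),
    ("Blend Y", "Rice", 12.0),
    ("Blend X", "Rice", 7.0),
    ("Blend Y", "Soy", 4.0),
]

# Group the observations by (blend, crop) combination
cells = defaultdict(list)
for blend, crop, value in rows:
    cells[(blend, crop)].append(value)

# The balanced analysis requires the same number of rows in every cell
sizes = {len(values) for values in cells.values()}
balanced = len(sizes) == 1

# Cell means and the within-groups sum of squares SSW
cell_mean = {key: sum(values) / len(values) for key, values in cells.items()}
ss_w = sum(
    (value - cell_mean[key]) ** 2
    for key, values in cells.items()
    for value in values
)

print("balanced:", balanced)
print("cell means:", cell_mean)
print("SSW:", ss_w)
```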

When the assumptions are not met

In general, when the assumptions are violated, transformations and non-parametric (rank) tests are not very useful for two-way ANOVA. We can instead abandon the omnibus test and apply the various planned and unplanned tests described in Planned Comparisons for ANOVA and Unplanned Comparisons for ANOVA by treating the two-way ANOVA as a one-way ANOVA.

In particular, when the variances are not equal we can apply Welch’s correction for contrasts. We can also use the Scheirer-Ray-Hare test or Aligned Rank Transform (ART) ANOVA.

Reference

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

169 thoughts on “Two Factor ANOVA with Replication”

  1. Hi Charles,

    The post says “By sample, here we mean each combination of levels from the two factors”.

    But when it comes to Two Factor ANOVA without Replication, how can “sample” be defined and how can we check the assumptions with only one subject in each combination of levels?

  2. Dear Professor,

    Is this not 2 one- way ANOVAs with the first ANOVA done with the data table being read upright and the second with the data table read crosswise (sideways)?
    Curious.

    Ray

  3. Hello, I am a little confused about the paragraph before Figure 4, because it appears as “…especially since the line for Blend Y is trending up towards Soy and down towards Rice, exactly the opposite of Blend X and Z)…”
    However, it can be seen that the trend is the opposite: Blend Y is trending down towards Soy and up towards Rice. Am I right or is my interpretation wrong?

    • Hello Juan,
      The statement should be “…line for Blend Y is trending down towards Soy and up towards Rice, exactly the opposite of Blend X and Z)…”
      I have changed the webpage to fix this error. Thanks for your comment, which made me aware of the incorrect and confusing statement.
      Charles

  4. Dear Charles,
    I designed an experiment in order to study differences in ovarian follicular population related to age of women. So, we have 3 ages (young, adult and old) as independent variable and 3 follicular sizes (small, medium and big) as dependent variable. In addition, the model must include the effect of body condition ranked of 1 (very thin) to 5 (obese) and days post-partum, as covariate.
    I’m not sure which statistics of the Real Statistics package I should use. Can you help me?
    Thanks.

    • Hello Rafael,
      It depends on what hypothesis or hypotheses you want to test.
      If I understand correctly, you have two factors: Age (3 levels) and Body Condition (5 levels). Your dependent variable seems to take 3 ordered values (0,1,2). You might be able to use ordinal regression, but it all depends on what you are trying to test.
      Charles

  5. Wonderful work! Very instructive!

    Only one question about your last observation : do the methods for multiple comparisons need to meet the normality assumption?

    If it is necessary, I think we cannot use these methods when the normality assumption is violated.

    • Hello Xi,
      Glad you are getting value from the website.
      Yes. It is assumed that multiple comparisons were done after a significant ANOVA result. ANOVA requires normality.
      These tests are pretty robust to violations of normality, and so it depends on how far from normality you are.
      Charles

  6. Good morning Charles,
    Is this the case when I have a Randomized Complete Block Design with 5 replicates?
    (i.e.: the five yield values for each fertilizer/crop combination come from replicates in the same experimental design?)

    Thanks a lot for your support with the awesome website!
    Guido

  7. It might be useful to make clear at the top of this post that “with replication” and “repeated measures” are not the same thing. This is especially important because the 2-factor ANOVA with replication function in Excel appears to perform that function well for multiple independent observations per cell, but not when, for example, a group of subjects is tested under multiple conditions where each person serves as his or her own control. Excel does not seem to be able to correctly perform a 2-factor ANOVA with repeated measures.

    Excel seems to perform 1-factor ANOVA with repeated measures satisfactorily, although as you point out, the procedure is misnamed.

  8. Hi, I am using a two way anova to look at changes in blood pressure over time. I have the same issue with the p value showing #NUM!.

  9. Hi I’m using two way anova for difference in infiltration rates across three plots before and after human trampling. I have three infiltration values before trampling and three infiltration values after trampling but when i calculate the anova #NUM ! appears in the P-values and F crit boxes, could you please help? Thank you.
    The data i’m using is: Plot 1: 8.5 and 0.7, Plot 2: 2.6 and 0.4 and Plot 3; 2.5 and 2.1

    • Anand,
      As explained in the paragraph right after Figure 5, I didn’t use the Excel ANOVA tool (and so the number of rows per sample is not relevant). Instead, I used certain formulas.
      Charles

  10. Thanks for the nice information. I have two years of data like the above. How can I make a combined analysis for the two years, i.e. two-factor lab data over two years? Please advise.
