One within subjects factor

Some examples of this type of analysis are:

  • A study is made of 10 subjects each of whom is asked to take a test on reading comprehension, mathematical ability and knowledge of history
  • A study is made of 10 monkeys, each of which is given training once a week for 5 weeks, with its score recorded each week
  • A study is made of 10 married couples, and the husband’s IQ is compared with his wife’s

The important characteristic of each of these examples is that the treatments are not independent of each other. The most common of these analyses is to compare the results of some treatment given to the same participant over a period of time (like the second example above).

For the version of ANOVA with repeated measures with one within-subjects factor, we can use Excel’s Two Factor ANOVA without Replication data analysis tool. Essentially the following meanings are given to the terms in Definition 2 of Two Factor ANOVA without Replication: MSRow = MSA and MSCol = MSB and similarly for the df and SS forms. The column terms (representing the within-subjects factor) are the ones that are of interest. The row terms represent the subjects.

Since the same subject is involved, the different treatments are not independent of each other. This results in an additional assumption called sphericity, which is described in Sphericity.
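The mapping onto the two-factor layout can be sketched in a few lines of Python. This is a minimal illustration, not the Excel tool's implementation: the function name `rm_anova_one_factor` and the small data set are hypothetical, rows are taken to be subjects and columns to be treatments, and only the F statistic is computed (the p-value would still come from an F distribution with the stated degrees of freedom).

```python
from statistics import mean

def rm_anova_one_factor(data):
    """Partition the variability of a subjects-by-treatments table.

    data: list of rows, one row per subject, one column per treatment.
    Returns sums of squares, degrees of freedom and the F statistic
    for the column (within-subjects) factor.
    """
    n = len(data)       # number of subjects (rows)
    a = len(data[0])    # number of treatments (columns)
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(col) for col in zip(*data)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_row = a * sum((m - grand) ** 2 for m in row_means)  # subjects
    ss_col = n * sum((m - grand) ** 2 for m in col_means)  # treatments
    ss_err = ss_total - ss_row - ss_col                    # residual

    df_col = a - 1
    df_err = (a - 1) * (n - 1)
    f_col = (ss_col / df_col) / (ss_err / df_err)
    return {
        "ss_row": ss_row, "ss_col": ss_col, "ss_err": ss_err,
        "df_col": df_col, "df_err": df_err, "f_col": f_col,
    }

# Hypothetical scores: 5 subjects (rows) measured under 3 treatments (columns)
scores = [
    [10.0, 12.0, 15.0],
    [8.0, 11.0, 13.0],
    [12.0, 13.0, 18.0],
    [9.0, 10.0, 14.0],
    [11.0, 14.0, 16.0],
]
table = rm_anova_one_factor(scores)
print(table)
```

Only the column F is returned because, as noted above, the row terms describe differences between subjects and are not the quantity of interest.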

As usual we start with an example.

Example 1: A program has been developed to reduce the levels of stress for working women. In order to determine whether the program is successful a sample of 15 women was selected and their level of stress was measured (low scores indicate higher levels of stress) before the program, as well as 1, 2 and 3 weeks after the beginning of the program. Based on the data in Figure 1 (range G5:K20) determine whether the program is effective in reducing stress.

Figure 1 – Data for Example 1

We use Excel’s Anova: Two-Factor Without Replication data analysis tool (Figure 2) to carry out the analysis.

Figure 2 – Output from Anova: Two Factor Without Replication

For this problem we aren’t interested in the analysis of the rows, only the columns, which correspond to variations by time. Since the test statistic F = 29.13 > 2.83 = F-crit (or p-value = 2.4E-10 < .05 = α), we reject the null hypothesis and conclude that there is a significant difference between the means.
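The degrees of freedom behind this F test can be checked by hand. As a sketch, write a for the number of measurement occasions (4 in Example 1) and n for the number of subjects (15 in Example 1); the symbols MS_Col and MS_E for the column and error mean squares are notation assumed here for illustration.

```latex
df_{Col} = a - 1 = 4 - 1 = 3, \qquad
df_{E} = (a - 1)(n - 1) = 3 \cdot 14 = 42, \qquad
F = \frac{MS_{Col}}{MS_{E}}
```

These are the degrees of freedom against which the critical value is looked up at α = .05.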

As usual, we can do further analysis to discover where the differences are, and so in this way determine whether the program is effective. These correspond to the planned and unplanned comparison tests for one-way ANOVA.

Example 2: Compare treatment means before and after the program for the data in Example 1. Determine whether the program is effective and determine the effect size.

We use the same approach as for independent treatments, except that we can only use MSE in calculating the standard error when the sphericity requirement is met and the contrast involves all the treatment levels. When these requirements are met then using MSE yields more power.

Since the sphericity requirement is not met for Example 2 (see Sphericity for details), it is better to calculate the standard deviation in a fashion similar to that for paired data, as described in Paired Sample t-Test. We use the contrast weights (1, -1/3, -1/3, -1/3) to compute, for each subject in Example 1, the equivalent of the difference between paired data, and then compute the mean (ignoring its sign) and the standard deviation of these differences. The standard error of the mean is then the standard deviation divided by \sqrt{n}. This analysis is presented in Figure 3.
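The paired-data approach to a contrast can be sketched as follows. This is an illustrative outline under stated assumptions, not a reproduction of the worksheet in Figure 3: the function name `contrast_t` and the scores are hypothetical, and the p-value is left out because it requires a t distribution with n – 1 degrees of freedom (for example, Excel's T.DIST.2T applied to |t|).

```python
from math import sqrt
from statistics import mean, stdev

def contrast_t(data, weights):
    """t statistic for a within-subjects contrast, paired-data style."""
    # One contrast value per subject: weighted sum of that subject's scores
    values = [sum(w * x for w, x in zip(weights, row)) for row in data]
    n = len(values)
    m = mean(values)
    s = stdev(values)   # sample standard deviation (n - 1 in the denominator)
    se = s / sqrt(n)    # standard error of the mean contrast value
    return {"mean": m, "sd": s, "se": se, "t": m / se, "df": n - 1}

# Hypothetical scores: 5 subjects, measured before and 1, 2, 3 weeks after
scores = [
    [12.0, 18.0, 20.0, 22.0],
    [10.0, 15.0, 19.0, 21.0],
    [14.0, 17.0, 22.0, 25.0],
    [11.0, 16.0, 18.0, 23.0],
    [13.0, 19.0, 21.0, 24.0],
]
weights = (1, -1/3, -1/3, -1/3)   # "before" versus the average of "after"
result = contrast_t(scores, weights)
print(result)
```

Because the standard error is built from each subject's own contrast value, MSE never enters the calculation, which is why this version does not depend on sphericity.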

Figure 3 – t test on contrast for Example 2

Some representative formulas in Figure 3 are given in Figure 4.

Figure 4 – Representative formulas for Figure 3

Since p < .05, we conclude that there is a significant difference between stress before and after the program, and so the program appears to be effective.

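The effect size d for this contrast (cell F26 in Figure 3) is the absolute value of the mean of the contrast differences divided by their standard deviation, which is equivalent to d = |t|/\sqrt{n}.

The effect size r for this contrast is

r = \sqrt{t^2/(t^2 + df)}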
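As a rough check, both effect sizes can be recovered from the reported test statistic alone. The sketch below assumes the usual conversions d = |t|/\sqrt{n} (which follows from t = mean/(sd/\sqrt{n})) and r = \sqrt{t^2/(t^2 + df)}, and starts from the rounded value t = -6.65 with n = 15 reported for Example 2, so the last digit may differ slightly from a value computed from the unrounded worksheet cells.

```python
from math import sqrt

t = -6.65      # contrast t statistic reported for Example 2 (rounded)
n = 15         # number of subjects
df = n - 1     # degrees of freedom of the paired-style t test

d = abs(t) / sqrt(n)              # mean difference in standard deviation units
r = sqrt(t ** 2 / (t ** 2 + df))  # correlation-type effect size

print(f"d = {d:.3f}, r = {r:.3f}")   # d = 1.717, r = 0.872
```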

Observation: We can summarize the results of this analysis as follows:

To investigate the effects of a stress reduction program for working women, the stress levels of 15 participants were measured before the program and then 1, 2 and 3 weeks after the start of the program. The overall repeated measures analysis of variance showed a significant difference between weeks (F(3, 42) = 29.13, p < .05). The mean stress score before the program was 12.53, which increased to a mean of 22.73 three weeks after the start of the program (higher scores indicate lower levels of stress), a difference of 10.20. A contrast on this difference was significant (t(14) = -6.65, p < .05). Using the standard deviation of the contrast differences for each participant yielded an effect size of d = 1.71, showing the importance of the program in treating stress.

Observation: As usual, if you don’t conduct the omnibus ANOVA test, you can run a – 1 planned orthogonal contrasts (where a is the number of treatments) without adjusting alpha. It is also commonly acceptable to run a – 1 planned contrasts even if they aren’t all orthogonal. If you conduct both the omnibus ANOVA and planned contrasts, then you will need to adjust alpha using the Bonferroni or Dunn-Sidák correction, as described in Planned Comparisons for ANOVA.
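The two corrections mentioned here are simple to compute. The following sketch uses hypothetical helper names (`bonferroni_alpha`, `sidak_alpha`, `familywise_error`) and assumes k tests, each run at the same per-test alpha; the familywise formula further assumes the tests are independent.

```python
def bonferroni_alpha(alpha, k):
    """Per-test alpha under the Bonferroni correction for k tests."""
    return alpha / k

def sidak_alpha(alpha, k):
    """Per-test alpha under the Dunn-Sidak correction for k tests."""
    return 1 - (1 - alpha) ** (1 / k)

def familywise_error(alpha, k):
    """Chance of at least one type I error across k independent tests."""
    return 1 - (1 - alpha) ** k

alpha, k = 0.05, 3
print(round(bonferroni_alpha(alpha, k), 4))   # 0.0167
print(round(sidak_alpha(alpha, k), 4))        # 0.017
print(round(familywise_error(alpha, k), 4))   # 0.1426
```

The Dunn-Sidák value is always slightly larger than the Bonferroni value, so it gives up a little less power for the same familywise error rate.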

For unplanned tests, contrasts can be used with alpha adjusted by the Bonferroni or Dunn-Sidák correction. Alternatively, you can use the Tukey HSD test (for pairwise comparisons) or the Scheffé test (for compound comparisons).

For Tukey HSD use
q = (\bar{x}_j - \bar{x}_i) / s.e.

where s.e. is the pairwise standard error (as was done for Example 2) and not \sqrt{MS_E/n}. The critical value of the studentized range q is based on the values of α, a (the number of treatments) and df = n – 1.

For the Scheffé test, again don’t use MSE to compute the standard error, and instead of using dfB * FINV(α, dfB, dfW) as the critical value, use (a – 1) * FINV(α, a – 1, n – 1).

30 Responses to One within subjects factor

  1. Louis Peperzak says:

    Dear Charles, Row 27 in Figure 3 appears to be missing. I wondered how you did your calculations and noticed this in Figure 4. Thank you very much for your help. Kind regards,
    Louis Peperzak

  2. Louis Peperzak says:

    Thanks for the quick reply. What I meant was that in Fig 4 some calculations refer to row 27 (e.g. t score=A27/B27) where it should read row 26 (=A26/B26). Louis

  3. Pingback: Use of In-Class Theater to Enhance ELL Literacy | Melody Warren Education Blog

  4. Stefano says:

    We measured blood pressure in a group of subjects at 3 time points (before they took a drug, during its effect and later on). I carried out a one-way repeated-measures ANOVA (using the two factors without replication tool, as suggested) and tested sphericity (GG epsilon = 0.77, HF epsilon = 0.99). Which post hoc test should I carry out? Just paired t-tests, reducing my alpha with a Bonferroni or Sidak correction?

    Thank you in advance.

    • Charles says:

      Hello Stefano,
      Which tests to use depends as always on what you are trying to prove. If you are only interested in comparing “before” with “after” you can simply perform a t-test with paired samples. If you want to compare all three combinations you can either use a Bonferroni or Sidak correction or perform Tukey HSD test. If you are interested in more complicated comparisons you may choose to use Contrasts. These options are described in Example 2 of the referenced webpage and the observations which follow the example.

  5. Surabhi Nayak says:

    The following data represent the typical results from a delayed discounting study. The participants are asked how much they would take today instead of waiting for a specific delay period to receive $1000. Each participant responds to all 5 of the delay periods. Use a repeated-measures ANOVA with α = .01 to determine whether there are significant differences among the 5 delay periods for the following data:
    Participant  1 month  6 months  1 year  2 years  5 years
    A            950      850       800     700      550
    B            800      800       750     700      600
    C            850      750       650     600      500
    D            750      700       700     650      550
    E            950      900       850     800      650
    F            900      900       850     750      650


    • Charles says:

      Your problem is very similar to Example 1 of the referenced webpage, and so you can follow the approach shown for that example. Since you need to take sphericity into account, for completeness, see also Example 2 of the Sphericity webpage. To make things simpler, you can use the Real Statistics One Factor Repeated Measures Anova data analysis tool, as described on the Sphericity webpage.

  6. Gary says:

    Spelling error just before the equation in Example 2 under the second “Observation” above…Turkey should be Tukey! Darn spell checker!

    • Charles says:

      Thanks for catching this. Turkey is clearly not the same as Tukey. I just corrected the webpage. Thanks for catching this error.

  7. Thomas says:

    There is a little mistake in the Cohen d calculation. You switched the digits. It should say 1.71

    • Charles says:

      Thanks for finding this error. I have now changed the website with the correct value. I appreciate your help in improving the website.

  8. Anna says:

    Hi Charles,

    Thanks so much for all the advice. Can you tell me how you arrived at the alpha value for the contrasts tests (it appears just above the p-value label in the contrasts tests table). I modified your contrast table to generate output for multiple planned contrasts, but I’m not confident I made the right choice when I compared all of the contrasts to that alpha value. Is it already adjusted for the number of measures? Do you recommend I change the weight to reflect the number of planned contrasts? I currently have 3 within subjects levels, and I’m running the following contrasts
    1 -1 0
    -1 0 1
    0 1 -1

    Thanks in advance for your help!

    • Charles says:


      In general, you should divide the alpha value (say .05) by the number of contrasts that you run (this is the Bonferroni correction). In the data analysis tool, this defaults to the number of groups plus one. Thus, with 3 groups, the value listed for alpha would be .05/4 = .0125. In fact, this is a programming error: I should be using the number of groups minus one, because I want the default to be the number of orthogonal contrasts.

      In any case, you can override the default value of alpha to whatever value you desire. If you have 3 contrasts you should use alpha = .05/3 = .0167.

      Thanks for having me look at this so that I could find the error in the default value. I will change this in the next software release.


  9. Piero says:

    Dear Charles,

    First of all, thank you very much for your precious website and excel tools!

    I am performing a repeated measures analysis in which I want to test the efficacy of a clinical treatment on a group of patients. So all the subjects are measured two times, before and after receiving the treatment.
    As I read in your tutorial, sphericity is not an issue when only two groups of measures are involved.
    Anyway, I verified that the variance of the measurements performed on the sample in the pre-treatment condition is much higher than the variance of the measurements in the post-treatment condition. In effect, if I checked homogeneity of variances between the two groups with Levene’s test (I know this is not correct in this case, it’s just to give an example), the result would be strong heteroscedasticity.

    My question is: how should I consider this large difference in variances? Does it affect in some way the results of my repeated measures test?

    Thank you very much for your help!
    Best Regards

    • Charles says:

      Levene’s test is valid also in the case where there are only two groups of measurements, but generally with two paired groups (before vs. after), you are interested in the paired differences, and so there is only one variance (of the differences) and not two.
      With only two sets of paired measurements, you can use the paired t test and don’t even need to bother with a repeated measures ANOVA.

  10. Sri says:

    I want to analyze the relationship between air pollutant levels in the initial week, when no plants were introduced, and the first, second, third, and fourth weeks after plants were introduced. I measured the data three times during the day, once a week. So my data was entered with the columns being the initial week, week 1, week 2, week 3, and week 4, and the rows being 8am, 11am, and 2pm. I then conducted an Anova: Two-Factor Without Replication test on the data. Would both the row and column data be important in this study or just the columns?
    Thank you in advance.

    • Charles says:

      I have a few questions for you to help address your question:
      1. Are all the air pollution measurements taken at the same place?
      2. Do you care about differences in air pollution levels at different times of the day?
      3. Do you care about differences in air pollution levels after one, two, three and four weeks after the plants were introduced?

  11. Nick Michalak says:

    Hi Charles

    I love your website and your software!

    For the Scheffe calculation, what are “a” and “n” in:
    (a – 1) * FINV(α, a – 1, n – 1)?

    Is a = number of groups? If so, would a – 1 = 0 in a one-way repeated measures contrast?

    Is n the total sample size or a group sample size?

    • Charles says:

      Glad you like the website and software.
      I don’t know where you see the formula (a – 1) * FINV(α, a – 1, n – 1), but in any case, please look at the following webpage:
      Unplanned Comparisons

  12. Jay says:

    Dear Sir, I like your website so much.
    My question is …
    I want to test the image quality database.
    I have images name I1, I2, I3,….I20 and it quality is varying along the three levels 0 to 10 variation mixed [for example, low (near 1, say), fair and good (towards 10, say) ]
    The number of subject who is rating all the images in the scale of 1 to 10 is 50. so my data size is 20×50 (images x subject). All this data is in continuous.
    My question is whether can I use ANOVA with Repeated Measures Within Subjects? If yes, then how? if No then which measure I should use?
    Please explain me the steps.

    • Charles says:

      You might be able to use ANOVA with Repeated Measures, but which test to use really depends on what you are trying to accomplish. What is your objective? What hypothesis are you trying to prove or gain evidence for (or against)? Are you trying to get a measure of agreement between the raters?

      • Jay says:

        Thank you so much for the reply sir.
        Yes sir, I want to get a measure of agreement between the raters, or the association between the raters.
        2) I have the mean of the observations and I want to check the relation between each rater and the mean of the observations of all the raters.
        Thanks in advance.

  13. NR says:


    I have a question. Say we are determining the number of plate cells affected by certain antibiotic dose i.e. Dose escalation. We are running 5 different antibiotic dosages (5 numerically set dosages, all inserted into plates at the same time, 90 minutes later, all cells affected will be recorded). The experiment is run 3 times, providing 3 count results per dosage.

    1. I would like to first compare the results across dosages, i.e. test for a significant difference between dosages. Would I just use a randomized complete block ANOVA test?

    2. Secondly I would like determine standard error/or error involved in reproducibility of each run at a single dose (meaning at dose 0.1, I would like to compare the number of cells affected from the runs called count 1, 2, and 3). Which equations would you recommend?

    3. Lastly, let’s assume this experiment is run ANOTHER two times (each time involving the same 5 standard dosages, run 3 times as before). Should I calculate the mean of cells affected at each dosage? Meaning in exp1 at dosage 0.1, should I calculate the cell response mean (mean of count 1,2,3 at dosage 0.1), and in exp 2 with dosage 0.1 also find the mean of cell response (count 1,2,3) and similarly for experiment 3? And then compare the means Exp1Mean0.1, Exp2Mean0.1, Exp3Mean0.1 in order to determine the variability between the three, i.e. an overall confidence interval per dosage, the final cumulative error value associated with each dose? Should I use the follow-up analysis for Two Factor ANOVA, or the simple effect of Single ANOVA?

    Getting a bit confused, and could really use your help!

    Thank you sir!

  14. Teddy says:

    Hi Charles
    I have 3 groups of unequal size assessed at baseline and after 3 months (same subjects assessed at 2 time points). There’s a clear difference for one group at the second time point (I want to show that the subjects react differently at the second time point). If I run a 2-way ANOVA with subjects as repeated measure (or not), time is significant but the interaction is not, so I can’t compare all cells?! If I run a 1-way ANOVA at baseline there’s nothing significant, but a 1-way ANOVA at the second time point is significant due to one group. If I run a paired t-test it’s also significant for one group. If the 3 groups are similar at baseline (1-way ANOVA), can I just run a paired t-test or should I use another test?

    • Charles says:

      When you say that you “want to show that the subjects react differently at the second time point”, do you want to show this (1) for each of the three groups or (2) for all the groups together? If (2), then simply run a one-way repeated measures ANOVA and, if there is a significant result, use one of the follow-up tests described on the website. You can run three separate paired t-tests, but this will increase your experiment-wise error from .05 to 1 – (1 – .05)^3 = .14, which is not great; alternatively, you will need to reduce your alpha value to .017, with a loss of statistical power.
      If (1), then you should use the testing approach described on the referenced webpage.

  15. TiffanyC says:

    Dear Charles,
    I have 7 groups of different sizes (5-8), and measurements taken at baseline and four other time points. I want to see 1) if different groups respond differently at a particular time point and 2) if the response within a group differs over time. I am not sure whether I should use two-way ANOVA (time v.s. group) or repeated measures one-way ANOVA for statistical comparisons. What would you recommend? Thanks a lot!

    • Charles says:

      If the measurements at the baseline and 4 different times are taken on the same subject, then you should use repeated measures ANOVA. Since your subjects are divided into 7 different groups, you need the two-way version, what I have also called repeated measures ANOVA with one within subjects factor and one between subjects factor.
