# Friedman Test

The Friedman test is a non-parametric alternative to ANOVA with repeated measures. No normality assumption is required. The test is similar to the Kruskal-Wallis Test. We will use the same terminology as for the Kruskal-Wallis Test and Two Factor ANOVA without Replication.

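Property 1: Define the test statistic

$$Q = \frac{12}{mk(k+1)}\sum_{j=1}^{k} R_j^2 - 3m(k+1)$$

where k = the number of groups (treatments), m = the number of subjects, and $R_j$ is the sum of the ranks for the jth group (the k scores of each subject are ranked separately, from 1 to k). If the null hypothesis that the sums of the ranks of the groups are the same is true, then

$$Q \sim \chi^2(k-1)$$

approximately, when k ≥ 5 or m > 15. The null hypothesis is rejected when Q > $\chi^2_{crit}$.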
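The calculation in Property 1 can be sketched in Python as follows. This is an illustrative sketch, not the Real Statistics implementation. The function names are hypothetical, rows are subjects and columns are groups (treatments), and tied scores within a subject receive the average of their ranks. No tie correction is applied to Q.

```python
from typing import List, Sequence


def rank_row(row: Sequence[float]) -> List[float]:
    """Rank one subject's k scores from 1 (smallest) to k, averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_q(data: Sequence[Sequence[float]]) -> float:
    """Friedman statistic Q for an m x k table (rows = subjects, columns = groups)."""
    m, k = len(data), len(data[0])
    ranks = [rank_row(row) for row in data]
    # R_j = sum of the ranks in column j
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    return 12 * sum(R * R for R in rank_sums) / (m * k * (k + 1)) - 3 * m * (k + 1)
```

Q is then compared with the chi-square distribution with k – 1 degrees of freedom, as in Property 1.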

Example 1: A winery wanted to find out whether people preferred red, white or rosé wines. They invited 12 people to taste one red, one white and one rosé wine, with the order of tasting chosen at random and a suitable interval between tastings. Each person was asked to evaluate each wine. The scores are tabulated in the table on the left side of Figure 1.

Figure 1 – Friedman’s test for Example 1

The ranks of the scores for each person were then calculated, and the Friedman statistic Q was calculated to be 1.79 using the above formula. With k – 1 = 2 degrees of freedom, p-value = CHIDIST(1.79, 2) = 0.408 > .05 = α, so we conclude there is no significant difference between the three types of wines.
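The p-value is the right tail of the chi-square distribution with k – 1 degrees of freedom. This is what Excel's CHIDIST (or CHISQ.DIST.RT in newer versions) returns. As a self-contained sketch, the hypothetical helper below computes the same right tail in plain Python from the series for the regularized lower incomplete gamma function. It is meant for the moderate values of Q that arise here. A library routine such as `scipy.stats.chi2.sf` is the more robust choice in practice.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(a, z)
    with a = df/2 and z = x/2, then returns 1 - P(a, z). Intended for moderate x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a  # n = 0 term of the sum of z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)
```

For Example 1, `chi2_sf(1.79, 2)` is about 0.41, in line with the p-value reported above.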

Observation: Just as for the Kruskal-Wallis test, an alternative expression for Q is given by

$$Q = \frac{SS'_{Col}}{k(k+1)/12}$$

where $SS'_{Col}$ is the sum of squares between groups using the ranks instead of raw data.

For Example 1, we can obtain $SS'_{Col}$ from the ranked scores (i.e. range F3:I15) using Excel’s Anova: Two-Factor Without Replication data analysis tool (see Figure 2), and then use this value to calculate Q as described above.

Figure 2 – Alternative way of calculating Friedman’s statistic
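The equivalence in the Observation can be checked numerically. The sketch below uses hypothetical function names and takes a table that has already been ranked within each subject. It computes Q once from the rank sums of Property 1 and once from $SS'_{Col}$, the between-columns sum of squares that a two-factor ANOVA without replication reports for the ranks.

```python
from typing import Sequence


def q_from_rank_sums(ranks: Sequence[Sequence[float]]) -> float:
    """Q from Property 1, using the column rank sums R_j."""
    m, k = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    return 12 * sum(R * R for R in rank_sums) / (m * k * (k + 1)) - 3 * m * (k + 1)


def q_from_ss_col(ranks: Sequence[Sequence[float]]) -> float:
    """Q = SS'_Col / (k(k+1)/12), with SS'_Col computed from the ranks."""
    m, k = len(ranks), len(ranks[0])
    grand_mean = sum(sum(row) for row in ranks) / (m * k)
    col_means = [sum(row[j] for row in ranks) / m for j in range(k)]
    # Two-factor ANOVA without replication: SS_Col = m * sum of (column mean - grand mean)^2
    ss_col = m * sum((c - grand_mean) ** 2 for c in col_means)
    return ss_col / (k * (k + 1) / 12)
```

Both functions rely only on each subject's ranks summing to k(k + 1)/2, so they agree whenever every row is ranked from 1 to k, with ties averaged.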

Real Statistics Excel Functions: The Real Statistics Resource Pack contains the following functions:

FRIEDMAN(R1) = value of Q on the data (without headings) contained in range R1 (organized by columns).

FrTEST(R1) = p-value of Friedman’s test on the data (without headings) contained in range R1 (organized by columns).

For Example 1, FRIEDMAN(B5:D14) = 1.79 and FrTEST(B5:D14) = .408.

### 34 Responses to Friedman Test

1. Roger Bakeman says:

This was very helpful. Even more helpful were your comments for the Wilcoxon signed ranks test, because they gave me the information I needed to calculate the 95% confidence interval for T and to calculate the effect size r. Increasingly, journal editors are asking for these. Could you provide information on how to compute a standard error for H (as provided for T) and the effect size r for the Friedman test?

• Charles says:

Roger,
I don’t know of any commonly accepted values for the standard error or effect size for Friedman’s test, although Kendall’s W is often cited as an effect size for Friedman’s H. Here W = H/(m(k-1)) where k = the number of groups (treatments) and m = the number of subjects. Also used as an effect size is the r coefficient for Kendall’s W, which is r = (mW-1)/(m-1). In fact it can be shown that r is the average Spearman correlation coefficient computed on the ranks of all pairs of raters.
Charles
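The formulas for Kendall's W and its r coefficient in the reply above can be written down directly. In the sketch below (function names are hypothetical), W = Q/(m(k – 1)) and r = (mW – 1)/(m – 1). The last function computes the average Spearman correlation over all pairs of subjects directly, so the two values of r can be compared. The check here uses untied ranks.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def kendalls_w(q: float, m: int, k: int) -> float:
    """Kendall's W = Q / (m (k - 1)) for m subjects (raters) and k groups."""
    return q / (m * (k - 1))


def mean_spearman_from_w(w: float, m: int) -> float:
    """Average pairwise Spearman correlation implied by W: r = (mW - 1) / (m - 1)."""
    return (m * w - 1) / (m - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; applied to rank vectors it is Spearman's rho."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_spearman_direct(ranks: Sequence[Sequence[float]]) -> float:
    """Average Spearman correlation over all pairs of subjects (rows of ranks)."""
    return mean(pearson(a, b) for a, b in combinations(ranks, 2))
```

For the ranks [[1, 2, 3], [2, 1, 3], [1, 3, 2], [3, 1, 2]] (Q = 1.5, m = 4, k = 3), both routes give r = –1/12.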

2. Jiahui says:

Dear Mr. Zaiontz,

I am writing to ask which data analysis method is appropriate for a non-parametric ANOVA with two within-subject factors.

My dependent variable is error rate (0-1), which is not normally distributed. I transformed the DV data and conducted the repeated measures ANOVA. I am interested in results based on non-parametric methods too, but I have not found a proper way yet.

Thanks!

Jiahui

• Charles says:

Jiahui,
Friedman’s test is a way of conducting a non-parametric repeated measures ANOVA.
Charles

• Jiahui says:

I know Friedman’s test is the non-parametric alternative to one-way ANOVA with repeated measures. Can I use Friedman’s test for an ANOVA with two within-subject factors? If I am interested in a non-parametric two-way repeated measures ANOVA, which method can I use?

• Charles says:

I don’t know of any nonparametric method for two-way repeated ANOVA.
Charles

• Kevin says:

Hi Charles,

There is actually a test for a nonparametric two-way ANOVA with replication/repeated measures, called the Scheirer-Ray-Hare test. Essentially, you replace the original data with their overall ranks, regardless of the level of either factor that they occupy.

You then perform a standard two-way ANOVA on the ranked data, but you take things a step further by dividing the total sum of squares by the total degrees of freedom to get a “total mean square.” You then divide the individual sums of squares for the two factors and the interaction term (not the mean squares, as you did to determine the F ratios) by this “total mean square” to yield an H ratio for each of the two factors and the interaction.

Provided each group has at least 5 elements, these H ratios can be evaluated against the chi-square distribution with the relevant degrees of freedom from the two-way ANOVA. If the H value for either factor is significant, then the groups differ with respect to that factor. Likewise, if the interaction H is significant, there is a significant interaction between the factors for the groups in question.
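The procedure Kevin describes can be sketched as follows for a balanced two-factor layout with n observations per cell. This is an illustration only. The function names are hypothetical, all observations are ranked together with tied values given average ranks, and the code treats the observations in each cell as independent replicates, as in a two-factor ANOVA with replication. Each H can then be compared with a chi-square critical value for the returned degrees of freedom, for example Excel's CHISQ.INV.RT(α, df).

```python
from typing import Dict, List, Sequence, Tuple


def overall_ranks(values: Sequence[float]) -> List[float]:
    """Rank all values together (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def scheirer_ray_hare(cells: Sequence[Sequence[Sequence[float]]]) -> Dict[str, Tuple[float, int]]:
    """(H, df) for factor A, factor B and the A x B interaction.

    cells[i][j] holds the n observations for level i of A and level j of B.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    assert all(len(c) == n for row in cells for c in row), "design must be balanced"
    flat = [x for row in cells for cell in row for x in cell]
    r = overall_ranks(flat)
    # Reshape the ranks back into the a x b x n layout.
    rc = [[r[(i * b + j) * n:(i * b + j + 1) * n] for j in range(b)] for i in range(a)]
    g = sum(r) / len(r)
    ss_total = sum((x - g) ** 2 for x in r)
    ms_total = ss_total / (a * b * n - 1)  # the "total mean square"
    a_means = [sum(sum(c) for c in rc[i]) / (b * n) for i in range(a)]
    b_means = [sum(sum(rc[i][j]) for i in range(a)) / (a * n) for j in range(b)]
    ss_a = b * n * sum((mu - g) ** 2 for mu in a_means)
    ss_b = a * n * sum((mu - g) ** 2 for mu in b_means)
    ss_cells = n * sum((sum(c) / n - g) ** 2 for row in rc for c in row)
    ss_ab = ss_cells - ss_a - ss_b
    # H = sum of squares (not mean square) divided by the total mean square.
    return {
        "A": (ss_a / ms_total, a - 1),
        "B": (ss_b / ms_total, b - 1),
        "AB": (ss_ab / ms_total, (a - 1) * (b - 1)),
    }
```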

• Charles says:

Kevin,
Thanks for your very helpful comment. I will add the Scheirer-Ray-Hare test to the next release of the Real Statistics software.
Charles

3. Umer says:

Dear Mr. Zaiontz,
Your article was really helpful. I just want to confirm the use of Friedman’s test for the analysis of data collected through 5-point Likert-type questions. Basically, we have 5 factors and we want to find which of these are more effective. We have gathered data using 5-point Likert-type questions with ordinal ranking from 1 to 5 (ranging from Strongly Disagree to Strongly Agree). Can we get a ranking of these five factors according to their effectiveness using this test?
Thank you!

• Charles says:

You don’t need to use Friedman’s test to get a ranking of the five factors. Simple arithmetic is sufficient. What Friedman’s test will tell you is whether any differences between the mean rankings are statistically significant (or are just due to random effects).
Charles

• Umer says:

Thank you very much Mr. Zaiontz!
So that means that I can find the difference (significant or insignificant) between 4 or 5 groups of data? Basically, it is for a research thesis and we cannot simply deduce something based on the ordinal 5-point scale. We intend to use t-tests for comparing pairs of data sets, but wouldn't it be better to test all 5 of them together?

• Charles says:

Friedman’s test tests all 4 (or 5) groups together to determine whether the groups statistically have the same mean (the null hypothesis). If the answer is yes, then you are done. If the answer is no (i.e. at least 2 of the groups have different means), then you would typically perform paired t tests (or Wilcoxon signed-ranks tests), or some other similar test, to determine which pairs of groups have different means.
Charles

4. Noemi says:

Dear Mr. Zaiontz,

I have a very specific problem (surprisingly similar to the one posted in the comments above) and am a bit confused about which statistical test to use. I am sorry to bother you, but I do not understand what you mean by simple arithmetic.
I have investigated the effect of 5 different treatments and used a scale from 1 to 5 to evaluate the effect. I repeated the investigation 6 times. My data are therefore ordinal (ranked), non-parametric and not normally distributed.
However, I also calculated the mean of each treatment over the 6 runs.
I’m not sure whether I have to compare the means of the 5 treatments, or not use the means at all and perform a Friedman test for each treatment over the six runs. I would be very thankful for any information on how to proceed and which statistical test to use. (I was thinking about the Mann-Whitney U test or Friedman’s test.)
Thank you!

• Charles says:

Noemi,

I need some additional information before I can answer your question. First of all, I need to better understand the experiment. In particular:

1. Are all 5 treatments applied to each of the 6 people in the sample?
2. Are the 5 treatments applied to different people (for a total of 30 people in the study)?
3. Is the sample divided into 5 groups, one for each treatment, where each person in the sample gets one treatment over 6 different time intervals?

Before answering the question about which statistical test you should use, I need to understand what hypothesis you are trying to test.

Charles

5. Lucas D. Mazza says:

To check the p-value, use the formula CHISQ.DIST.RT(1.79, 2).

• Charles says:

Lucas,
Yes, that is correct.
Charles

6. Katerina says:

Dear Charles Zaiontz,

I am using Friedman test to compare three different conditions. I have 33 participants who took part in each condition.

My question is: can I use Friedman’s test when the three groups are of unequal size? E.g. group 1 has 12 participants, group 2 has 21 participants, etc.

Katerina

• Charles says:

Katerina,
Friedman’s test would be used when you have 33 participants who experienced all three conditions. In this case the sample sizes would all be the same, namely 33.
If instead you are looking at three independent groups, where the participants in each group experience one and only one condition, then you want to use fixed factor ANOVA, Kruskal-Wallis or some other similar test.
Charles

7. Andy says:

Hi Charles,

This is one of the best explanations I have come across while wanting to know a bit more about the Friedman statistic. Thank you!

I have a couple of questions on the method and conclusion:
1. Can I use this method for product tradeoff analysis, like conjoint analysis, but with no levels?
As an example, let's say I ask people to share their preferences among disruptive technologies. I give them $100 to distribute across the list. I then run Friedman's test on the collected samples to determine whether the differences in choices were significant.
2. You mention that since the p-value > 0.05, we conclude there is no significant difference between the three types of wines. What if p were < 0.05? In that case, does the ranking order matter? How/when can I definitively say that people like Red first, then White, and then Rosé?

Andy

• Charles says:

Andy,
I am pleased that you found my explanation of Friedman’s test useful. The following are some observations about your two questions.

1. I am not sure why you say “no levels”. Suppose that people have three disruptive technologies to choose from. They assign their $100 among the three technologies. My initial thought is that you perform Friedman’s test (or repeated measures ANOVA if the assumptions are satisfied) to determine whether there is a significant difference between the three technologies. Offhand I don’t see any problem with this approach, but frankly I haven’t had enough time to think through this response in sufficient detail to make sure that this is so.

2. You can order the average preferences as you have indicated, but just because White has a higher ranking than Red in the sample doesn’t mean that this represents a significant difference (with respect to the population). You would need to perform a post-hoc test to determine this. I have provided post-hoc tests for ANOVA. I am now adding some for Kruskal-Wallis, and soon I will look at follow-up testing for Friedman’s test.

Charles

8. SIM says:

Hi Charles,

Thank you for your tutorial; I am new to statistics. I would like to know how to determine which value to select from the table based on my number of subjects, i.e. N = 5 or N = 6.
From the table of critical values for three conditions, how should we determine which p-values to choose?

Thanks!

• Charles says:

Sorry, but I would have to see the table that you are using since I am not providing such a table from my site.
Charles

9. Kevin says:

Hi Charles,

I’m glad that I could be of service for the next release regarding the SRH test! It is relatively unknown, but can be rather effective in the right environment, especially with non-normal data! I forgot to mention this in my original post, but the SRH test only works if the ANOVA is balanced. So there needs to be an equal number of observations for each level of each factor…of course, this will not usually be a problem for repeated measures designs unless one or more of the subjects are unable to complete the testing. You may have figured this out already, but I thought it would be best to clarify this restriction. Thanks again for running a great site!

10. Ron_N says:

Thank you for the insightful articles you provide. I have a project to deliver, and I would like you to suggest the best method/test for my study.

I have a 6-month data set with 13 students (student 1, student 2, etc.) and 11 tests (test 1 – test 11), containing the average time taken by each student to complete each test over the 6 months. The data are not normally distributed.

I want to find out, for each test, which students are underperforming or overperforming.

I am providing a rough sample. Please advise.

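| | test 1 | test 2 | test 3 | test 4 | … | test 11 |
|---|---|---|---|---|---|---|
| student 1 | 45.56 | 8.56 | 0.56 | 50.56 | … | |
| student 2 | 23.25 | 9.65 | 8.25 | 32.65 | … | |
| student 3 | 22.32 | 24.56 | 16.32 | 24.56 | … | |
| student 4 | 15.12 | 33.26 | 25.12 | 44.26 | … | |
| student 5 | 80.32 | 45.00 | 15.32 | 45.00 | … | |
| … | | | | | | |
| student 13 | | | | | | |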

• Charles says:

Ron,
It really depends on how you define under- or over-performing. You can look at this as a problem of identifying outliers. The website has several webpages that address this issue. Please use Search to find these pages.
Charles

• Ron_N says:

Hi Charles,

I am thinking of applying IMR control charts with a subgroup size of 1 (each individual is considered as one subgroup). Then I would target those individuals who fall below or above the control limits.
I am confused about one thing. Could you please advise whether I should use 1 sigma, 2 sigma or 3 sigma as my control limits?

Thank you for this wonderful site. I have gathered so much knowledge from here.

Regards,
Ron

• Charles says:

Ron,
Good to hear that you are getting value from the site.
There is no clear rule regarding 1 sigma, 2 sigma or 3 sigma. I generally use 2.5 sigma, except for very large samples where I may choose a larger value like 3.
Charles
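For the individuals chart Ron mentions, a common way to set limits is to estimate sigma from the average moving range divided by the constant d2 = 1.128 (for moving ranges of two consecutive points). The sketch below assumes that approach and uses the 2.5 sigma Charles suggests as the default multiplier. The function names are hypothetical. Moving ranges assume a meaningful order (usually time), so applying them to a column of students listed in arbitrary order is a judgment call.

```python
from statistics import mean
from typing import List, Sequence, Tuple

D2 = 1.128  # bias-correction constant d2 for moving ranges of 2 consecutive points


def imr_limits(x: Sequence[float], sigmas: float = 2.5) -> Tuple[float, float, float]:
    """Center line, lower limit and upper limit of an individuals (I) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    sigma_hat = mean(moving_ranges) / D2
    center = mean(x)
    return center, center - sigmas * sigma_hat, center + sigmas * sigma_hat


def out_of_limits(x: Sequence[float], sigmas: float = 2.5) -> List[int]:
    """Indices of points outside the limits (candidate under- or over-performers)."""
    _, lcl, ucl = imr_limits(x, sigmas)
    return [i for i, v in enumerate(x) if v < lcl or v > ucl]
```

Applied separately to each test's column of completion times, the flagged indices would be the students falling outside the chosen limits.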

• Ron_N says:

Thank you, Charles, for all your support. Now I can carry out my analysis.

11. Eveline says:

Dear Mr. Zaiontz
Let me first thank you for your explanations. I’ve got a question regarding post hoc tests after a Friedman test. In my study N=301 subjects were asked to rate 10 items about their relevance (how important is item 1: “very important”, “rather important” etc.; item 2: “very important”, “rather important” etc., …). My final aim is to get a ranking list of these 10 items, i.e. which one is regarded as the most important, the second-most etc.
I did a Friedman test and got a significant result. I assume that I’m not allowed to take the Mean Ranks and “arrange” the ranking list, i.e. take the one with the lowest Mean Rank, then the one with the second-lowest and so forth … so how do I get a ranking list? Do I have to do Wilcoxon tests for EVERY pair of items (that would be 45 comparisons)? Or would it be enough to compare each item with the one above and the one below according to the Mean Ranks?
It might be a bit of a dumb question, but I’m really confused right now.
Thank you in advance and greetings from Switzerland,
Eveline

• Charles says:

Eveline,

If what you want is a ranking of the 10 items, I can’t think of any reason why you couldn’t just order the items based on their total (or mean) ranks. The only problem with this is that, say, item C may have a lower rank than item E even though the difference is not statistically significant. Presumably, Friedman’s test just told you that at least two of the items have a significant difference in mean rank, which is obviously not that helpful.

One approach to post hoc testing is Wilcoxon’s signed ranks test on all 45 pairs, using a Bonferroni correction. Based on such post hoc tests with just three items A, B and C, you could find that A and B are the only pair that is significantly different (say A < B). This would presumably mean that C is between A and B, although not significantly different from either one of them.

Charles

PS: From Wikipedia: “Post-hoc tests [to Friedman’s test] were proposed by Schaich and Hamerle (1984)[1] as well as Conover (1971, 1980)[2] in order to decide which groups are significantly different from each other, based upon the mean rank differences of the groups. These procedures are detailed in Bortz, Lienert and Boehnke (2000, pp. 275).[3] Not all statistical packages support Post-hoc analysis for Friedman’s test, but user-contributed code exists that provides these facilities (for example in SPSS, and in R).”

References

1. Schaich, E. & Hamerle, A. (1984). Verteilungsfreie statistische Prüfverfahren. Berlin: Springer. ISBN 3-540-13776-9.
2. Conover, W. J. (1971, 1980). Practical nonparametric statistics. New York: Wiley. ISBN 0-471-16851-3.
3. Bortz, J., Lienert, G. & Boehnke, K. (2000). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer. ISBN 3-540-67590-6.
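Charles describes two steps: ordering the items by mean rank, then judging pairwise differences against a Bonferroni-corrected significance level. They can be sketched as follows. The function names are hypothetical. The pairwise p-values are assumed to come from a separate paired test such as Wilcoxon's signed ranks test; the code does not compute them.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def order_by_mean_rank(ranks: Sequence[Sequence[float]], names: Sequence[str]) -> List[Tuple[str, float]]:
    """Items sorted by mean rank across subjects (rows = subjects, columns = items)."""
    m = len(ranks)
    means = [sum(row[j] for row in ranks) / m for j in range(len(names))]
    return sorted(zip(names, means), key=lambda t: t[1])


def all_pairs(names: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of items (k(k-1)/2 of them)."""
    return list(combinations(names, 2))


def bonferroni_significant(p_values: Dict[Tuple[str, str], float], alpha: float = 0.05) -> List[Tuple[str, str]]:
    """Pairs whose p-value is below alpha divided by the number of comparisons."""
    threshold = alpha / len(p_values)
    return [pair for pair, p in p_values.items() if p < threshold]
```

With 10 items there are 45 pairs, so at α = .05 each comparison is tested at .05/45 ≈ .0011.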

12. Daniel says:

Dear Professor
First, thank you very much.
I have a similar case of repeated measures, but I don't have subjects, and I have treatments with a different number of repetitions at each measurement time. How can I work with this type of data?
Dani

• Charles says:

You would need to use a different type of model, based on regression or linear mixed models. I will be exploring these sorts of models shortly.
Charles

13. Ciacco says:

Why is there a 12 in the equation? Where does the 12 come from?
Thank you

• Charles says:

The 12 is just the result of some mathematical calculation. You would have to look at the mathematical derivation of the Q statistic.
Charles
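The following sketch shows one way to see where the 12 comes from, assuming no ties. Under the null hypothesis, each subject's ranks are a random permutation of 1, …, k. Let $r_{ij}$ be the rank that subject i gives to group j. The 12 is the one in the variance of a rank chosen uniformly from 1, …, k:

$$\mathrm{E}[r_{ij}] = \frac{k+1}{2}, \qquad \mathrm{Var}(r_{ij}) = \frac{k^2-1}{12}$$

Summing over the m independent subjects gives

$$\mathrm{E}[R_j] = \frac{m(k+1)}{2}, \qquad \mathrm{Var}(R_j) = \frac{m(k^2-1)}{12}$$

The rank sums always total mk(k+1)/2, so they are not independent. Accounting for this dependence multiplies the standardized sum of squares by (k – 1)/k:

$$Q = \frac{k-1}{k}\sum_{j=1}^{k}\frac{\left(R_j - \frac{m(k+1)}{2}\right)^2}{m(k^2-1)/12} = \frac{12}{mk(k+1)}\sum_{j=1}^{k}\left(R_j - \frac{m(k+1)}{2}\right)^2$$

Expanding the square and using $\sum_j R_j = mk(k+1)/2$ gives

$$\sum_{j=1}^{k}\left(R_j - \frac{m(k+1)}{2}\right)^2 = \sum_{j=1}^{k} R_j^2 - \frac{m^2k(k+1)^2}{4}$$

so $Q = \frac{12}{mk(k+1)}\sum_j R_j^2 - 3m(k+1)$, which is the formula in Property 1.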