Friedman Test

The Friedman test is a non-parametric alternative to ANOVA with repeated measures. No normality assumption is required. The test is similar to the Kruskal-Wallis Test. We will use the terminology from Kruskal-Wallis Test and Two Factor ANOVA without Replication.

Property 1: Define the test statistic

$$Q = \frac{12}{mk(k+1)} \sum_{j=1}^{k} R_j^2 - 3m(k+1)$$

where k = the number of groups (treatments), m = the number of subjects, and $R_j$ is the sum of the ranks for the jth group. If the null hypothesis that the rank sums of the groups are the same holds, then

$$Q \sim \chi^2(k-1)$$

approximately, when k ≥ 5 or m > 15. The null hypothesis is rejected when Q > $\chi^2_{crit}$.
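As a minimal sketch of Property 1, the following Python computes Q from a table of raw scores, assuming the usual form of the statistic, Q = 12/(mk(k+1)) · ΣRj² − 3m(k+1), with rows as subjects and columns as treatments. The function names and the small data set are hypothetical (they are not the wine scores of Example 1), and tied scores simply receive averaged ranks with no further tie correction.

```python
def row_ranks(row):
    """Rank the values in one row (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman_q(data):
    """Friedman's Q for data given as m rows (subjects) by k columns (treatments)."""
    m, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(row_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (m * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * m * (k + 1)


# Hypothetical scores: 4 subjects, 3 treatments.
scores = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
q = friedman_q(scores)
print(q)  # rank sums are 5, 8, 11, so Q = 0.25 * 210 - 48 = 4.5
```

The ranking is done within each row because each subject acts as a block; only the column rank sums enter the statistic.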

Example 1: A winery wanted to find out whether people preferred red, white or rosé wines. They invited 12 people to taste one red, one white and one rosé wine, with the order of tasting chosen at random and a suitable interval between tastings. Each person was asked to score each wine; the scores are tabulated in the table on the left side of Figure 1.

Figure 1 – Friedman’s test for Example 1

The ranks of the scores for each person were then calculated, and the Friedman statistic Q was calculated to be 1.79 using the above formula. Since p-value = CHIDIST(1.79, 2) = 0.408 > .05 = α, we conclude there is no significant difference between the three types of wines.
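The p-value can be checked outside Excel. With k = 3 wines the chi-square reference distribution has k − 1 = 2 degrees of freedom, and for that special case the right-tail probability has the closed form exp(−Q/2). The sketch below uses only this fact; the function name is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math


def chi2_right_tail_df2(q):
    """Right-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-q / 2.0)


p_value = chi2_right_tail_df2(1.79)
print(p_value)
```

The result is close to the 0.408 quoted above; any small difference in the last digit presumably comes from Q having been rounded to 1.79.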

Observation: Just as for the Kruskal-Wallis test, an alternative expression for Q is given by

$$Q = \frac{12\,SS'_{Col}}{k(k+1)}$$

where $SS'_{Col}$ is the sum of squares between groups using the ranks instead of raw data.

For Example 1, we can obtain $SS'_{Col}$ from the ranked scores (i.e. range F3:I15) using Excel’s Anova: Two-Factor Without Replication data analysis tool (see Figure 2), and then use this value to calculate Q as described above.

Figure 2 – Alternative way of calculating Friedman’s statistic
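A short numerical check of this equivalence, assuming the alternative expression has the form Q = 12·SS′Col/(k(k+1)) and that SS′Col is the usual between-columns sum of squares, m·Σ(column mean − grand mean)², computed on the within-subject ranks. The rank table is hypothetical, not the one in Figure 2.

```python
# Hypothetical within-subject ranks: 4 subjects (rows), 3 treatments (columns).
ranks = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
]
m, k = len(ranks), len(ranks[0])

# Between-columns sum of squares on the ranks (SS'_Col).
col_means = [sum(row[j] for row in ranks) / m for j in range(k)]
grand_mean = (k + 1) / 2.0  # the mean of the ranks 1..k is always (k+1)/2
ss_col = m * sum((c - grand_mean) ** 2 for c in col_means)

# Q from the sum of squares.
q_from_ss = 12.0 * ss_col / (k * (k + 1))

# Q from the rank sums, for comparison.
rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
q_direct = 12.0 / (m * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * m * (k + 1)

print(ss_col, q_from_ss, q_direct)  # 4.5 4.5 4.5
```

Both routes give the same value because expanding the squared deviations in SS′Col reproduces the ΣRj² term and the constant 3m(k+1).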

Real Statistics Excel Functions: The Real Statistics Resource Pack contains the following functions:

FRIEDMAN(R1) = value of Q on the data (without headings) contained in range R1 (organized by columns).

FrTEST(R1) = p-value of Friedman’s test on the data (without headings) contained in range R1 (organized by columns).

For Example 1, FRIEDMAN(B5:D14) = 1.79 and FrTEST(B5:D14) = .408.

56 Responses to Friedman Test

1. John Smith says:

Hello Professor,

Thank you for the great work you have put into this site.

I have a question about the appropriate test to use: the Friedman test or the SRH test.

The data is as follows. I randomly divided 24 animals into 3 treatment groups (each animal received one of the three treatments). Each animal was evaluated for functional deficits on days 1, 2, 3, 4, 5, 6, and 7 post-injury. The functional deficits are scored as either 0, 1, 2, 3, 4, or 5. My data failed the normality test for each day. I want to know the appropriate statistical method to analyze the group effects as well as the time effect.

I believe I need to use a non-parametric, repeated-measures, two-way ANOVA for my data set. From my understanding, the Friedman test is not appropriate since it does not treat individuals as groups. Is the SRH test appropriate for my data set (i.e. does it consider repeated measures)? Thank you for all your insight.

• Charles says:

Hello John,
I don’t know of a non-parametric, repeated-measures two-way ANOVA. Perhaps it exists, but I am not aware of it.
How badly did your data fail the normality test? Did you try to use a transformation to normalize your data?
Another approach might be to analyze the group effects separately from the time effects, in which case standard tests could be used.
Charles

• John Smith says:

Hello Professor,

When I rank the data, it passes normality. So I presume that I can then use a repeated measures two-way ANOVA on the ranked data. I have tried a couple transformations, but not many. Thank you.

John

• Charles says:

John,
That is not a surprise since ranked data should be normal. The results will apply to the ranked data and not necessarily to the original data, but this approach may be useful and is the basis for many of the non-parametric approaches.
Charles

2. Joanna says:

Dear Dr. Zaiontz,
I would like to ask whether there is a difference between the result of the Friedman test and the Wilcoxon test if participants are assessed on two occasions. Can I use the Friedman test in such a situation? I ask because I read somewhere that the Friedman test can be used for 3 or more correlated samples.
Joanna

• Charles says:

Joanna,
Friedman’s Test is essentially Wilcoxon’s Signed-Ranks Test where there are 2 or more (repeated measures) groups. When there are two groups the results should be the same, but I suggest that you try using both methods and see for yourself.
Charles

3. Evonne says:

Hii..
May I ask a question about how to distinguish blocks from treatments in the Friedman test? Is it always the case that blocks are rows and treatments are columns?
Thanks…

• Charles says:

Evonne,
The treatments are the columns and the subjects are the rows. Each subject can be considered to be a block.
Charles

4. Laurie says:

Hello:
I am trying to download Real Stats for use with my Mac, using Excel 2008. It seems to have downloaded (it is listed as an Add-On) but I can’t seem to open (or find reference to) the application in Excel. Might there be a compatibility problem? I would like to be able to run a Friedman’s Test. I’ve never worked with add-ons in Excel before but this seems like a potentially very useful tool.

• Charles says:

Laurie,
Unfortunately, Excel 2008 does not support VBA, which is a requirement for any add-in, including Real Stats. You will need to use either the Windows version of Excel or Excel 2011 or Excel 2016 (Mac) to use Real Stats.
Charles

5. Christian Cabezas says:

Dear Dr. Zaiontz,
I have a sample of 100 people who ranked 14 elements based on their relevance. I have been looking for the right test to analyze whether there is a significant difference in the rankings. I have not found any alternative other than the Friedman test followed by post hoc tests (Wilcoxon’s signed ranks test on all 91 pairs, using a Bonferroni correction). However, I found in a blog that one assumption of the Friedman test is that there should be repeated measures from the same participants (at least three). So, according to the blog, it is supposed to be used only for repeated measures. Do you think that I can still use this test for my specific research case? Thanks for your advice!

• Charles says:

Christian,

Friedman’s should be used with repeated measures. Since presumably each of the 100 people ranks each of the 14 elements, this is a repeated measures problem. Whether or not Friedman’s is the correct test depends on what exactly you are trying to test. If you want to know whether the rankings for the 14 elements are the same, then Friedman’s could be correct (or even possibly ANOVA with repeated measures). It also depends on what sort of values you are using for the rankings.

If you are trying to determine whether the 100 people agree in how they rank, then you should consider using ICC or Kendall’s W, depending again on how you do the rankings. These are covered on the website. Just insert the appropriate test in the Search box.

Charles
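For the post hoc step mentioned in this thread (a test on every pair of items with a Bonferroni correction), the number of comparisons and the per-comparison significance level follow from simple counting: k items give k(k−1)/2 pairs. A small sketch, with a hypothetical function name and an assumed overall α of 0.05:

```python
import math


def bonferroni_plan(k, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha) for k items."""
    n_pairs = math.comb(k, 2)  # k * (k - 1) / 2 unordered pairs
    return n_pairs, alpha / n_pairs


for k in (3, 10, 14):
    n_pairs, alpha_each = bonferroni_plan(k)
    print(k, n_pairs, alpha_each)
```

So 14 items lead to 91 pairwise tests (10 items would lead to 45), each judged at a much stricter level than 0.05, which is why this approach loses power quickly as the number of items grows.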

6. Joel says:

Hi – I wanted to check on the validity of using Friedman’s test for a two factor ANOVA – I have two types of implant materials that were evaluated histologically at 1 week and 4 weeks post-implantation. I believe that the 1st factor would be the type of implant material and the 2nd factor would be time – this is not repeated measures since the samples for each type of implant material are different at each time point – please let me know if Friedman’s test is correct and if so – do I undertake post-hoc tests similar to a two factor ANOVA but using Dunn’s-KW-type tests?

• Charles says:

Hi Joel,
Friedman’s test is a substitute for one factor repeated measures ANOVA. From your description this doesn’t fit your requirements. I understand that you have 4 samples: (1) implant material A at 1 week post-implantation, (2) implant material A at 4 weeks post-implantation, (3) implant material B at 1 week post-implantation and (4) implant material B at 4 weeks post-implantation. In this case, you would usually use 2 x 2 ANOVA, as described on the webpage http://www.real-statistics.com/two-way-anova/. You will also see information about follow-up tests.
Charles

• Joel says:

thank you – and you are correct in understanding the experimental set-up. However, the data are ordinal so is the 2 x 2 ANOVA with follow-up tests still valid?

• Charles says:

Joel,
When you say that the data is ordinal what do you mean? Can you give me some examples? If the data is appropriate and the other assumptions are met then this 2 x 2 ANOVA would be appropriate.
Charles

• Joel says:

yes, can give example – the data are basically scores for a type of tissue response, e.g. score of 0 = little to no response, 1 = mild, 3 = moderate, 4 = severe. I have been led to believe that these types of data do not meet normality or equal variance assumptions since scores are somewhat subjective and the scale is arbitrary

• Charles says:

Joel,
Using a Likert scale presents some challenges. The data can be interpreted as interval data provided it is likely that the 0, 1, 2, 3, 4 measures can be viewed as equally spaced (which would mean, for example, that the difference in tissue response between severe and moderate is about the same as between mild and little to no response). Also, the more scale points used, the better the fit as interval data (e.g. a 7-point scale is a better fit than a 5-point scale).
If your data is treated as ordinal then two choices are to use the Kruskal-Wallis one-way ANOVA test or the Scheirer-Ray-Hare two-way ANOVA test.
You can’t use Friedman’s test since you don’t have repeated measures.
Charles

7. Claude Pelletier says:

Hello
I am interpreting a Mixed ANOVA Test.
I have 2 groups (100 per group), Control and Treatment (between-subject factor: independent variable), which had their phosphate blood value (dependent variable) measured over time (within-subject factor) (0h 12h 24h 36h 48h 60h and 72h). If I am correct this is a 2×2 Mixed ANOVA.
But first, testing the assumptions. I have about 20 outliers which I double checked and decided to keep, making a note of it in my interpretation. I transformed my data to meet the normality assumption but lost the homogeneity of variance assumption. Since ANOVA is robust to normality but not homogeneity of variance, I kept the original data without transformation. Now I am about to check the covariance assumption. If I do not meet the covariance assumption I will turn to a non-parametric test.
Now my question is: what is the proper non-parametric test for a 2×2 Mixed ANOVA?
Thank you.

• Charles says:

Hello Claude,
With the seven time points you list, this looks to be a 2 x 7 Mixed ANOVA. I agree that ANOVA is more sensitive to violations of homogeneity of variances than the normality assumption. Generally, I would simply use a sphericity correction rather than worrying about whether the sphericity assumption holds. See Sphericity for more details.
The most commonly used substitute for repeated measures ANOVA is a repeated measures MANOVA (where you don’t need to satisfy the sphericity assumption).
Charles

8. Ciacco says:

Why 12, why is there a 12 in the equation – where is 12 coming from?
Thank you

• Charles says:

The 12 is just the result of some mathematical calculation. You would have to look at the mathematical derivation of the Q statistic.
Charles

9. Daniel says:

Dear Professor
First, thank you very much. I have a similar case of repeated measures, but I don’t have subjects and my treatments have different numbers of repetitions at each measurement time.
How can I work with this type of data?
Dani

• Charles says:

You would need to use a different type of model, based on regression or linear mixed models. I will be exploring these sorts of models shortly.
Charles

10. Eveline says:

Dear Mr. Zaiontz
Let me first thank you for your explanations. I’ve got a question regarding post hoc tests after a Friedman test.
In my study N=301 subjects were asked to rate 10 items about their relevance (how important is item 1: “very important”, “rather important” etc.; item 2: “very important”, “rather important” etc., …). My final aim is to get a ranking list of these 10 items, i.e. which one is regarded as the most important, the second-most etc.
I did a Friedman test and got a significant result. I assume that I’m not allowed to take the Mean Ranks and “arrange” the ranking list, i.e. take the one with the lowest Mean Rank, then the one with the second-lowest and so forth … so how do I get a ranking list? Do I have to do Wilcoxon tests for EVERY pair of items (that would be 45 comparisons)? Or would it be enough to compare each item with the one above and the one below according to the Mean Ranks?
It might be a bit of a dumb question, but I’m really confused right now.
Thank you in advance and greetings from Switzerland,
Eveline

• Charles says:

Eveline,
If what you want is a ranking of the 10 items, I can’t think of any reason why you couldn’t just order the items based on their total (or mean) ranks. The only problem with this is if, say, item C has a lower rank than item E but the difference is not statistically significant.
Presumably, Friedman’s test just told you that at least two of the items have a significant difference in mean rank, which is obviously not that helpful. One approach to post hoc testing is Wilcoxon’s signed ranks test on all 45 pairs, using a Bonferroni correction.
Based on such post hoc tests with just three items A, B and C, you could find that A and B are the only pair that is significantly different (say A < B), which presumably would mean that C is between A and B, although not significantly different from either one of them.
Charles

PS: From Wikipedia,
Post-hoc tests [to Friedman's test] were proposed by Schaich and Hamerle (1984)[1] as well as Conover (1971, 1980)[2] in order to decide which groups are significantly different from each other, based upon the mean rank differences of the groups. These procedures are detailed in Bortz, Lienert and Boehnke (2000, pp. 275).[3]
Not all statistical packages support post-hoc analysis for Friedman's test, but user-contributed code exists that provides these facilities (for example in SPSS and in R).

References
1. Schaich, E. & Hamerle, A. (1984). Verteilungsfreie statistische Prüfverfahren. Berlin: Springer. ISBN 3-540-13776-9.
2. Conover, W. J. (1971, 1980). Practical nonparametric statistics. New York: Wiley. ISBN 0-471-16851-3.
3. Bortz, J., Lienert, G. & Boehnke, K. (2000). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer. ISBN 3-540-67590-6.

11. Ron_N says:

Thank you for the insightful articles you provide. I have a project to deliver and I would like you to suggest the best method/test for my study.
I have a 6 month data set with 13 students (student 1, student 2, etc.) and 11 tests (test 1 – test 11) and the average time taken by each student to complete each test over 6 months. The data is not normal. I want to find out, test by test, which students are underperforming or overperforming. I am providing a rough sample. Please suggest.
             test 1   test 2   test 3   test 4   ...   test 11
student 1    45.56    8.56     0.56     50.56
student 2    23.25    9.65     8.25     32.65
student 3    22.32    24.56    16.32    24.56
student 4    15.12    33.26    25.12    44.26
student 5    80.32    45.00    15.32    45.00
...
student 13

• Charles says:

Ron,
It really depends on how you define under- or over-performing. You can look at this as a problem of identifying outliers. The website has several webpages that address this issue. Please use Search to find these pages.
Charles

• Ron_N says:

Hi Charles,
Thank you for your quick reply. I am thinking of applying IMR control charts for subgroup 1 (each individual is considered as one subgroup). Then I would target those individuals who fall below or above the control limits. I am confused about one thing. Could you please guide me on whether I should use 1 sigma, 2 sigma or 3 sigma as my control limits?
Thank you for this wonderful site. I have gathered so much knowledge from here.
Regards,
Ron

• Charles says:

Ron,
Good to hear that you are getting value from the site.
There is no clear rule regarding 1 sigma, 2 sigma or 3 sigma. I generally use 2.5 sigma, except for very large samples where I may choose a larger value like 3.
Charles

• Ron_N says:

Thank you Charles for all your support. Now I can carry out my analysis.

12. Kevin says:

Hi Charles,
I’m glad that I could be of service for the next release regarding the SRH test! It is relatively unknown, but can be rather effective in the right environment, especially with non-normal data!
I forgot to mention this in my original post, but the SRH test only works if the ANOVA is balanced. So there needs to be an equal number of observations for each level of each factor…of course, this will not usually be a problem for repeated measures designs unless one or more of the subjects are unable to complete the testing. You may have figured this out already, but I thought it would be best to clarify this restriction.
Thanks again for running a great site!

13. SIM says:

Hi Charles,
Thank you for your tutorial; I am new to statistics.
I would like to know how I can determine or select values from the table based on my number of subjects, e.g. N=5, N=6. From the table of critical values for three conditions, how should we determine which p-values to choose?
Thanks!

• Charles says:

Sorry, but I would have to see the table that you are using since I am not providing such a table on my site.
Charles

14. Andy says:

Hi Charles,

This is one of the best explanations I have come across while wanting to know a bit more about the Friedman statistic… Thank you!

I have a couple of questions on the method and conclusion:

1. Can I use this method as a product tradeoff analysis like conjoint analysis and having no levels? As an example, let’s say I ask people to share preferences of disruptive technologies. I give them \$100 to distribute across the list. I then run Friedman’s test on the collected samples to determine whether the differences in choices were significant.

2. You mention – as p value > 0.05 we conclude there is no significant difference between the three types of wines.
>> What if p was < 0.05 and in this case does the Ranking order matter? How/When do I definitely say people like first Red then White and then Rose?

Andy.

• Charles says:

Andy,

I am pleased that you found my explanation of Friedman’s test useful. The following are some observations about your two questions.

1. I am not sure why you say “no levels”. Suppose that people have three disruptive technologies to choose from. They assign their \$100 among the three technologies. My initial thought is that you perform Friedman’s test (or repeated measures ANOVA if the assumptions are satisfied) to determine whether there is a significant difference between the three technologies. Offhand I don’t see any problem with this approach, but I frankly haven’t had enough time to think through this response in sufficient detail to make sure that this is so.

2. You can order the average preferences as you have indicated, but just because White has a higher ranking than Red in the sample doesn’t mean that this represents a significant difference (with respect to the population). You would need to perform a post-hoc test to determine this. I have provided post-hoc tests for ANOVA. I am now adding some for Kruskal-Wallis and soon I will look at follow-up testing for Friedman’s test.

Charles

15. Katerina says:

Dear Charles Zaiontz,

I am using Friedman test to compare three different conditions. I have 33 participants who took part in each condition.

My question is can I use Friedman test when the three groups are of unequal size? e.g. group 1 12 particip., groups 2 21 particip. etc.

Katerina

• Charles says:

Katerina,
Friedman’s test would be used when you have 33 participants who experienced all three conditions. In this case the sample sizes would all be the same, namely 33.
If you are instead looking at three independent groups, where the participants in each group experience one and only one condition, then you want to use fixed factor ANOVA, Kruskal-Wallis or some other similar test.
Charles

16. Lucas D. Mazza says:

To check the p-value use formula: CHISQ.DIST.RT (1.79,2)

• Charles says:

Lucas,
Yes, that is correct.
Charles

17. Noemi says:

Dear Mr. Zaiontz,

I have a very specific problem (surprisingly similar to the one posted in the comments above) and am a bit confused about which statistical test to use. I am sorry to bother you, but I do not understand what you mean by simple arithmetic.
I have investigated the effect of 5 different treatments and used a scale of 1-5 to evaluate the effect. I repeated the investigation 6 times. My data is therefore ordinal (ranked), non-parametric and not normally distributed.
However, I also calculated the means of each treatment over the 6 runs.
I’m not sure whether I have to compare the means of the 5 treatments, or not use the means at all and run a Friedman test for each treatment across the six runs. I would be very thankful for any information on how to proceed and what statistical test to use. (I was thinking about Mann-Whitney U or Friedman’s test?)
Thank you!

• Charles says:

Noemi,

I need some additional information before I can answer your question. First of all, I need to better understand the experiment. In particular,

1. Are all 5 treatments applied to each of the 6 people in the sample?
2. Are the 5 treatments applied to different people (for a total of 30 people in the study)?
3. Is the sample divided into 5 groups, one for each treatment, where each person in the sample gets one treatment over 6 different time intervals?

Before answering the question about which statistical test you should use, I need to understand what hypothesis you are trying to test.

Charles

18. Umer says:

Dear Mr. Zaiontz,
Your article was really helpful. I just want to confirm the use of Friedman’s test for the analysis of data collected through 5-point Likert-type questions. Basically, we have 5 factors and we want to find which of these is more effective. We have gathered data using 5-point Likert-type questions with ordinal ranking from 1 to 5 (ranging from Strongly Disagree to Strongly Agree). Can we get a sort of ranking of these five factors according to their effectiveness using this test?
Thank you!

• Charles says:

You don’t need to use Friedman’s test to get a ranking of the five factors. Simple arithmetic is sufficient. What Friedman’s test will tell you is whether any differences between the mean rankings are statistically significant (or are just due to random effects).
Charles

• Umer says:

Thank you very much Mr. Zaiontz!
So that means that I can find the difference (significant or insignificant) between 4 or 5 groups of data? Basically, it is for a research thesis and we cannot simply deduce something based on the ordinal 5-point scale. We intend to use a t-test for comparing pairs of data sets, but it would be a good thing if we could test all 5 of them together.

• Charles says:

Friedman’s Test tests all 4 (or 5) groups together to determine whether the groups statistically have the same mean (the null hypothesis). If the answer is yes, then you are done. If the answer is no (i.e. at least 2 of the groups have different means), then you would typically perform a t test (or Mann-Whitney test) to determine which two groups are the ones with different means (or some other similar test).
Charles

19. Jiahui says:

Dear Mr. Zaiontz,

I am writing to ask which data analysis method is appropriate for non-parametric 2 within-subject factors ANOVA.

My dependent variable is error rate (0-1), which does not follow a normal distribution. I transformed the DV data and conducted the repeated measures ANOVA. I am interested in the results based on non-parametric methods too, but I have not found a proper way yet.

Thanks!

Jiahui

• Charles says:

Jiahui,
Friedman’s test is a way of conducting a non-parametric repeated measures ANOVA.
Charles

• Jiahui says:

I know Friedman’s test is the non-parametric alternative to the one-way ANOVA with repeated measures. Can I use Friedman’s test for ANOVA with two within-subject factors? If I am interested in a non-parametric two-way repeated measures ANOVA, which method can I use?

• Charles says:

I don’t know of any nonparametric method for two-way repeated ANOVA.
Charles

• Kevin says:

Hi Charles,

There is actually a test for a nonparametric 2 way ANOVA with replication/repeated measures, called the Scheirer-Ray-Hare test. Essentially, you replace the original data with their respective overall ranking, regardless of the level of either factor that they occupy.

You then perform a standard 2 way ANOVA on the ranked data, but you take things a step further by dividing the total sum of squares by the total degrees of freedom to get a “total mean square.” You then divide the individual sums of squares for the two factors and the interaction term (not the mean sums as you did to determine the F ratios) by this “total mean square” to yield an H ratio for the two factors and the interaction.

Provided each group has at least 5 elements, these H ratios can be evaluated against the chi-square distribution with the relevant degrees of freedom from the two way ANOVA. If either CS value for the individual factors is significant, then the groups differ in that respective factor; likewise if the interaction H is significant, there is a significant interaction between the factors for the groups in question.
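The steps described in this comment can be sketched as follows. This is only an illustration of the arithmetic as stated here (overall ranks, two-way sums of squares on the ranks, division by the total mean square), assuming a balanced design; the function names are hypothetical and the tiny data set is far below the suggested minimum of 5 observations per group, so it serves only to check the calculations.

```python
def overall_ranks(values):
    """Rank all values together (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # average 1-based position
        i = j + 1
    return ranks


def mean(xs):
    return sum(xs) / len(xs)


def srh_h_ratios(cells):
    """cells[i][j] lists the observations for level i of factor A and level j of B.

    Assumes every cell holds the same number of observations.
    Returns {effect: (H, degrees of freedom)}.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    flat = [x for row in cells for cell in row for x in cell]
    r = overall_ranks(flat)
    N = a * b * n
    grand = (N + 1) / 2.0
    # Put the ranks back into the a x b x n layout.
    rk = [[r[(i * b + j) * n:(i * b + j + 1) * n] for j in range(b)] for i in range(a)]

    ss_total = sum((x - grand) ** 2 for x in r)
    ss_a = b * n * sum(
        (mean([x for j in range(b) for x in rk[i][j]]) - grand) ** 2 for i in range(a))
    ss_b = a * n * sum(
        (mean([x for i in range(a) for x in rk[i][j]]) - grand) ** 2 for j in range(b))
    ss_cells = n * sum(
        (mean(rk[i][j]) - grand) ** 2 for i in range(a) for j in range(b))
    ss_ab = ss_cells - ss_a - ss_b

    ms_total = ss_total / (N - 1)  # the "total mean square"
    return {
        "A": (ss_a / ms_total, a - 1),
        "B": (ss_b / ms_total, b - 1),
        "AxB": (ss_ab / ms_total, (a - 1) * (b - 1)),
    }


# Hypothetical 2 x 2 design with 2 observations per cell.
toy = [[[1, 2], [3, 4]],
       [[5, 6], [7, 8]]]
for effect, (h, df) in srh_h_ratios(toy).items():
    print(effect, h, df)
```

Each H would then be compared with a chi-square distribution on the listed degrees of freedom, as the comment describes.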

• Charles says:

Kevin,
Thanks for your very helpful comment. I will add the Scheirer-Ray-Hare test to the next release of the Real Statistics software.
Charles

20. Roger Bakeman says:

This was very helpful. Even more helpful were your comments for the Wilcoxon signed ranks test because they gave me the information I needed to calculate the 95% confidence interval for T and to calculate the effect size r. Increasingly, journal editors are asking for these. Could you provide information on how to compute a standard error for H (as provided for T) and the effect size r for the Friedman test?

• Charles says:

Roger,
I don’t know of any commonly accepted values for the standard error or effect size for Friedman’s test, although Kendall’s W is often cited as an effect size for Friedman’s H. Here W = H/(m(k-1)) where k = the number of groups (treatments) and m = the number of subjects. Also used as an effect size is the r coefficient for Kendall’s W, which is r = (mW-1)/(m-1). In fact it can be shown that r is the average Spearman correlation coefficient computed on the ranks of all pairs of raters.
Charles
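The two formulas in this reply are easy to evaluate directly. The sketch below plugs in the Example 1 values quoted on this page (H = Q = 1.79, m = 12 subjects, k = 3 wines); the function names are hypothetical.

```python
def kendalls_w(h, m, k):
    """Kendall's W from Friedman's statistic: W = H / (m (k - 1))."""
    return h / (m * (k - 1))


def average_spearman_r(w, m):
    """The r coefficient associated with Kendall's W: r = (m W - 1) / (m - 1)."""
    return (m * w - 1) / (m - 1)


w = kendalls_w(1.79, m=12, k=3)
r = average_spearman_r(w, m=12)
print(w, r)
```

For the wine data W comes out well below 0.1 and r is close to zero, which is consistent with the non-significant result of the test.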

• pal says:

You mentioned,
“effect size for Friedman’s H. Here W = H/(m(k-1)) where k = the number of groups (treatments) and m = the number of subjects.”
Do you know of any reference for this?

• Charles says: