The **Kuder and Richardson Formula 20** test checks the internal consistency of measurements with dichotomous choices. It is equivalent to performing the split half methodology on all combinations of questions and is applicable when each question is either right or wrong. A correct answer scores 1 and an incorrect answer scores 0. The test statistic is

$$\rho_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma^2}\right)$$

where

$k$ = number of questions

$p_j$ = proportion of people in the sample who answered question $j$ correctly

$q_j$ = proportion of people in the sample who didn’t answer question $j$ correctly, i.e. $q_j = 1 - p_j$

$\sigma^2$ = variance of the total scores of all the people taking the test = VARP(R1) where R1 = array containing the total scores of all the people taking the test.
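The calculation can also be sketched outside Excel. The following is a minimal Python sketch, not the Real Statistics implementation; it assumes the usual form of the statistic, k/(k−1) · (1 − Σ p_j q_j / σ²), with p_j taken as the proportion of people answering question j correctly and σ² as the population variance of the total scores (the equivalent of VARP). The function name `kr20` and the small data set are hypothetical.

```python
from statistics import pvariance


def kr20(scores):
    """KR20 for a list of rows, each row holding one person's 0/1 item scores."""
    n = len(scores)       # number of people
    k = len(scores[0])    # number of questions
    if k < 2:
        raise ValueError("KR20 needs at least two questions")

    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)  # population variance, like Excel's VARP
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR20 is undefined")

    # Sum of p*q over the questions, where p is the proportion correct
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)

    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical data: 4 people (rows) answering 3 questions (columns)
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(data), 3))  # 0.75
```

The explicit check for zero variance covers the case where everyone gets the same total score, in which the ratio cannot be computed.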

Values typically range from 0 to 1, although negative values can occur when reliability is very poor. A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.

**Example 1**: A questionnaire with 11 questions is administered to 12 students. The results are listed in the upper portion of Figure 1. Determine the reliability of the questionnaire using Kuder and Richardson Formula 20.

The values of *p* in row 18 are the proportion of students who answered each question correctly; e.g. the formula in cell B18 is =B16/COUNT(B4:B15). The values of *q* in row 19 are the proportion of students who answered each question incorrectly; e.g. the formula in cell B19 is =1-B18. The values of *pq* are simply the product of the *p* and *q* values, with the sum given in cell M20.

We can calculate $\rho_{KR20}$ as described in Figure 2.

**Figure 2 – Key formulas for worksheet in Figure 1**

The value $\rho_{KR20}$ = 0.738 shows that the test has high reliability.

**Real Statistics Function**: The Real Statistics Resource Pack contains the following supplemental function:

**KUDER**(R1) = KR20 coefficient for the data in range R1.

**Observation**: For Example 1, KUDER(B4:L15) = .738.

**Observation**: Where the questions in a test all have approximately the same difficulty (i.e. the mean score of each question is approximately equal to the mean score of all the questions), a simplified version of Kuder and Richardson Formula 20 is **Kuder and Richardson Formula 21**, defined as follows:

$$\rho_{KR21} = \frac{k}{k-1}\left(1 - \frac{\mu(k-\mu)}{k\sigma^2}\right)$$

where *μ* is the population mean score (obviously approximated by the observed mean score).

For Example 1, *μ* = 69/12 = 5.75, and so, with *k* = 11 and *σ²* = 6.52083,

$$\rho_{KR21} = \frac{11}{10}\left(1 - \frac{5.75\,(11-5.75)}{11 \times 6.52083}\right) \approx 0.637$$

Note that $\rho_{KR21}$ typically underestimates the reliability of a test compared to $\rho_{KR20}$.
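KR21 needs only three summary numbers, so it is easy to check by hand or in a few lines of code. The sketch below assumes the standard form k/(k−1) · (1 − μ(k−μ)/(kσ²)); the function name `kr21` is hypothetical, and the inputs are the Example 1 values (k = 11, μ = 5.75, and the variance 6.52083 quoted for Figure 2).

```python
def kr21(k, mean, variance):
    """KR21 from the number of questions, the mean total score and its variance."""
    if k < 2:
        raise ValueError("KR21 needs at least two questions")
    if variance <= 0:
        raise ValueError("variance must be positive")
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))


# Example 1: 11 questions, mean total score 69/12, population variance 6.52083
print(round(kr21(11, 69 / 12, 6.52083), 3))  # 0.637
```

The result is below the KR20 value of 0.738 for the same data, consistent with KR21 tending to underestimate reliability.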

Thank you for this. It will be a great help if you provide the Excel file in the example as well.

Hi Mike,

You can download an Excel file which contains all the examples in the website. Just go to page

http://www.real-statistics.com/free-download/real-statistics-examples-workbook/

and follow the instructions for downloading the Examples Workbook and linking it to the Real Statistics Resource Pack.

Charles

Hi admin!

if pj = number of people in the sample who answered question j “correctly”

Would this be applicable to test with likert scale as scoring too? Since there will be no correct answers if this would be the case.

Thank you 😀

Tycho,

As you correctly point out, with Likert scoring there is no correct answer. To calculate Cronbach’s alpha in this case, you use the actual Likert scores. Yesterday I added an example of how to do this in Example 4 on webpage http://www.real-statistics.com/reliability/cronbachs-alpha/. Take a look at the example and let me know if this addresses your question.

Charles

Hi!

k 7

Epq 0.888888889

var 2.888888889

P 0.807692308

Does this mean that we have high reliability? thank you!

Abbey,

Yes, a KR20 value of .80769 would normally be considered to indicate high reliability. (Note that this is the reliability coefficient, not a p-value.)

Charles
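For readers who want to check the arithmetic, the figures posted in the question above (k = 7, Σpq = 0.888889, variance = 2.888889) reproduce the reported coefficient, assuming the usual KR20 form:

$$\rho_{KR20} = \frac{7}{6}\left(1 - \frac{0.888889}{2.888889}\right) = \frac{7}{6}\,(1 - 0.307692) \approx 0.8077$$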

Under what conditions is the Kuder-Richardson formula applied, and what are its different forms?

Dear Jeeha,

This test is used when there are two possible outcomes for each answer: correct and incorrect. It can be applied, for example, to multiple choice tests even with, say, 4 choices. Here the wrong answers count as 0 and the right answer counts as 1. If you have non-dichotomous questions (e.g. with a Likert scale) you should use Cronbach’s alpha instead.

Charles

Hello, Thank you very much for providing this example.

Above you note that:

A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.

What do you mean by ‘homogeneous test’ is this a problem? If you are aware of a source explaining this issue, could you please provide a reference?

Thank you.

Tatiana,

By homogeneous test I am referring to tests which are likely testing the same thing. E.g. if I design a questionnaire with two questions whose sample data has a correlation of .98, it may be presumed that I don’t need to retain both questions since both are pretty much testing the same thing. In this case I might be justified in dropping one of the two questions from the questionnaire. The Cronbach’s Alpha page of the website (http://www.real-statistics.com/reliability/cronbachs-alpha/) also provides some information about this issue.

Charles

Hi there, could I ask about the specific difference in usage between KR20 and KR21?

I’m so confused about that. Can you help me? Please reply to my e-mail badrus_barokah21@yahoo.com or Twitter: @bad_lucky

Hi Badrus,

I have now updated the Kuder and Richardson Formula 20 webpage to also show how to calculate KR21 and when it can be used. Please look at that page for help in answering your question. Frankly KR21 is simply an approximation of KR20, and so it is of limited use.

Charles

Can you direct me on the use of LDC? Apparently it is used when the KR20 is low (i.e. below .80). I would like a reference to read, some understanding of the assumptions upon which it is based, and why it is used.

Thanks

Helen,

I just sent you an email response, but it bounced back to me with an error message.

Charles

Thank you so much for the information. That is a great help!

can i use KR20 for ‘fill in the blank questionnaire’, which could be given either right or wrong answer.

Zuraidah,

I believe so, as long as there is a clear criterion for right or wrong.

Charles

Good day, I just want to ask: at what value of KR20 would we start to interpret the result as showing low or no correlation? Is there a value at which we can say these items have low, moderate or high correlation?

Hello Robert,

I have usually seen that a value of at least .70 is desired for most exams. The following are the criteria used by Imran Zafar in http://com.ksau-hs.edu.sa/eng/images/DME_Fact_Sheets/fs_24.doc

Charles

| Reliability | Interpretation |
|---|---|
| .90 and above | Excellent reliability; at the level of the best standardized tests |
| .80 – .90 | Very good for a classroom test |
| .70 – .80 | Good for a classroom test; in the range of most. There are probably a few items which could be improved. |
| .60 – .70 | Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved. |
| .50 – .60 | Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading. |
| .50 or below | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision. |

Robert,

Another useful source is http://www.omet.pitt.edu/docs/OMET%20Test%20and%20Item%20Analysis.pdf

Charles

Hi, can I use KR20 to calculate reliability for a test which is somewhat like a Likert scale? If yes, how do I go about it?

Hi, if you have scores which are in Likert scale (or somewhat similar) I suggest that you explore using Cronbach’s alpha instead of KR20 (which is designed for correct vs incorrect only). You can get further info at http://www.real-statistics.com/reliability/cronbachs-alpha/.

Charles

Good evening Dr Charles, thank you for your great guidance on KR20. It helps.

But I would like to ask: there is another way of testing the reliability of exam scores, based on Pearson correlation. Which is the better test? Thank you.

Azlina,

I don’t completely understand your question, but Cronbach’s alpha is often used instead of KR20 to test reliability.

Charles

hi, thanks, the information was worthy.

What if I want to test the reliability of a self-designed questionnaire to test participant’s knowledge and attitude on some theme/subject like hepatitis, questionnaire has multiple choices however no choice is right.

Salima,

If no choice is right, how do you evaluate the answers given? Please provide more information.

Charles

Hello Charles,

How could I know if the coeficient KR20 is high or low? could you send me a reference from any author?

thanks

Hello Raphaela,

There doesn’t seem to be a common view on this, but I most frequently see a cutoff of .6 or .7. The following call for a cutoff of .60:

http://chitester.wordpress.com/instructor-guide/section-4-reporting-features/test-and-item-analysis/

http://www.umaryland.edu/cits/services/testscoring/umbtestscoring_testanditemanalysis.pdf

Charles

Dear Charles,

I have calculated the KR21 of my pretest and post-test in Excel. I found the results are 0.3 and 0.5. How do I know if my tests are reliable or not?

Mary,

These are not generally viewed to be very high figures, and so there is doubt about the reliability.

Although the values probably won’t be that different, I suggest that you use KR20 (or Cronbach’s alpha), which are more accurate.

Charles

Good evening. I just want to ask how you got the answer in the example shown above, where VAR is equal to 6.52083. Thank you so much.

Judy,

The answer is shown in Figure 2 of the referenced webpage. I used the VARP function.

Charles

dear admin,

I just want to ask how to calculate the variances. Please give me an example.

The formulas for the variances are shown in Figure 2 of the referenced webpage. If you download the Examples Workbook you can see all the formulas used for this example (and all the others shown on the website). You can download this file for free at http://www.real-statistics.com/free-download/real-statistics-examples-workbook/

Charles

Thanks Charles for this information. I do have a question; is it possible to calculate the KR20 using the split-half method? When I attempt, I’m always getting a number higher than 0-1 ..i.e… 1.9, etc.

Jarvis,

I don’t understand what it means to calculate KR20 using the split-half method or why you would want to do so. If you calculate KR20 as described on the referenced webpage you should get a value no higher than 1.

Charles

Dear Dr. Charles,

First of all I would like to express my gratitude for your precious efforts. This page is a gold mine for people like me.

I wish to use Kuder and Richardson Formula for a number of tests for language learning. Do you think I should use any other formulas or would KR-20 fit for my purposes.

Here are my tests:

Test A: Translation

1- DOG: …………….. (Write the meaning in your first language)

Test B: Multiple Choice

Bow Wow – What animal is this?

a) CAT b)DOG c)…… etc.

Test C: Fill in the blank

A …………. barks.

KR20 only supports dichotomous questions (e.g. 1 for correct answer and 0 for incorrect answer). This seems to be the case with the three questions you are posing, and so KR20 could be used. You can also use Cronbach’s Alpha.

Whether KR20 (or Cronbach’s Alpha) is a fit for your purposes depends of course on what you are trying to do (which is not stated in your comment).

Charles

Thank you Charles, I really appreciate your response. I aim to measure learners existing vocabulary knowledge on some technical engineering vocabulary the above examples have been given to show you my test formats.

We gave the exact same 100 question exam to students in consecutive classes (1000 students in both 2013 and 2014). The calculated KR21 values for each class were equal to within 3 digits (0.8763 vs 0.8765). I assume this tells me something about the student populations. Does does it mean?

Mark,

The results seem to show high internal consistency of the exam. If the 1,000 students are independent then the similar KR21 scores are not surprising.

Charles

Question should read “What does it mean?”

Charles, I was wondering if you could look at a screen-shot I have provided and help explain the results to me. I have many different content areas to work on, but in the following example, I have a 10 question dichotomous exam which was given to about 11,300 students. I am using all the results from these exams to find p, which, is quite low. What I do not understand, is that the higher my variance is, the better my p value, or, the lower my Σpq the higher my p.

KR-20 Example

Obviously, I have a bias, and am sure that our questions are reliable, but the numbers appear to say otherwise. The only other thing I can think of is that our exam is random. So while we have 10 questions per exam, Q1 may be different from the next student’s Q1, although the question is designed to measure the same knowledge of the given topic. Do I need to use this formula with a static exam, where Q1 is the exact same question for all students?

Your assistance is much appreciated, Thanks

Tony,

KR20 and Cronbach’s Alpha assume that Q1 is the same for all the students. If this is not the case, then I wouldn’t rely on the KR20 or Cronbach’s alpha measures.

Charles

Thank you Charles. Due to the fact, that we provide randomization w/ our questions, would you be able to provide a plausible approach for measuring the reliability/consistency of our examinations? When we run a t-test; our results indicate a high probability that our results correlate. Aside from Pearson’s R/Spearman (which can change based on how we collect/sort data) is there another method we can use to prove internal consistency?

Sorry Tony, but I don’t know of an approach in this case.

Charles

Thanks for your explanation on KR20. Please I calculated the reliability of my research instrument and I got KR20 = 0.65. Can I call this a high reliability? If no, what should be done to the instrument.

Thanks. I will appreciate your quick response please.

Lizzy,

There isn’t complete agreement as to what constitutes “high reliability”, but here are my observations about Cronbach’s alpha. These apply to KR20 as well (since the indices produced are identical):

A commonly-accepted rule of thumb is that an alpha of 0.7 (some say 0.6) indicates acceptable reliability, and 0.8 or higher indicates good reliability. Very high reliability (0.95 or higher) is not necessarily desirable, as this indicates that the items may be entirely redundant. These are only guidelines and the actual value of Cronbach’s alpha will depend on many things. E.g. as the number of items increases, Cronbach’s alpha tends to increase too even without any increase in internal consistency.

Charles

Please, I still don’t understand how the variance was calculated, i.e. B24.

This is already shown in Figure 2.

Charles

Please, Charles, can you break it down for me? If possible, use figures from the table.

How did you work VARP(M4:M15)?

Hello sir. Thank you for the above info. Very helpful indeed. Please help me understand how you came up with the variance, or σ², of 6.52083. I just can’t figure it out. Thank you sir!

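The formula to calculate 6.52083 is shown in Figure 2. I use the VARP function (population variance, dividing by n) for all the variances. VAR (sample variance, dividing by n − 1) gives a different variance value; the reliability coefficient is unchanged only if VAR is used consistently for both the item variances and the total-score variance, as in the Cronbach’s alpha formulation.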

Charles

What do I do when I only have a single question which is like this?

Justin,

Cronbach’s alpha can be used even when the questions take a value 1 (say for correct) and 0 (say for incorrect). This would cover True/False and multiple choice questions. The problem is that you can’t use Cronbach’s alpha with only one question. After all, Cronbach’s alpha measures internal consistency, but with only one question there is no internal consistency to measure. The same is true for Kuder and Richardson.

Charles

Hello Dear Charles,

Kindly give reference of this

There are lots of references for the KR20 test, including most statistics textbooks as well as many websites on the Internet. The original paper is

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151–160.

Charles

is KR-20 just for multiple choice question?

KR-20 is not just for multiple choice questions, but it is just for questions which take only two values (e.g. 1 = correct, 0 = incorrect). This includes True/False questions as well.

Charles

Hello Dear Charles,

I would like to ask whether we can calculate KR20 using SPSS. Kindly provide some reference.

Thanks

Asad,

Probably so, but I don’t use SPSS.

Charles

Hi,

I’ve tried your formula, but it turns out the KR-20 can be negative if I randomly generate students’ correctness. Can you explain why? According to Wiki, KR-20 ranges from 0 to 1. My data is as follows:

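| Student | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 3 |
| 2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 5 |
| 3 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 6 |
| 4 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 6 |
| 5 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 5 |
| 6 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 5 |
| 7 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 6 |
| 8 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3 |
| 9 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| 10 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 4 |
| 11 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 6 |
| 12 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 4 |
| Total | 7 | 4 | 6 | 8 | 3 | 5 | 5 | 5 | 4 | 3 | 7 | 57 |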

The simple answer is that KR-20 can indeed be negative. This is not so common in practice but it can happen.

Charles
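As a check on the data posted above, and assuming the usual KR20 form with population variance, the column totals give Σpq = 361/144 ≈ 2.5069 and the twelve total scores have variance 1.1875, so

$$\rho_{KR20} = \frac{11}{10}\left(1 - \frac{2.5069}{1.1875}\right) = 1.1 \times (1 - 2.1111) \approx -1.222$$

The coefficient turns negative whenever Σpq exceeds the variance of the total scores. Since that variance equals the sum of the item variances plus twice the sum of the inter-item covariances, this happens exactly when the covariances between questions are negative on balance, which is plausible for randomly generated answers.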

100 students

14 questions

1123 positive answers

sd² = 2.1587

KR-21 = -0.0316

How does one interpret a negative KR-21 result?

Thanks

Josef,

Negative values for KR-21, KR-20 and Cronbach’s Alpha can happen, although in practice they are not so common. These are generally signs that reliability is very poor.

Charles
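The posted figures can be checked against the standard KR21 form, taking the mean total score as 1123/100 = 11.23 and k = 14:

$$\rho_{KR21} = \frac{14}{13}\left(1 - \frac{11.23\,(14-11.23)}{14 \times 2.1587}\right) = \frac{14}{13}\left(1 - \frac{31.1071}{30.2218}\right) \approx -0.0315$$

This agrees with the reported −0.0316 up to rounding of the inputs. The value is negative because μ(k−μ)/k is slightly larger than the observed variance of the total scores.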

i got the result of my KR21 is 0.983

what does it mean?

do you have a list to show whether the score has high/medium/low reliability?

it would be very helpful..thank you, Charles.

Adrian,

The result indicates extremely high reliability (almost the maximum value), but this is not necessarily good since a value this high indicates homogeneity of questions (i.e. there isn’t much difference between the questions).

Charles

Hi Sir Charles,

1. What reliability test can be used for essay type of exam?

2. If the set of exam is mixed of different types of test (e.g. I. Multiple Choice, II. True or False, III. Fill in the blanks, IV. Enumeration, V. Essay), will it be ok to still use the KR-20? How will it be done, will it be per type of exam or will it be as total of all question? If not, what reliability test is more appropriate to use?

3. For enumeration type of exam, how will the KR-20 be done? (e.g. Q1 enumerate 5 things, Q2 enumerate 3 things, Q3 enumerate 4 things) Will k = 3 or k = 12?

Thanks in advance.

Grace,

1. If you are able to assign a score of 1 for correct and 0 for incorrect, you can use KR20 or any of the other commonly used reliability tests.

2. As long as all the questions are testing the same thing and can be coded 0 or 1 as described above, you can use KR20. If you have 20 questions that test one thing and 20 questions that test something else, then you should create two separate KR20 measurements.

3. It depends on how you code correct vs. incorrect. If for example, a person is given 5 words to recall and you define “correct” as they recall all five (or 4 out of 5), then this counts as one question. If they get one point for each word they correctly recall then this counts as 5 questions. In this latter case, you have the further problem that the order of the words will be relevant, plus you have the issue of how to score a response of a word that wasn’t even on the original list.

Charles

hi sir charles.

If I am going to solve it on paper, what formula should I use for KR20? There are so many formulas on the net and I don’t know which to use. I just need a simple example and the process of solving it using KR20 for my report tomorrow. Kindly give me another example, because your sample is easier to understand than other sources on the net. I hope that you can help me. Thanks a lot. God bless.

Hi, the example on the referenced webpage provides all the information you need to calculate KR20. You can also download the Excel spreadsheet for this example from Examples Workbooks so that you can see all the formulas better.

Charles

hello…

I have exercises that consist of multiple choice, true-false and short answer questions. To analyze the scores of this test, which is better: split-half or KR-20? I tried both. With the odd-even split-half test the result was unreliable; later I tried KR-20 and the results were reliable. How did this happen?

The test instrument is valid but unreliable. Why can this condition occur?

pls help me with this..thankyou

It is not too surprising in general that two tests would give different results. This is especially true with small samples.

If you send me an Excel file with your data I can try to understand why you are getting such different results in this case. You can find my email address at Contact Us.

Charles

Thank you for your response,

I have already sent my Excel file to your contact address. Please be so kind as to check it.

I have now looked at the spreadsheet that you sent me. The value I calculated for KR20 = .0677, which agrees with the value you calculated. I calculated a value of -.052 for split-half (odd-even). These values are both very low, indicating unreliability.

Charles
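For comparison with KR20, an odd-even split-half calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the calculation used in the reply above: it correlates each person's total on the odd-numbered questions with their total on the even-numbered questions and then applies the Spearman-Brown correction 2r/(1+r). Whether a quoted split-half figure is the raw correlation or the corrected value varies between sources. The function names and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_odd_even(scores):
    """Return (raw half-test correlation, Spearman-Brown corrected reliability)."""
    odd = [sum(row[0::2]) for row in scores]   # questions 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # questions 2, 4, 6, ...
    r = pearson(odd, even)
    return r, 2 * r / (1 + r)


# Hypothetical data: 5 people (rows) answering 4 questions (columns)
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
r, corrected = split_half_odd_even(data)
print(round(r, 3), round(corrected, 3))  # 0.327 0.493
```

A split-half figure depends on which questions land in each half, so it can differ noticeably from KR20 on the same data, especially with few questions or a small sample.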

Hello Charles,

You are doing a good job. Keep it up. God bless you.

Sir Charles,

Can I ask you about Kuder-Richardson formula 20?

What is the exact meaning of Kuder-Richardson formula 20? Thanks.

It is a measure of internal consistency of measurements with dichotomous choices. It produces the exact same result as Cronbach’s alpha when all the choices are dichotomous. See Cronbach’s alpha for more information.

Charles
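The equivalence with Cronbach’s alpha on 0/1 data can be checked numerically. The sketch below is an illustration with hypothetical function names and data. It relies on the fact that the population variance of a 0/1 item equals p·q, so alpha computed with population variances has the same ingredients as KR20. It also shows that alpha is unchanged if sample variances are used for both the items and the totals, since the n/(n−1) factor cancels in the ratio.

```python
from math import isclose
from statistics import pvariance, variance


def cronbach_alpha(scores, var_fn=pvariance):
    """Cronbach's alpha, using var_fn for both item and total-score variances."""
    k = len(scores[0])
    item_vars = [var_fn([row[j] for row in scores]) for j in range(k)]
    total_var = var_fn([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR20 computed from item proportions p and q = 1 - p."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical 0/1 data: 5 people (rows) answering 4 questions (columns)
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(isclose(kr20(data), cronbach_alpha(data, pvariance)))  # True
print(isclose(kr20(data), cronbach_alpha(data, variance)))   # True
```

Mixing the two variance types, for example p·q for the items but a sample variance for the totals, would break this agreement.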

thanks sir charles

Dear Charles,

Thanks a million for these excellent explanations. I conducted an online poll for my MBA Thesis. Now I tried to compute the Cronbach’s Alpha. As it turned out the variance in my case is 0 (cell B21 in your example) because it was not possible to give a wrong answer, e.g. all participants answered all questions.

As it is not possible to divide by 0, I get an invalid result for the Alpha. My suggestion is to state that the questionnaire provides a perfect reliability. What would be the best way to interpret this fact or did I misunderstand something?

Thanks in advance,

Marc

Marc,

If there aren’t any wrong answers, it is not clear to me why you even want to use Cronbach’s alpha. I guess in this case you could consider the questionnaire reliable.

Charles

Hi Charles,

Just wondering! Which of the methods, Cronbach’s alpha or KR20, is better to investigate the reliability of dichotomous questions?

Thanks in advance.

Luke

Luke,

They should give identical results.

Charles

Many thanks Charles.

Why does exam software (like Examsoft) report KR20 on multiple choice exams if the assumptions for the statistic are dichotomous answer choices? Most Examsoft items have at least 4 choices with only 1 correct answer. Is this a relevant statistic in this scenario?

The coding is not based on the 4 choices but on 1 if the answer is correct or 0 if the answer is incorrect (this is dichotomous).

Charles

Hi everyone,

I’m doing a master’s in TESOL and working on my thesis now. I have a problem that needs your advice.

The answers to the questionnaire are either wrong or right, with 68 participants. Should I use KR20 or Cronbach’s Alpha for the reliability test in the pilot study?

Thanks so much

Hi Kim,

Yes, you can use KR20 and Cronbach’s Alpha to measure reliability. If the coding you use is 0 for wrong and 1 for right, KR20 and Cronbach’s Alpha will yield the exact same answer.

Charles

I would like to know the conditions that call for the use of the Kuder-Richardson 21 and 22 formulas and also Cronbach’s Alpha.

Sorry, but I don’t understand what you mean by “the conditions that call for the use of”.

Charles

Simply put, what is the difference between the Kuder-Richardson 21 and 22 formulas?

Catherine,

I assume that you mean “what is the difference between Kuder Richardson 21 and 20 formulas?” The KR21 formula is a simplified version of the KR20 formula, which was useful in the days before computers. There is no reason that I can think of for using the KR21 formula. You should use KR20 instead.

Charles

Inter-Item Correlation Matrix

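|  | V2 | V4 | V5 | V6 | V9 | V10 | V12 |
|---|---|---|---|---|---|---|---|
| V2 | 1.000 | 1.000 | -.049 | .386 | .702 | -.021 | .702 |
| V4 | 1.000 | 1.000 | -.049 | .386 | .702 | -.021 | .702 |
| V5 | -.049 | -.049 | 1.000 | .214 | -.034 | .434 | -.034 |
| V6 | .386 | .386 | .214 | 1.000 | .569 | -.026 | .569 |
| V9 | .702 | .702 | -.034 | .569 | 1.000 | -.015 | 1.000 |
| V10 | -.021 | -.021 | .434 | -.026 | -.015 | 1.000 | -.015 |
| V12 | .702 | .702 | -.034 | .569 | 1.000 | -.015 | 1.000 |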

Cronbach’s Alpha is 0.723 now, and there are 7 items chosen in total (after running KR20 with 16 items the first time).

Items V5 and V10 still have negative values. However, I do want to use this as the final result. Can I? If not, could you explain why not?

(The reason is that I have to make sure the number of items in this test is equal to the number of items in another one; then I will do a correlation for language transfer between Vietnamese and English.)

Thanks so much !


Sorry Kim, but I don’t completely understand the data or your question.

Charles

Good morning Charles,

Thanks for your reply.

I did something wrong with my data. Sorry.

May I know how can I explain with KR 20 ?

For instance,

I will use Cronbach’s alpha based on standardized items to explain instead of Cronbach’s Alpha , right ?

The way I will explain when I use Cronbach’s alpha based on standardized items is similar to Cronbach’s Alpha or not? If not, how ?

Please instruct me . I really appreciate your help .

Thank you

Kim,

I don’t know whether you did something wrong or not with your data. I simply did not have time to try to interpret the data that you sent me.

Sorry, but I also don’t understand your questions. I receive so many questions from so many people, and don’t really have the time to answer unless the questions are clearly worded.

Charles