The **Kuder and Richardson Formula 20** test checks the internal consistency of measurements with dichotomous choices. It is equivalent to performing the split-half methodology on all combinations of questions and is applicable when each question is either right or wrong. A correct answer scores 1 and an incorrect answer scores 0. The test statistic is

$$\rho_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma^2}\right)$$

where

$k$ = number of questions

$p_j$ = proportion of people in the sample who answered question $j$ correctly

$q_j$ = proportion of people in the sample who didn’t answer question $j$ correctly, i.e. $q_j = 1 - p_j$

$\sigma^2$ = variance of the total scores of all the people taking the test = VARP(R1) where R1 = array containing the total scores of all the people taking the test.

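Values normally range from 0 to 1, although a negative value can occur when the sum of the $p_j q_j$ terms exceeds the variance of the total scores. A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.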
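As a rough illustration of the definitions above, the following Python sketch computes KR20 from a matrix of 0/1 item scores. It assumes the standard form of the statistic, k/(k−1) · (1 − Σp_j q_j / σ²), with σ² taken as the population variance of the total scores (the analogue of VARP). The function name `kr20` and the small data set are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def kr20(scores):
    """Return the KR20 coefficient for 0/1 item scores.

    `scores` is a list of rows, one row per person, one 0/1 entry per question.
    """
    n = len(scores)       # number of people
    k = len(scores[0])    # number of questions
    if k < 2:
        raise ValueError("KR20 needs at least two questions")

    # p_j = proportion answering question j correctly; q_j = 1 - p_j
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)

    # Population variance of each person's total score (the VARP analogue)
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR20 is undefined")

    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 people (rows), 3 questions (columns)
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(data), 3))  # 0.75
```

Here Σpq = 0.625 and the variance of the totals (3, 2, 1, 0) is 1.25, so the coefficient is 1.5 × (1 − 0.5) = 0.75.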

**Example 1**: A questionnaire with 11 questions is administered to 12 students. The results are listed in the upper portion of Figure 1. Determine the reliability of the questionnaire using Kuder and Richardson Formula 20.

The values of *p* in row 18 are the proportion of students who answered each question correctly; e.g. the formula in cell B18 is =B16/COUNT(B4:B15). The values of *q* in row 19 are the proportion of students who answered each question incorrectly; e.g. the formula in cell B19 is =1-B18. The values of *pq* are simply the product of the *p* and *q* values, with the sum given in cell M20.

We can calculate $\rho_{KR20}$ as described in Figure 2.

**Figure 2 – Key formulas for worksheet in Figure 1**

The value $\rho_{KR20} = 0.738$ shows that the test has high reliability.

**Real Statistics Function**: The Real Statistics Resource Pack contains the following supplemental function:

**KUDER**(R1) = KR20 coefficient for the data in range R1.

**Observation**: For Example 1, KUDER(B4:L15) = .738.

**Observation**: Where the questions in a test all have approximately the same difficulty (i.e. the mean score of each question is approximately equal to the mean score of all the questions), a simplified version of Kuder and Richardson Formula 20 is **Kuder and Richardson Formula 21**, defined as follows:

$$\rho_{KR21} = \frac{k}{k-1}\left(1 - \frac{\mu(k-\mu)}{k\sigma^2}\right)$$

where $\mu$ is the population mean of the total scores (approximated by the observed mean total score).

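For Example 1, $k = 11$ and $\mu = 69/12 = 5.75$, and so

$$\rho_{KR21} = \frac{11}{10}\left(1 - \frac{5.75\,(11 - 5.75)}{11\,\sigma^2}\right)$$

where $\sigma^2$ is the variance of the total scores in Figure 1.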

Note that $\rho_{KR21}$ typically underestimates the reliability of a test compared to $\rho_{KR20}$.
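A minimal Python sketch of KR21 follows, assuming the usual form k/(k−1) · (1 − μ(k−μ)/(kσ²)) with σ² as the population variance of the total scores. The function name `kr21` and the total scores used are hypothetical; only the totals and the number of questions are needed, not the item-level answers.

```python
from statistics import mean, pvariance


def kr21(totals, k):
    """Return the KR21 coefficient from total scores on a k-question test."""
    if k < 2:
        raise ValueError("KR21 needs at least two questions")
    mu = mean(totals)               # observed mean total score
    var_total = pvariance(totals)   # population variance (the VARP analogue)
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR21 is undefined")
    return (k / (k - 1)) * (1 - mu * (k - mu) / (k * var_total))


# Hypothetical total scores of 4 people on a 3-question test
totals = [3, 2, 1, 0]
print(round(kr21(totals, k=3), 3))  # 0.6
```

If these totals come from the answer patterns (1,1,1), (1,1,0), (1,0,0) and (0,0,0), then Σpq = 0.625 and KR20 works out to 0.75, so KR21 is the lower of the two for this made-up data.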

Thank you for this. It will be a great help if you provide the Excel file in the example as well.

Hi Mike,

You can download an Excel file which contains all the examples in the website. Just go to page

http://www.real-statistics.com/free-download/real-statistics-examples-workbook/

and follow the instructions for downloading the Examples Workbook and linking it to the Real Statistics Resource Pack.

Charles

Hi admin!

if pj = number of people in the sample who answered question j “correctly”

Would this be applicable to a test with Likert-scale scoring too? There would be no correct answers in that case.

Thank you

Tycho,

As you correctly point out, with Likert scoring there is no correct answer. To calculate Cronbach’s alpha in this case, you use the actual Likert scores. Yesterday I added an example of how to do this in Example 4 on webpage http://www.real-statistics.com/reliability/cronbachs-alpha/. Take a look at the example and let me know if this addresses your question.

Charles

Hi!

k 7

Epq 0.888888889

var 2.888888889

P 0.807692308

Does this mean that we have high reliability? thank you!

Abbey,

Yes, a KR20 value of .80769 would normally be considered to indicate high reliability.

Charles
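The figure quoted above can be checked by hand, assuming that Epq denotes $\sum p_j q_j$, var denotes the variance of the total scores, and P denotes the resulting KR20 coefficient (the labels are the commenter's, so this reading is an assumption):

$$\rho_{KR20} = \frac{7}{6}\left(1 - \frac{0.888889}{2.888889}\right) = \frac{7}{6}\,(1 - 0.307692) \approx 0.8077$$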

Under what conditions is the Kuder-Richardson formula applied, and what are its different forms?

Dear Jeeha,

This test is used when there are two possible outcomes for each answer: correct and incorrect. It can be applied, for example, to multiple-choice tests, even with say 4 choices. Here the wrong answers count as 0 and the right answer counts as 1. If you have non-dichotomous questions (e.g. with a Likert scale) you should use Cronbach’s alpha instead.

Charles
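To illustrate the relationship mentioned in this reply, here is a small Python sketch of Cronbach's alpha, assuming the usual form k/(k−1) · (1 − Σ item variances / variance of totals) with population variances throughout. For 0/1 items the population variance of item j equals p_j q_j, so alpha coincides with KR20. The function name `cronbach_alpha` and both data sets are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per person) of item scores."""
    k = len(scores[0])
    # Population variance of each item (column) and of each person's total
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    var_total = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / var_total)


# Hypothetical 0/1 data: item variances equal p*q, so alpha equals KR20
binary = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(binary), 3))  # 0.75

# Hypothetical Likert-style data (scores 1 to 5) is handled the same way
likert = [[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 1]]
print(round(cronbach_alpha(likert), 3))
```

The sketch does no input validation; a test in which every person has the same total score would give a zero denominator.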

Hello, Thank you very much for providing this example.

Above you note that:

A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.

What do you mean by ‘homogeneous test’? Is this a problem? If you are aware of a source explaining this issue, could you please provide a reference?

Thank you.

Tatiana,

By homogeneous test I am referring to tests which are likely testing the same thing. E.g. if I design a questionnaire with two questions whose sample data has a correlation of .98, it may be presumed that I don’t need to retain both questions since both are pretty much testing the same thing. In this case I might be justified in dropping one of the two questions from the questionnaire. The Cronbach’s Alpha page of the website (http://www.real-statistics.com/reliability/cronbachs-alpha/) also provides some information about this issue.

Charles

Hi there, could I ask about the specific difference in usage between KR20 and KR21?

I'm so confused about that. Can you help me? Please reply to my email badrus_barokah21@yahoo.com or Twitter: @bad_lucky

Hi Badrus,

I have now updated the Kuder and Richardson Formula 20 webpage to also show how to calculate KR21 and when it can be used. Please look at that page for help in answering your question. Frankly KR21 is simply an approximation of KR20, and so it is of limited use.

Charles

Can you direct me on the use of LDC? Apparently it is used when the KR20 is low (i.e. below .80). I would like a reference to read, some understanding of the assumptions upon which it is based, and why it is used.

Thanks

Helen,

I just sent you an email response, but it bounced back to me with an error message.

Charles

Thank you so much for the information. That is a great help!

Can I use KR20 for a ‘fill in the blank’ questionnaire, where each answer is marked either right or wrong?

Zuraidah,

I believe so, as long as there is a clear criterion for right or wrong.

Charles

Good day, I just want to ask: at what value of KR20 would we start to interpret the result as low or no correlation? Is there a value at which we can say these items have low, moderate or high correlation?

Hello Robert,

I have usually seen that a value of at least .70 is desired for most exams. The following are the criteria used by Imran Zafar in http://com.ksau-hs.edu.sa/eng/images/DME_Fact_Sheets/fs_24.doc

Charles

| Reliability | Interpretation |
|---|---|
| .90 and above | Excellent reliability; at the level of the best standardized tests |
| .80 – .90 | Very good for a classroom test |
| .70 – .80 | Good for a classroom test; in the range of most. There are probably a few items which could be improved. |
| .60 – .70 | Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved. |
| .50 – .60 | Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading. |
| .50 or below | Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision. |
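The criteria quoted above could be turned into a simple lookup, sketched here in Python. The function name `interpret_reliability` is hypothetical and the labels are shortened. Because the quoted ranges share their endpoints, the sketch assumes each band includes its lower bound, so a value of exactly .50 falls in the .50 – .60 band; that boundary rule is an assumption, not part of the quoted source.

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the band in the criteria quoted above."""
    # (lower bound, shortened label), checked from the highest band down
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low"),
        (0.50, "Suggests need for revision of test"),
    ]
    # Assumption: each band includes its lower bound and excludes its upper bound
    for lower, label in bands:
        if r >= lower:
            return label
    return "Questionable reliability"


print(interpret_reliability(0.738))  # Good for a classroom test
```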

Robert,

Another useful source is http://www.omet.pitt.edu/docs/OMET%20Test%20and%20Item%20Analysis.pdf

Charles

Hi, can I use KR20 for a test which is somewhat like a Likert scale? If yes, how do I go about it?

Hi, if you have scores which are on a Likert scale (or something similar), I suggest that you explore using Cronbach’s alpha instead of KR20 (which is designed for correct vs. incorrect only). You can get further info at http://www.real-statistics.com/reliability/cronbachs-alpha/.

Charles

Evening Dr Charles, thank you for your great guidance on KR20. It helps.

But I would like to ask: to test reliability, there is another test based on Pearson correlation for the reliability of exam scores. Which is the better test? Thank you

Azlina,

I don’t completely understand your question, but Cronbach’s alpha is often used instead of KR20 to test reliability.

Charles

hi, thanks, the information was worthy.

What if I want to test the reliability of a self-designed questionnaire that tests participants’ knowledge and attitude on some theme/subject like hepatitis? The questionnaire has multiple choices; however, no choice is right.

Salima,

If no choice is right, how do you evaluate the answers given? Please provide more information.

Charles