Kuder and Richardson Formula 20

The Kuder and Richardson Formula 20 test checks the internal consistency of measurements with dichotomous choices. It is equivalent to performing the split-half methodology on all combinations of questions and is applicable when each question is either right or wrong. A correct answer scores 1 and an incorrect answer scores 0. The test statistic is

$$\rho_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma^2}\right)$$

where

k = number of questions

pj = proportion of people in the sample who answered question j correctly

qj = proportion of people in the sample who didn’t answer question j correctly, i.e. qj = 1 − pj

σ² = variance of the total scores of all the people taking the test = VARP(R1) where R1 = array containing the total scores of all the people taking the test.

Values typically range from 0 to 1. A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.
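As a minimal sketch of the statistic defined above, the following Python function evaluates KR20 from the three summary quantities k, Σpjqj and σ². The function name `kr20_from_summary` and the input values are illustrative assumptions; they are not part of the Real Statistics Resource Pack and are not taken from Figure 1.

```python
def kr20_from_summary(k, sum_pq, variance):
    """Return KR20 given the number of questions k, the sum of p_j * q_j
    over all questions, and the population variance of the total scores."""
    if k < 2:
        raise ValueError("KR20 needs at least two questions")
    if variance <= 0:
        raise ValueError("variance of total scores must be positive")
    return k / (k - 1) * (1 - sum_pq / variance)


# Illustrative inputs (assumed for this sketch, not the data of Example 1)
print(round(kr20_from_summary(k=7, sum_pq=8 / 9, variance=26 / 9), 4))  # 0.8077
```

Note that a smaller Σpjqj relative to σ² pushes the coefficient toward 1, which is why a test whose total scores vary widely tends to score higher.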

Example 1: A questionnaire with 11 questions is administered to 12 students. The results are listed in the upper portion of Figure 1. Determine the reliability of the questionnaire using Kuder and Richardson Formula 20.

Figure 1 – Kuder and Richardson Formula 20 for Example 1

The values of p in row 18 are the proportion of students who answered that question correctly – e.g. the formula in cell B18 is =B16/COUNT(B4:B15). The values of q in row 19 are the proportion of students who answered that question incorrectly – e.g. the formula in cell B19 is =1-B18. The values of pq are simply the product of the p and q values, with the sum given in cell M20.

We can calculate ρKR20 as described in Figure 2.

Figure 2 – Key formulas for worksheet in Figure 1

The value ρKR20 = 0.738 shows that the test has high reliability.

Real Statistics Function: The Real Statistics Resource Pack contains the following supplemental function:

KUDER(R1) = KR20 coefficient for the data in range R1.

Observation: For Example 1, KUDER(B4:L15) = .738.
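For readers working outside Excel, a rough Python analogue of this calculation might look as follows. It is a sketch only: `kr20_from_scores` is a hypothetical name, the five-person data set is invented for illustration (it is not the data in Figure 1), and `statistics.pvariance` is used to mirror VARP.

```python
from statistics import pvariance


def kr20_from_scores(scores):
    """Return KR20 for a list of rows, one row per respondent,
    where each entry is 1 (correct) or 0 (incorrect)."""
    n = len(scores)        # number of respondents
    k = len(scores[0])     # number of questions
    # p_j: proportion of respondents answering question j correctly
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # population variance of each respondent's total score (like VARP)
    totals = [sum(row) for row in scores]
    variance = pvariance(totals)
    return k / (k - 1) * (1 - sum_pq / variance)


# Invented data: 5 respondents, 4 questions
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20_from_scores(data), 3))  # 0.8
```

The sketch assumes a rectangular table with no missing answers and at least two distinct total scores; if every respondent has the same total, the variance is zero and the coefficient is undefined.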

Observation: Where the questions in a test all have approximately the same difficulty (i.e. the mean score of each question is approximately equal to the mean score of all the questions), then a simplified version of Kuder and Richardson Formula 20 is Kuder and Richardson Formula 21, defined as follows:

$$\rho_{KR21} = \frac{k}{k-1}\left(1 - \frac{\mu(k-\mu)}{k\sigma^2}\right)$$

where μ is the population mean score (obviously approximated by the observed mean score).

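For Example 1, μ = 69/12 = 5.75, and so, with k = 11 and σ² = 6.52083,

$$\rho_{KR21} = \frac{11}{10}\left(1 - \frac{5.75\,(11 - 5.75)}{11 \times 6.52083}\right) \approx 0.637$$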

Note that ρKR21 typically underestimates the reliability of a test compared to ρKR20.
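The KR21 shortcut needs only k, the mean and the variance of the total scores, so it is easy to sketch in Python. The function name `kr21` is hypothetical; k = 11 and μ = 69/12 come from Example 1, while σ² = 6.52083 is the variance value that a reader quotes from Figure 1 in the comments below and is assumed here to be correct.

```python
def kr21(k, mean, variance):
    """Return KR21 from the number of questions k, the mean total score
    and the population variance of the total scores."""
    if k < 2 or variance <= 0:
        raise ValueError("need k >= 2 and a positive variance")
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))


# Example 1: k = 11 questions, mean = 69/12; variance assumed to be 6.52083
print(round(kr21(k=11, mean=69 / 12, variance=6.52083), 3))  # 0.637
```

Under these assumptions the result is lower than the KR20 value of 0.738 reported for the same data, in line with the note above.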

44 Responses to Kuder and Richardson Formula 20

  1. Mike says:

    Thank you for this. It will be a great help if you provide the Excel file in the example as well.

  2. Tycho says:

    Hi admin!
    if pj = number of people in the sample who answered question j “correctly”
    Would this be applicable to test with likert scale as scoring too? Since there will be no correct answers if this would be the case.
    Thank you :D

  3. abbey says:

    Hi!

    k 7
    Epq 0.888888889
    var 2.888888889
    P 0.807692308

    Does this mean that we have high reliability? thank you!

  4. jeeha says:

    Under what conditions is the Kuder-Richardson formula applied, and what are its different forms?

    • Charles says:

      Dear Jeeha,
      This test is used when there are two choices for each answer: correct and incorrect. It can be applied, for example, to multiple choice tests even with, say, 4 choices. Here the wrong answers count as 0 and the right answer counts as 1. If you have non-dichotomous questions (e.g. with a Likert scale) you should use Cronbach’s alpha instead.
      Charles

  5. Tatiana says:

    Hello, Thank you very much for providing this example.
    Above you note that:
    A high value indicates reliability, while too high a value (in excess of .90) indicates a homogeneous test.

    What do you mean by ‘homogeneous test’? Is this a problem? If you are aware of a source explaining this issue, could you please provide a reference?
    Thank you.

    • Charles says:

      Tatiana,
      By homogeneous test I am referring to tests which are likely testing the same thing. E.g. if I design a questionnaire with two questions whose sample data has a correlation of .98, it may be presumed that I don’t need to retain both questions since both are pretty much testing the same thing. In this case I might be justified in dropping one of the two questions from the questionnaire. The Cronbach’s Alpha page of the website (http://www.real-statistics.com/reliability/cronbachs-alpha/) also provides some information about this issue.
      Charles

  6. badrus says:

    Hi there, could I ask about the specific difference in usage between KR20 and KR21?
    I'm so confused about that. Can you help me? Please reply to my e-mail badrus_barokah21@yahoo.com or Twitter: @bad_lucky

    • Charles says:

      Hi Badrus,
      I have now updated the Kuder and Richardson Formula 20 webpage to also show how to calculate KR21 and when it can be used. Please look at that page for help in answering your question. Frankly KR21 is simply an approximation of KR20, and so it is of limited use.
      Charles

  7. Helen says:

    Can you direct me on the use of LDC? Apparently it is used when the KR20 is low (i.e. below .80). I would like a reference to read, some understanding of the assumptions upon which it is based, and why it is used.
    Thanks

  8. Christine says:

    Thank you so much for the information. That is a great help!

  9. zuraidah nordin says:

    Can I use KR20 for a ‘fill in the blank’ questionnaire, where each answer is either right or wrong?

  10. Robert says:

    Good day, just wanna ask, at what value of KR20 would we start to interpret the items as having low or no correlation? Are there values at which we can say these items have low, moderate or high correlation?

    • Charles says:

      Hello Robert,

      I have usually seen that a value of at least .70 is desired for most exams. The following are the criteria used by Imran Zafar in http://com.ksau-hs.edu.sa/eng/images/DME_Fact_Sheets/fs_24.doc

      Charles

      Reliability: Interpretation
      .90 and above: Excellent reliability; at the level of the best standardized tests
      .80 – .90: Very good for a classroom test
      .70 – .80: Good for a classroom test; in the range of most. There are probably a few items which could be improved.
      .60 – .70: Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
      .50 – .60: Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
      .50 or below: Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.

  11. farida shullai says:

    Hi, can I use KR20 for a test which is somewhat like a Likert scale? If yes, how do I go about it?

  12. azlina says:

    Evening Dr. Charles, thank you for your great guidance on KR20. It helps.
    But I would like to ask: to test reliability, there is another test using Pearson correlation to test the reliability of exam scores. Which is the better test? Thank you

    • Charles says:

      Azlina,
      I don’t completely understand your question, but Cronbach’s alpha is often used instead of KR20 to test reliability.
      Charles

  13. Salima says:

    hi, thanks, the information was worthy.
    What if I want to test the reliability of a self-designed questionnaire that tests participants’ knowledge and attitude on some theme/subject such as hepatitis? The questionnaire has multiple choices, but no choice is right.

    • Charles says:

      Salima,
      If no choice is right, how do you evaluate the answers given? Please provide more information.
      Charles

  14. Raphaela says:

    Hello Charles,
    How could I know if the KR20 coefficient is high or low? Could you send me a reference from any author?
    thanks

  15. judy says:

    Good evening. Just wanna ask: how did you get the answer in the example shown above, where VAR is equal to 6.52083? Please. Thank you so much :) :)

  16. madan says:

    Dear admin,
    I just want to ask how to calculate the variance. Could you give me an example?

  17. Jarvis says:

    Thanks Charles for this information. I do have a question: is it possible to calculate the KR20 using the split-half method? When I attempt it, I always get a number outside the 0-1 range, e.g. 1.9.

    • Charles says:

      Jarvis,
      I don’t understand what it means to calculate KR20 using the split-half method or why you would want to do so. If you calculate KR20 as described on the referenced webpage you should get a value no higher than 1.
      Charles

  18. Umut says:

    Dear Dr. Charles,
    First of all I would like to express my gratitude for your precious efforts. This page is a gold mine for people like me.

    I wish to use the Kuder and Richardson Formula for a number of language learning tests. Do you think I should use any other formulas, or would KR-20 fit my purposes?

    Here are my tests:

    Test A: Translation
    1- DOG: …………….. (Write the meaning in your first language)

    Test B: Multiple Choice
    Bow Wow – What animal is this?
    a) CAT b)DOG c)…… etc.

    Test C: Fill in the blank
    A …………. barks.

    • Charles says:

      KR20 only supports dichotomous questions (e.g. 1 for correct answer and 0 for incorrect answer). This seems to be the case with the three questions you are posing, and so KR20 could be used. You can also use Cronbach’s Alpha.

      Whether KR20 (or Cronbach’s Alpha) is a fit for your purposes depends of course on what you are trying to do (which is not stated in your comment).

      Charles

      • Umut says:

        Thank you Charles, I really appreciate your response. I aim to measure learners’ existing knowledge of some technical engineering vocabulary; the above examples were given to show you my test formats.

  19. Mark Campbell says:

    We gave the exact same 100 question exam to students in consecutive classes (1000 students in both 2013 and 2014). The calculated KR21 values for each class were equal to within 3 digits (0.8763 vs 0.8765). I assume this tells me something about the student populations. Does does it mean?

    • Charles says:

      Mark,
      The results seem to show high internal consistency of the exam. If the 1,000 students are independent then the similar KR21 scores are not surprising.
      Charles

  20. Mark Campbell says:

    Question should read “What does it mean?”

  21. Tony says:

    Charles, I was wondering if you could look at a screen-shot I have provided and help explain the results to me. I have many different content areas to work on, but in the following example, I have a 10 question dichotomous exam which was given to about 11,300 students. I am using all the results from these exams to find p, which is quite low. What I do not understand is that the higher my variance is, the better my p value, or, the lower my Σpq, the higher my p.

    KR-20 Example

    Obviously, I have a bias and am sure that our questions are reliable, but the numbers appear to say otherwise. The only other thing I can think of is that our exam is random. So while we have 10 questions per exam, Q1 may be different from the next student’s Q1, although the question is designed to measure the same knowledge of the given topic. Do I need to use this formula with a static exam, where Q1 is the exact same question for all students?

    Your assistance is much appreciated, Thanks

    • Charles says:

      Tony,
      KR20 and Cronbach’s Alpha assume that Q1 is the same for all the students. If this is not the case, then I wouldn’t rely on the KR20 or Cronbach’s alpha measures.
      Charles

      • Tony says:

        Thank you Charles. Due to the fact, that we provide randomization w/ our questions, would you be able to provide a plausible approach for measuring the reliability/consistency of our examinations? When we run a t-test; our results indicate a high probability that our results correlate. Aside from Pearson’s R/Spearman (which can change based on how we collect/sort data) is there another method we can use to prove internal consistency?
