Effect Size for Chi-square Test

We review three different measures of effect size: Phi φ, Cramer’s V and the Odds Ratio.

Phi φ

For a chi-square test on a 2 × 2 contingency table, phi, which is equivalent to the correlation coefficient r (see Correlation), is a measure of effect size. Phi is defined by

$$\varphi = \sqrt{\frac{\chi^2}{n}}$$

where n = the number of observations. A value of .1 is considered a small effect, .3 a medium effect and .5 a large effect.

This is the effect size measure (labelled as w) that is used in power calculations even for contingency tables that are not 2 × 2 (see Power of Chi-square Tests).
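As a rough illustration of the formula above, the following Python sketch computes the chi-square statistic (without continuity correction) for a 2 × 2 table, then φ = √(χ²/n), and labels the result with the .1/.3/.5 benchmarks. The counts are an assumption: they are reconstructed from the therapy percentages and odds quoted in the Odds Ratio section (therapy 1: 11 uncured, 31 cured; therapy 2: 51 uncured, 57 cured). The helper names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and total n
    for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi(table):
    """Phi effect size for a 2 x 2 table: sqrt(chi-square / n)."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / n)


def cohen_label(value, small=0.1, medium=0.3, large=0.5):
    """Return the largest of Cohen's benchmarks that the value reaches."""
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "below small"


# Rows: therapy 1, therapy 2; columns: not cured, cured (reconstructed counts).
therapy = [[11, 31], [51, 57]]
effect = phi(therapy)
print(f"phi = {effect:.3f} ({cohen_label(effect)})")
```

For a 2 × 2 table [[a, b], [c, d]] the same value is |ad − bc| / √((a+b)(c+d)(a+c)(b+d)), the correlation between the two 0/1-coded variables. This is why phi and r coincide here.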

Cramer’s V

Cramer’s V is an extension of the above approach, and is calculated as

$$V = \sqrt{\frac{\chi^2}{n \cdot df^*}}$$

where df* = min(r – 1, c – 1) and r = number of rows and c = number of columns in the contingency table. Per Cohen, you use the guidelines for phi divided by the square root of df*. Thus, the guidelines are:

df* small medium large
1 .10 .30 .50
2 .07 .21 .35
3 .06 .17 .29
4 .05 .15 .25
5 .04 .13 .22

Figure 1 – Effect sizes for Cramer’s V
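You can regenerate the guideline table by dividing Cohen's phi benchmarks by √df*. V itself follows the formula above. The sketch below assumes an r × c table of counts given as a list of rows. The function names are illustrative. The example 2 × 3 table is the one assumed in the reply to Maryam further down.

```python
import math


def cramers_v(table):
    """Cramer's V = sqrt(chi-square / (n * df*)), with df* = min(r - 1, c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )
    df_star = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * df_star)), df_star


def v_guidelines(df_star):
    """Cohen's phi benchmarks (.1, .3, .5) divided by sqrt(df*)."""
    return tuple(t / math.sqrt(df_star) for t in (0.1, 0.3, 0.5))


# Reproduce the rows of Figure 1.
print("df*  small  medium  large")
for k in range(1, 6):
    small, medium, large = v_guidelines(k)
    print(f"{k:>3}  {small:.2f}   {medium:.2f}    {large:.2f}")

# Example 2 x 3 table: here df* = 1, so V equals phi = w.
v, df_star = cramers_v([[20, 20, 60], [30, 30, 40]])
print(f"V = {v:.3f} with df* = {df_star}")
```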

As we saw in Figure 4 of Independence Testing, Cramer’s V for Example 1 of Independence Testing is .21 (with df* = 2), which should be viewed as a medium effect.

Odds Ratio

For a 2 × 2 contingency table, we can also define the odds ratio measure of effect size as in the following example.

Example 1: Calculate the odds ratio for the data in Example 2 of Independence Testing.


Figure 2 – Odds ratio effect size

As we saw in Example 2 of Independence Testing, there is a significant difference between those taking therapy 1 and those taking therapy 2. In fact, 26.19% of the people who took therapy 1 were not cured, while 47.22% of those who took therapy 2 were not cured. This shows that those taking therapy 2 were 1.80 times as likely as those taking therapy 1 to remain uncured. This is a meaningful measure of effect size, called the risk ratio or relative risk.

A related measure of effect size is the odds ratio. The odds of a person who took therapy 1 remaining uncured are 11 to 31, or .3548. The odds of a person who took therapy 2 remaining uncured are 51 to 57, or .8947. Thus the odds of remaining uncured are 2.52 times greater for therapy 2 than for therapy 1. This ratio, 2.52, is the odds ratio.
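A few lines of Python can check the arithmetic in the two paragraphs above. The counts are read off the percentages and odds quoted there: 11 uncured and 31 cured for therapy 1, and 51 uncured and 57 cured for therapy 2. The function name is hypothetical.

```python
def risk_and_odds(uncured_1, cured_1, uncured_2, cured_2):
    """Relative risk and odds ratio of remaining uncured, therapy 2 vs therapy 1."""
    risk_1 = uncured_1 / (uncured_1 + cured_1)   # e.g. 11/42 = 26.19%
    risk_2 = uncured_2 / (uncured_2 + cured_2)   # e.g. 51/108 = 47.22%
    odds_1 = uncured_1 / cured_1                 # e.g. 11 to 31 = .3548
    odds_2 = uncured_2 / cured_2                 # e.g. 51 to 57 = .8947
    return risk_2 / risk_1, odds_2 / odds_1


relative_risk, odds_ratio = risk_and_odds(11, 31, 51, 57)
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

This prints relative risk = 1.80, odds ratio = 2.52, matching the values in the text.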

68 Responses to Effect Size for Chi-square Test

  1. David Reilly says:

    Hi Charles,

    Appreciate the post and your simplification of the content in Cohen.

    Something bothers me about the rules of thumb for Phi and Cramer’s V in that there is no new information in the guidelines for Cramer’s V. So why not just use Phi for any size table and the 0.1, 0.3, 0.5 interpretation?

    Am I missing something here?

    • David Reilly says:

      To clarify what I mean by “no new information”:

      Cramer’s V = Phi / sqrt(df*)

      Rule-of-thumb ROT Cramer’s V = ROT Phi / sqrt(df*)

      • Charles says:

        David,
        The new information is the fact that the rule of thumb for Cramer’s V handles more than just 2 x 2 contingency tables.
        Charles

    • Charles says:

      David,
      The rule for Cramer’s V is an extension of the rule for phi. In fact the rule for Cramer’s V is the same as for phi in the case of a 2 x 2 table.
      Charles

      • David Reilly says:

        Thanks!

      • David Reilly says:

        Thanks again for taking the time to answer my query. Sorry to come back to this, but now I realize what is bothering me.

        Isn’t Cohen defeating the purpose of Cramer’s V by adjusting his Rules-of-thumb for df*?

        By way of analogy, it strikes me that this is kind of like doing a Bonferroni adjustment to a p-value, and then doing the same adjustment to your alpha value, thus cancelling out the benefit of Bonferroni.

        • Charles says:

          David,
          Sorry, but I don’t see how your analogy applies in this situation.
          Charles

          • David Reilly says:

            Ok, please skip the analogy.

            What I am saying is that Cohen’s ROT “Small” = 0.1; “Medium” = 0.3; “Large” = 0.5 should be applied to Cramer’s V without modification.

  2. Olivera says:

    Hi Charles,

    I would much appreciate it if you could answer this question. I am doing a goodness-of-fit test in SPSS that involves only one nominal variable: I want to see whether two distributions are statistically different or not. The test shows p-value > 0.05, which means that the distributions are not statistically different. However, I need to calculate the power and effect size of the chi-square test. Can you explain to me how I can do this? I was thinking of calculating power in RStudio, but I need the effect size, and that is my problem: I don’t know how to get this value.
    Thank you in advance!

  3. CL says:

    Hi Charles,

    Could you explain why Cramer’s V = 0 means there is no association at all? I am confused. When all of the expected frequencies match the observed frequencies, X^2 equals zero and so does Cramer’s V, right? In this case, is it considered that there is “perfect association”?

    CL

  4. Julio says:

    I’m studying the number of male and female authors in a scientific field. The data are 35776 men and 17575 women. A chi-square test between the real data and the data expected under the null hypothesis (no gender difference: 26675.5 men; 26675.5 women) is, unsurprisingly, clearly significant (p < .000001).
    My question is: does it make sense to calculate an effect size when only one variable (gender) is at stake?
    Thanks in advance.

    • Charles says:

      Julio,
      Even when there is only one variable, it makes sense to look at the effect size. Especially when you have a very large sample, you might detect a significant effect, but this doesn’t mean that the effect is very big. The effect size measurement helps you determine whether or not the effect is small or large or somewhere in between.
      Charles
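As a sketch of Charles's point, you can apply the goodness-of-fit version of w = √(χ²/n) from the Phi section to Julio's counts. The expected proportions are assumed to be .5 and .5, as in his null hypothesis. The function name is illustrative.

```python
import math


def goodness_of_fit_w(observed, expected_props):
    """Pearson goodness-of-fit chi-square and w = sqrt(chi-square / n)."""
    n = sum(observed)
    chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_props))
    return chi2, math.sqrt(chi2 / n)


chi2, w = goodness_of_fit_w([35776, 17575], [0.5, 0.5])
print(f"chi-square = {chi2:.2f}, w = {w:.3f}")
```

The p-value is tiny, yet w ≈ 0.34 would be read as a medium effect by the .1/.3/.5 benchmarks.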

  5. Jonathan Bechtel says:

    Hi Charles,

    I apologize if this is a bit obvious, but what does the X2 refer to in Cramer’s V?

    Thank you

  6. Megan says:

    Hi Charles,
    I was wondering if you are able to comment on the following scenario. If you are using Fisher’s exact test for cases where cells have n < 5 (despite an overall large sample size, n = 1000), would it not make sense to obtain Cramer’s phi or V, since these values depend on the chi-square value (which may be an inaccurate estimate due to the small sample size)? Thank you!

    • Charles says:

      Megan,
      Yes, I would use the same measure for effect size.
      Charles

    • Megan says:

      Thanks, Charles, for your reply! So, to be clear: since Fisher’s exact test only provides a p-value, you would not provide the effect size (in Stata, it is possible to obtain Fisher’s exact test with the effect size, e.g., tab var1 var2, exact V, which makes things a bit confusing!).

  7. Hector says:

    Can a Cramer’s V coefficient be larger than 1?

  8. David says:

    I note that in the table for Cramer’s V you show lower values of V as constituting a medium or large effect when df = 3 than when df = 2 or 1. I recently did a chi-square test on 2 samples and 31 variables with 30 df. With X2 at 259.734 the p-value was extremely low, but V came in at 0.13, and I also calculated w as 0.13. Yesterday I thought these values rather low, but I now wonder whether, with 30 df, the effect might be considered larger than I first thought. Is this right? Does the rating of V and w rise with even more df than 3?

  9. Kelvin says:

    Hi Charles,

    What is the best (most simple, robust) test statistic to measure the correlation between multiple (>2) binomial variables?

    And how does the sampling distribution of this test statistic depend on the number of variables k, the frequency per variable f and the sample size n under the null hypothesis (no correlation)?

  10. Stacey says:

    Dear Charles,

    I am doing a meta-analysis among 10 groups using the chi-square test. The result is statistically significant (p < 0.05). However, the effect size is very small: Cramer’s V equals 0.024. I am aware that statistical significance may arise simply because of the large sample size. So I am wondering how to interpret this result correctly. Can I say that the two categorical variables have barely any relation to each other? Thanks a lot.

    Best,
    Stacey

    • Charles says:

      Stacey,
      It sounds like you understand the situation very well. I would simply report that there is a significant result (probably due to a large sample), although the effect is very small, i.e. the association between the variables is very small.
      Charles

      • Stacey says:

        Thank you Charles. That’s a fair report. Now I have a better idea of how to report this result. Thanks a lot!

  11. Dominic says:

    Please, I want to ask: are phi, Cramer’s V and the correlation coefficient the same?

  12. muhammed Bello Abdulkadir says:

    I want to prove that the chi-square statistic is greater than the G statistic by a constant (i.e. X2 = G + k). Could someone please help via Muhammedbello31@yahoo.com?
    Thanks

    • Charles says:

      Muhammed,
      I believe that the G test is generally more accurate, although usually the results are not very different.
      Charles

  13. Pingback: Chi-square (measure of association) | Psy Research Methods

  14. Alannah says:

    Hello Charles,

    I want to examine the relationship between two categorical variables (2×2 table). From my reading I know that the chi-squared test will measure whether there is an association, while the effect size phi will show the strength of this association. But I have also read that phi can be used as a correlation coefficient. What is the difference between phi as an effect size and phi as a correlation coefficient? They seem to be the same statistic, just reported differently. I’m tying myself into knots trying to decide which one to report: chi-square with phi as an effect size, or the phi correlation coefficient. It seems rather tedious to report (and read) multiple chi-square tests measuring one categorical variable’s relationship with multiple other categorical variables. Any suggestions would be appreciated. Thank you!

    • Charles says:

      Alannah,

      If you are conducting a chi-square test of independence, then just report phi as an effect size measurement.

      Regarding measuring “one categorical variable’s relationship with multiple other categorical variables”, I would need to see more details about the situation before commenting further. One technique that may be useful is log-linear regression, as described on the webpage Log-linear Regression.

      Charles

  15. Janaína says:

    Dear Charles,
    If I want to see whether two nominal variables are related, and how strongly, I understand that I have to use the chi-square test. But I am running into the problem of expected values under 5 in some cells (the tables are larger than 2×2). My question is: can I only read the value of Cramer’s V if I was able to calculate the chi-square properly?

    • Charles says:

      Dear Janaina,
      If the values are not too large, you should use Fisher’s Exact Test. You can use the odds ratio as an estimate of effect size.
      Charles

  16. DTD says:

    How do I calculate effect size for the McNemar test? Is Cramer’s V still appropriate?

    • Charles says:

      It is best to use the odds ratio.

      According to the following website, you can convert this to a Cohen’s d via the formula LN(OR)/1.81.

      Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19:3127–3131.
      http://www.aliquote.org/pub/odds_meta.pdf

      Charles
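Here is a minimal sketch of the conversion Charles quotes, d = LN(OR)/1.81. The constant 1.81 approximates π/√3, the standard deviation of the standard logistic distribution on which Chinn's method rests. For illustration only, the example applies it to the odds ratio of 2.52 from the Odds Ratio section. For McNemar's test you would substitute the odds ratio appropriate to your paired data. The function name is hypothetical.

```python
import math


def odds_ratio_to_cohens_d(odds_ratio):
    """Chinn (2000): d = ln(OR) / 1.81."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio) / 1.81


d = odds_ratio_to_cohens_d(2.52)
print(f"d = {d:.3f}")
print(f"pi/sqrt(3) = {math.pi / math.sqrt(3):.4f}")  # the constant behind 1.81
```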

  17. Luigi says:

    Dear Sir,

    thank you for this nice post! Very helpful.
    I have a question regarding applying Cramer’s phi to a chi-square goodness-of-fit test (I saw in a previous reply that you already suggested a solution, but I would like to be sure that it can also be applied to my case).
    I’m investigating whether the number of individuals possessing a specific allele at a given locus matches a theoretical distribution. I therefore have something like this:
    Allele  Observed  Expected probability
    A              1  0.05312500
    B              9  0.12500000
    C             17  0.40657437
    D              0  0.00208375
    E             19  0.28958125
    F              0  0.00208375
    G              4  0.01681000
    H              0  0.10265813
    I              0  0.00208375

    I have N = 60 and 9 alleles. I know that some of the expected probabilities are very low and the chi-square might be inaccurate, but I would just like to know whether Cramer’s phi could be applied.
    So…if I run a chi-square test for given probabilities on this data set I get:
    X2=178.0666
    p.value=<2.20E-16
    df=8

    Is Cramer’s phi calculated as the square root of (178.0666/60)? If so, I get a value of 1.722724, which is outside the range for phi. Is it because the table is too large? Or am I doing something wrong?
    If Cramer’s phi is not applicable in my case, I would appreciate it if you could suggest an equivalent effect size for these data. I even tried a Pearson correlation (correlating the vector of observed frequencies with the vector of expected frequencies) to get a measure of association, but I’m quite sure that this is not the right approach.
    Thanks in advance!

    • Charles says:

      Dear Luigi,

      Before we look at Cramer’s phi, let’s review the calculation of the chi-square statistic. This statistic is calculated using the observed and expected values — not the expected probabilities. Since there are 50 observations, N = 50 (and not 60 as you have written). Thus, to get the expected values you need to multiply each of the probabilities by 50. This will produce the following table:

      Allele  Obs  Exp
      A         1   2.65625
      B         9   6.25
      C        17  20.3287185
      D         0   0.1041875
      E        19  14.4790625
      F         0   0.1041875
      G         4   0.8405
      H         0   5.1329065
      I         0   0.1041875
      Total    50  50

      From this I calculate a chi-square value of about 21.52. One big problem, however, is that many of the expected values are less than 5, which violates an assumption for using the chi-square test.

      Charles
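To make the arithmetic in this exchange reproducible, the sketch below recomputes Pearson's goodness-of-fit statistic from the expected probabilities in Luigi's table. It runs once with the counts as first pasted (G = 4, N = 50) and once with the corrected count G = 14 (N = 60) from his follow-up. It also counts cells with expected values below 5 and reports w = √(χ²/N). The names are illustrative. As Charles notes, the chi-square approximation itself is doubtful here.

```python
import math

# Expected probabilities for alleles A..I, copied from Luigi's comment.
PROBS = [0.05312500, 0.12500000, 0.40657437, 0.00208375, 0.28958125,
         0.00208375, 0.01681000, 0.10265813, 0.00208375]


def gof_chi_square(observed, probs):
    """Return (chi-square, N, number of cells with expected count < 5)."""
    n = sum(observed)
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    sparse_cells = sum(1 for e in expected if e < 5)
    return chi2, n, sparse_cells


as_pasted = [1, 9, 17, 0, 19, 0, 4, 0, 0]    # G = 4, N = 50
corrected = [1, 9, 17, 0, 19, 0, 14, 0, 0]   # G = 14, N = 60
results = {}
for label, counts in (("as pasted", as_pasted), ("corrected", corrected)):
    chi2, n, sparse = gof_chi_square(counts, PROBS)
    results[label] = chi2
    print(f"{label}: N = {n}, chi-square = {chi2:.4f}, "
          f"w = {math.sqrt(chi2 / n):.3f}, expected < 5 in {sparse} of 9 cells")
```

With N = 60 the sketch reproduces Luigi's value of 178.0666, and w = √(178.0666/60) ≈ 1.72. Unlike phi for a 2 × 2 table, w in a goodness-of-fit test is not bounded by 1 when some expected probabilities are very small.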

      • Luigi says:

        Dear Charles,

        Thanks for your reply. I actually made a mistake in copying/pasting the data. In fact, allele “G” was not 4 but 14 (hence, N = 60). Besides this mistake, I agree that there are too many expected values less than 5 and, thus, I cannot apply a chi-square test. Do you know of any alternative that I could use in my specific case? I would prefer to retain the low expected values, because in my case they agree with the “0” counts that I have in the observed values and are indeed a sign of “good fit”.

        • Charles says:

          If you cannot combine categories so that the expected values are at least 5, then you could use an exact test. How this works for a 2 x 2 contingency table is explained on the webpage Fisher Exact Test.

          In your case, you don’t have a 2 x 2 contingency table, but the approach is similar, except that you probably need to test using the multinomial distribution (instead of the binomial distribution).

          Charles

          • Luigi says:

            Thanks a lot for your quick reply! I have already tried the exact multinomial test, but I couldn’t find a way to calculate an effect size similar to a Cramer V. Do you know if there is a way to calculate it?

          • Charles says:

            Sorry Luigi, I don’t know how to calculate an effect size in this case.
            Charles

  18. Serge says:

    Dear Sir,
    As a biologist, I am testing the influence of a genetic component on caste determination in a desert ant. In my study species, colonies contain small workers and large workers, and I am testing whether the fate of developing larvae (into a small or a large adult worker) has a genetic component. Colonies are headed by a single mother that is multiply mated. I have genotyped small and large workers from several colonies, determined their patrilines, and compared patriline representation between the two castes using G-tests for heterogeneity (e.g., each colony has two columns (small and large workers) and x rows, with x corresponding to the number of the queen’s mates, i.e. from 6 to 13). My results reveal a genetic effect in some colonies, but not in others.
    I would like to perform a power analysis of the G-tests. I have used G*Power (3.1.9) to compute effect sizes and power tests for each colony. But G*Power gives non credible results.
    Thus, I have two questions:
    1. Are Cohen’s w or Cramer’s V suitable for measuring effect size in G-tests for heterogeneity?
    2. When computing Cohen’s w, I obtain effect sizes > 8 and power = 1. This also seems odd.
    Do you have any suggestions to help me decipher this puzzle?
    Warm thanks in advance for your reply.
    Serge

  19. Hana says:

    Hello,
    another question from me 🙂 If we have a Cramer’s V of 0.2, would it be small or moderate?
    Thank you!

  20. Sue says:

    Dear Charles,
    Thank you for this very helpful website. Is it also possible to use Cramer’s V for one-dimensional (goodness-of-fit) chi-square tests, for example if I am testing in a 1 × 4 table whether the cases are evenly distributed across all four categories? If it is possible to calculate V, would I calculate it with df = 3, or with df = 1 (since df = 0 would not work)? If Cramer’s V is not the typical approach for this type of chi-square test, do you have any other suggestion?
    Thank you very much in advance for your input,
    Sue

    • Charles says:

      Sue,
      The effect size used in this case is phi (also called w) as described on the referenced webpage. w = sqrt(Chi-sq/n)
      Charles

      • Markus Stefka says:

        Dear Charles,
        I know this question was asked long ago, but my question fits this topic. At the beginning you mention φ = r. After that you say that w is used as φ in situations without 2×2 tables (e.g., as in my situation: one group, Yes/No).

        Is w = r then also true in this case? Or is this true only if I calculate a “normal” 2×2 φ?

        Greetings from Austria
        Markus

        • Charles says:

          Markus,
          First you have to tell me what definition you are using for the correlation coefficient r. In the 2×2 case this is clear, but it is not so clear in say the 3 x 3 case.
          Charles

  21. Maryam says:

    could you help me with this?
    I am going to calculate the power of a chi-squared test on a 2 × 3 contingency table.
    We expect the conditional distributions to be approximately (.2,.2,.6) and (.3 , .3 , .4).
    with 100 observations for each treatment and p(type I error)=0.05, what is the approximate power to compare the distributions?
    Thanks in advance,
    Mary

    • Charles says:

      Maryam,
      Assuming 100 observations for each treatment (so N = 200 in total) with observed counts (20,20,60) and (30,30,40), a chi-square test of independence gives chi-square = 8, so the effect size is w = sqrt(8/200) = .2 and df = (2-1)(3-1) = 2. Using the Real Statistics Power and Sample Size data analysis tool, the approximate power is 71.76% (provided I didn’t make a mistake along the way).
      Charles
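You can approximate the power figure without the Real Statistics tool. The sketch below assumes Cohen's convention: under the alternative, the test statistic is noncentral chi-square with noncentrality λ = N·w². For these counts λ equals the chi-square value 8, whether you write N = 200 with w = .2 or N = 100 with w = .2828. The sketch evaluates the noncentral distribution as a Poisson mixture of central chi-squares, using only the standard library. The helper names are hypothetical. If SciPy is available, scipy.stats.chi2.ppf combined with scipy.stats.ncx2.sf is a more robust choice.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a central chi-square, via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-16:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_critical(alpha, df):
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if chi2_sf(mid, df) > alpha else (lo, mid)
    return (lo + hi) / 2.0


def noncentral_chi2_sf(x, df, lam):
    """P(X > x) for noncentral chi-square: Poisson(lam/2) mixture of
    central chi-squares with df + 2j degrees of freedom."""
    half = lam / 2.0
    if half == 0:
        return chi2_sf(x, df)
    j_max = int(half + 20 * math.sqrt(half) + 20)
    return sum(math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
               * chi2_sf(x, df + 2 * j) for j in range(j_max + 1))


def chi_square_power(table, alpha=0.05):
    """Chi-square, w, df and approximate power for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    chi2 = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, row in enumerate(table) for j, obs in enumerate(row))
    w = math.sqrt(chi2 / n)
    df = (len(table) - 1) * (len(table[0]) - 1)
    lam = n * w ** 2  # noncentrality parameter N * w^2
    return chi2, w, df, noncentral_chi2_sf(chi2_critical(alpha, df), df, lam)


chi2, w, df, power = chi_square_power([[20, 20, 60], [30, 30, 40]])
print(f"chi-square = {chi2:.2f}, w = {w:.3f}, df = {df}, power = {power:.4f}")
```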

  22. Aron Fabian says:

    Cheers, this was super helpful, saved a bunch of us after our education system failed to tell us what all the letters actually meant!

  23. Joseph says:

    A few questions.

    On Cramer’s V….what about df = 4?

    And, what is the citation for Cohen there?

    • Charles says:

      I don’t have any information regarding interpretation of Cramer’s V for values of df* > 3. The citation is given on the referenced webpage, namely [Lo/GW]. You can find these listed on the Bibliography webpage.

      Charles

    • Charles says:

      Joseph,
      I now have a better answer for you. I have just updated the webpage with this information.
      Charles

  24. Kay says:

    This was so helpful thank you!
    I was just wondering if you might distinguish further between the Chi goodness of fit test and the Chi test for independence:
    Is it valid to calculate both Cramer’s V and the Odds Ratio for each of these tests?
    What would the Odds Ratio tell us if applied to a goodness of fit test?

    • Charles says:

      Kay,

      I have not used Cramer’s V with goodness of fit, only independence testing. I have seen the following statement on Wikipedia, but I can’t comment further on it.

      “Cramér’s V may also be applied to goodness of fit chi-squared models when there is a 1×k table (e.g.: r=1). In this case k is taken as the number of optional outcomes and it functions as a measure of tendency towards a single outcome.”

      Regarding the odds ratio, I don’t know how this could be done since you would need a 2 x 2 comparison.

      Charles
