**McNemar’s Test** is a matched pair test used to determine whether there is a significant change in nominal data before and after an event. We begin with an example.

**Example 1**: In the BBC program *The Doha Debates* 100 people were surveyed regarding their opinion about capital punishment. 30 were in favor and 70 against. They then listened to a debate about the subject and the survey was repeated. This time 35 voted in favor and 65 against. 11 changed their mind from against to in favor and 6 changed their mind from in favor to against. Did the debate affect people’s opinion?

Let *A* = number of people who switched from in favor to against = 6

Let *B* = number of people who switched from against to in favor = 11

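If the null hypothesis that the debate had no effect were true, then we would expect *A = B*. Under the null hypothesis, the test statistic (*A* – *B*)²/(*A* + *B*) has approximately a chi-square distribution with 1 degree of freedom. To allow for Yates’ continuity correction, we simply use

$$\chi^2 = \frac{(|A - B| - 0.5)^2}{A + B} = \frac{(|6 - 11| - 0.5)^2}{17} = 1.19$$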

The critical value of the chi-square distribution is CHIINV(.05, 1) = 3.84. Since 1.19 < 3.84, we can’t reject the null hypothesis, and so we can’t conclude that the debate affected people’s opinion.
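The comparison above can be sketched in Python using only the standard library. The function names are hypothetical, and the 0.5 continuity correction is an assumption chosen because it reproduces the 1.19 quoted in the text; many references subtract 1 instead, so the `correction` parameter is left adjustable.

```python
import math

def mcnemar_chi2(a, b, correction=0.5):
    """Continuity-corrected McNemar statistic for the two discordant counts.

    correction=0.5 is an assumption that matches the 1.19 in the text;
    pass correction=1 for the other commonly used form.
    """
    return (abs(a - b) - correction) ** 2 / (a + b)

def chi2_right_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

A, B = 6, 11                 # Example 1: favor->against, against->favor
CRITICAL_05 = 3.84           # CHIINV(.05, 1), taken from the text

stat = mcnemar_chi2(A, B)
p_value = chi2_right_tail_1df(stat)

print(round(stat, 2))        # 1.19
print(stat < CRITICAL_05)    # True, so the null hypothesis is not rejected
```

Comparing `p_value` with .05 gives the same decision as comparing `stat` with the critical value.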

**Observation**: If *A + B* < 25, as in Example 1, then McNemar’s test shouldn’t be used. Instead a one-tailed binomial test of the smaller of *A* and *B* should be performed with *p* = .5 and *n* = *A + B*. In the case of Example 1, the probability of getting 6 or fewer successes out of 17 is given by

BINOMDIST(6, 17, .5, TRUE) = 0.166 > .05 = *α*

Thus we cannot reject the null hypothesis, and so conclude the debate did not significantly affect the outcome. The binomial test is equivalent to the sign test.
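As a rough cross-check of the binomial calculation, the cumulative probability can be computed directly. This is a minimal sketch with a hypothetical helper name (`binom_cdf`); it is not the Excel implementation.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

A, B = 6, 11                                 # Example 1 discordant counts
p_one_tailed = binom_cdf(min(A, B), A + B)   # 6 or fewer successes out of 17

print(round(p_one_tailed, 3))                # 0.166
print(p_one_tailed > 0.05)                   # True, so we cannot reject the null
```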

**Example 2**: In another installment of *The Doha Debates,* 1000 people were surveyed and 705 were in favor and 295 against. After they listened to the debate 663 voted in favor and 337 against. 73 changed their mind from against to in favor and 115 changed their mind from in favor to against. Did the debate affect people’s opinion?

In Figure 1 we rerun the tests with *A* = 115 and *B* = 73, and see that this time the debate had a significant impact on the outcome.

**Figure 1 – McNemar’s Test**
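The rerun described for Example 2 can be approximated as follows. This is a sketch rather than a reproduction of Figure 1: the helper names are hypothetical, and the 0.5 continuity correction is the same assumption that reproduces the Example 1 value.

```python
import math

def mcnemar_chi2(a, b, correction=0.5):
    """Continuity-corrected McNemar statistic (correction value is an assumption)."""
    return (abs(a - b) - correction) ** 2 / (a + b)

def chi2_right_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

A, B = 115, 73               # Example 2: favor->against, against->favor
stat = mcnemar_chi2(A, B)
p_value = chi2_right_tail_1df(stat)

print(round(stat, 2))        # 9.16
print(stat > 3.84)           # True: exceeds the .05 critical value
print(p_value < 0.05)        # True: significant at the 5% level
```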

**Observation**: McNemar’s Test can be used with paired samples where the dependent variable is dichotomous. Where there are more than two samples (groups) Cochran’s Q test can be used. See Cochran’s Q Test for more information as well as for Real Statistics functions and a data analysis tool that can also be used to support McNemar’s Test.

Hi Dr. Zaiontz,

Imagine you are running a cancer clinic. Some people you see will have a symptom (say a lump). When you evaluate some symptoms you find a cancer. Most people you see do not have a symptom and they undergo screening to determine if they have a cancer. Screening can be a mammogram, an ultrasound, or both depending on individual circumstances.

Each cancer has a stage at diagnosis. Stages range from 0 to 4. With higher numbered stage, prognosis is worse, but the stages are not linear. In other words, although a stage 4 cancer is worse than a stage 2 cancer, it is not twice as bad as a stage 2 cancer.

Screening is performed to decrease stage at diagnosis. Most people who you see with a symptom that indicates a cancer are at least stage 2. Screen detected cancers are usually stage 1. Stage 1 is much better. Stage 2 and above are advanced cancers. With screening you usually find cancers in somewhere between 0.4%-1.5% of people screened.

You see about 6,000 people in your clinic. You diagnose about 150 cancers. You have recorded what imaging people had done that resulted in the diagnosis. For each diagnostic study you know if the cancer was seen.

You calculate the proportion of cancers diagnosed that were advanced (stage>1) for each method of diagnosis, symptomatic, mammogram, ultrasound, or mammogram+ultrasound.

You find that the proportion of advanced cancers when compared to the proportion of symptomatic cancers is as follows: mammogram > mammogram+ultrasound. Ultrasound alone is roughly the same as mammogram. This implies that the combination of mammogram+ultrasound is a better screening strategy.

How would you test this hypothesis? These are paired observations, so I was thinking of using the McNemar test and running multiple comparisons, i.e. symptomatic vs. mammogram, symptomatic vs. ultrasound, etc., but the multiple comparisons are concerning. I would appreciate any thoughts you might have…. Thank you!

Jason,

I understand that you are trying to test “whether the combination of mammogram+ultrasound is a better screening strategy than ultrasound alone.” Is this correct?

If so, first you need to define what it means to be “a better screening strategy”. Does this mean significantly fewer deaths? or lives longer? or something else? What if you haven’t run the experiment long enough to know which category a person fits into?

Why do you say these are paired observations? A person can’t be both in the category of mammogram+ultrasound and the category of ultrasound only.

Charles

Hi Dr. Zaiontz,

Thanks for your reply. Currently, we see people who come in with a symptom that is due to a cancer. Everybody without a symptom gets a mammogram for screening. Many, but not all, asymptomatic people also get an ultrasound if they have “increased breast density,” a mammographic finding known to decrease the sensitivity of mammography.

Therefore, some people have a cancer detected because they have a symptom. Of the people with cancer who don’t have a symptom, some are detected on mammogram, some on ultrasound alone, and some on both studies. We know how each cancer we have found was detected. We know what stage at diagnosis each cancer was.

So based on this data, we can hypothesize three strategies: wait for the symptomatic cancer (no screening), screen asymptomatic people with a mammogram, and screen asymptomatic people with a mammogram+ultrasound.

Of people with cancer who have a symptom, x percent would have an advanced cancer. If we were seeing people with symptomatic cancers, but also screening asymptomatic people with mammography alone, y percent would have advanced cancers. And if we were seeing people with symptomatic cancers, but also screening asymptomatic people with both mammography and ultrasound, z percent would have advanced cancers. We hypothesize that x>y>z. The best screening strategy is the strategy that has the lowest percentage of advanced cancers. The question is, how would we test this hypothesis?

Thank you, we appreciate your help… Jason

Jason,

Let me make sure that I understand what x, y and z are. Are you saying that the data shows that (1) x% of patients with a symptom have advanced cancer, (2) y% of patients with a symptom or who are screened with mammogram only have advanced cancer and (3) z% of patients with a symptom or who are screened with both mammogram and ultrasound have advanced cancer? In (2) I assume that you are including patients who are screened even if the mammogram is negative, is that correct? (and similarly for (3)). What about patients who are screened and show signs of cancer but not advanced cancer?

I am not sure why you are including patients with a symptom in all three categories. Why not keep things simple and compare (1) x% of patients with a symptom have advanced cancer, (2) y% of asymptomatic patients who are screened with a mammogram only have advanced cancer and (3) z% of asymptomatic patients who are screened with both mammogram and ultrasound have advanced cancer?

Hi Dr. Zaiontz,

We thought about just comparing the screening patients and excluding the symptomatic patients, but we decided to include them because we want to perform an “intent to treat” analysis. No method of screening is perfect and there will always be people who come in with symptomatic cancers. Some because they are missed on screening and some who just don’t get screened. We want to be conservative and not overstate the benefits of screening.

We envisioned the denominator for x, y, and z as being the total number of cancers diagnosed in each category, not the number of people screened. For example, y represents the percentage of advanced cancers diagnosed out of all the cancers you would diagnose if you were seeing symptomatic cancers and doing screening with mammogram alone on asymptomatic people.

We appreciate your help with this… Jason

Jason,

The problem with this approach, from the point of view of the statistical analysis, is that the set of patients in the three groups that yield the x, y and z percentages overlap. If I call these three groups X, Y and Z, then if I remember correctly X is a subset of Y and Y is a subset of Z. Thus we don’t have independence when we compare these groups. If instead I don’t include the members in X in Y and I don’t include the members in X or Y in Z, then I can use all the usual statistical analyses. If these groups have significantly different means, then although I can’t automatically conclude that the groups with overlap have significantly different means, probably I can make such assertions in many real-world situations.

These are some of my initial thoughts (off the top of my head).

Charles

Sir, I am not able to use McNemar’s test from your software. Please guide.

Puneet,

The test is called the Cochran’s Q Test in the software and can be found as one of the options of the Non-parametric Test data analysis tool.

Charles

This is very close to what I need; however, my null-hypothesis is that the results are different. In your examples this would correspond to opinions changing; therefore, rejecting the null hypothesis would amount to statistically concluding that the opinions didn’t change (contrary to expectation).

What test should be used for this opposite null hypothesis with binary ‘yes’ or ‘no’ type data?

Dane,

You should be able to use the usual test, but a significant result would support the alternative hypothesis. You should also look at the power of the test.

Charles

Dear Charles,

I also am struggling to decide what the best statistical test would be for my data. I have a set of students who listened to my lecture. I gave a pre-test and a post-test with multiple-choice and true-false questions. I asked the same questions before and after the lecture. I recorded correct answers with 0 and incorrect answers with 1. I want to see whether my lecture had an impact on their understanding. Each student answered the pre- and the post-test questions.

Diana,

One approach is to assign a score to each questionnaire (based on all the questions asked). Then perform a paired t test.

Charles
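The paired t test suggested above could be sketched as follows. The scores are invented purely for illustration (they are not Diana's data), and the variable names are hypothetical; each score is assumed to be the number of questions a student answered correctly.

```python
import math
import statistics

# Hypothetical per-student scores (number of correct answers); not real data.
pre_scores = [4, 5, 3, 6, 5]
post_scores = [6, 7, 5, 7, 6]

# A paired t test works on the per-student differences.
diffs = [post - pre for pre, post in zip(pre_scores, post_scores)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference

t_stat = mean_diff / se_diff
df = n - 1

print(round(t_stat, 2), df)   # 6.53 4
```

The resulting `t_stat` would then be compared with the t distribution with `df` degrees of freedom, for example via Excel's T.DIST.2T.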

Thank you so much!

Dr., please excuse my error; Excel apparently works the opposite way:

CHISQ.INV(0.95, 1) = 3.841458821

Hello Dr., I wish you a Merry Christmas and much prosperity in the times ahead. I am very grateful for the knowledge you share; your page is like bringing the R software to Excel, and it puts things in simple and understandable terms. I think there is a mistake in Example 1 of McNemar’s test, because CHIINV(.05, 1) = 0.0039, which would make the change of opinion significant. Please correct me if I am wrong.

Thank you.
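The two values in this exchange correspond to opposite tails: Excel's CHIINV takes a right-tail probability, while CHISQ.INV takes a left-tail probability. A small sketch (with a hypothetical helper name) shows which tail area each value cuts off for 1 degree of freedom:

```python
import math

def chi2_right_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

# 3.841458821 leaves 5% in the right tail, i.e. CHIINV(.05, 1) = CHISQ.INV(.95, 1).
print(round(chi2_right_tail_1df(3.841458821), 3))   # 0.05

# 0.0039 leaves about 95% in the right tail (5% in the left), i.e. CHISQ.INV(.05, 1).
print(round(chi2_right_tail_1df(0.0039), 3))        # 0.95
```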

Dear Charles,

I am struggling to decide what the best statistical test would be for my data. I have a set of patients who have all had two imaging tests – computed tomography (CT) and MRI. The data from CT is ordinal as I have completed a radiological score (1 to 4). The data from MRI is categorical – present vs. not present. I would like to test the null hypothesis there is no relationship between CT scores and MRI presence/non-presence.

I have concluded that data must be paired as it is from the same patient and both CT and MRI were completed at the same time. I think a McNemar’s should be suitable but with four possible CT scores would Cochran’s Q test be better?

I am really grateful for your help.

Cochran’s Q test requires that all the data be dichotomous. In your example this is not the case since you use scores of 1, 2, 3, 4.

It seems to me that the appropriate way to look at the problem is as a two sample t test. Sample 1 consists of all the CT values where the corresponding MRI value is present and sample 2 consists of all the CT values where the corresponding MRI value is not present. Since it is likely that the normality assumption for the t test is not met, you probably need to use the Mann-Whitney test instead.

Charles

Hi Charles,

I’m now starting to learn non-parametric tests from your website :)

One thing that confuses me is “the test statistic A – B has chi-square distribution with 1 degree of freedom.” Why “one degree of freedom”?

I would appreciate it if you could give some hints :)

Wen

Wen,

Although I don’t have a precise answer for you, I have two hints:

1. It is probably related to the fact that if x has a standard normal distribution then x^2 has a chi-square distribution with 1 degree of freedom and for large enough samples the binomial distribution can be approximated by the normal distribution.

2. McNemar’s test is a simple form of Cochran’s test

Charles
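The first hint can be checked numerically. Under the null hypothesis *A* is Binomial(*n*, .5) with *n* = *A* + *B*, so standardizing *A* gives *z* = (*A* – *B*)/√(*A* + *B*), and *z*² is the uncorrected McNemar statistic. The sketch below uses the Example 1 counts; the helper `normal_cdf` is a hypothetical name.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

A, B = 6, 11
n = A + B

# Standardize A using the Binomial(n, .5) mean n/2 and variance n/4.
z = (A - n / 2) / math.sqrt(n / 4)
chi2 = (A - B) ** 2 / n              # McNemar statistic without continuity correction

# Squaring one standard normal variable gives one degree of freedom.
two_sided_normal = 2 * (1 - normal_cdf(abs(z)))
chi2_tail_1df = math.erfc(math.sqrt(chi2 / 2))

print(math.isclose(z ** 2, chi2))                      # True
print(math.isclose(two_sided_normal, chi2_tail_1df))   # True
```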

I want to determine if there is a relationship between “having had cholesterol measured (yes/no)” and “whether the person is a registered pharmacist (yes/no)”. Is this paired or unpaired data? Would McNemar’s test be appropriate?

Thank you.

If I understand the question correctly, it seems like a 2 x 2 chi-square test of independence.

Perhaps it would help if you explain what the data looks like.

Charles

Could you clarify this for me please? You say that if A + B < 25, then you should use the binomial distribution to calculate the p-value. However, in Example 2, A + B is larger than 25 (i.e., 115 + 73) but you still use the binomial distribution. Should this be the chi-sq distribution? I understand that the resulting p-values will be very similar in this case, but was wondering if I missed something. Thanks.

Andy,

In Example 2, I showed both the McNemar test (based on chi-square) as well as the binomial test. Since A + B >= 25, you should use the McNemar test and ignore the binomial test.

Charles
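The selection rule described in this reply can be sketched as a single function. The function name, the return format and the 0.5 continuity correction are assumptions for illustration; the threshold of 25 comes from the article. Note that the binomial branch is one-tailed, as in the article, while the chi-square branch is not.

```python
import math
from math import comb

def paired_nominal_test(a, b, threshold=25, correction=0.5):
    """Pick the test by the size of a + b and return (test name, p-value)."""
    n = a + b
    if n < threshold:
        # One-tailed binomial test of the smaller count with p = .5.
        k = min(a, b)
        p_value = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
        return "binomial", p_value
    # Chi-square based McNemar test with 1 degree of freedom.
    stat = (abs(a - b) - correction) ** 2 / n
    return "mcnemar", math.erfc(math.sqrt(stat / 2))

example1 = paired_nominal_test(6, 11)     # A + B = 17 < 25
example2 = paired_nominal_test(115, 73)   # A + B = 188 >= 25

print(example1[0], round(example1[1], 3))   # binomial 0.166
print(example2[0], example2[1] < 0.05)      # mcnemar True
```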

How were a and b calculated?

A and B are not calculated. They are measured (i.e. they are the input values). Note that in Example 1, B – A = 11 – 6 = 5, which is the difference between the number voting in favor after and before the debate, i.e. 35 – 30 = 5.

Charles
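The relationship between the switch counts and the survey totals can be verified for both examples with a few lines. The function name `net_change` is hypothetical.

```python
def net_change(against_to_favor, favor_to_against):
    """Net change in the number voting in favor implied by the switch counts."""
    return against_to_favor - favor_to_against

# Example 1: 30 in favor before, 35 after; B = 11 and A = 6.
example1_ok = net_change(11, 6) == 35 - 30

# Example 2: 705 in favor before, 663 after; 73 switched to favor, 115 to against.
example2_ok = net_change(73, 115) == 663 - 705

print(example1_ok, example2_ok)   # True True
```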