Paired Sample t Test

In paired sample hypothesis testing, a sample from the population is chosen and two measurements for each element in the sample are taken. Each set of measurements is considered a sample. Unlike the hypothesis testing studied so far, the two samples are not independent of one another. Paired samples are also called matched samples or repeated measures.

For example, if you want to determine whether drinking a glass of wine or drinking a glass of beer has the same or different impact on memory, one approach is to take a sample of say 40 people, and have half of them drink a glass of wine and the other half drink a glass of beer, and then give each of the 40 people a memory test and compare results. This is the approach with independent samples.

Another approach is to take a sample of 20 people and have each person drink a glass of wine and take a memory test, and then have the same people drink a glass of beer and again take a memory test; finally we compare the results. This is the approach used with paired samples.

The advantage of this second approach is that the sample can be smaller. Also, since the sampled subjects are the same for beer and wine, there is less chance that some external factor (confounding variable) will influence the result. The problem with this approach is that the results of the second memory test may be lower simply because the person has imbibed more alcohol. This can be corrected by sufficiently separating the tests, e.g. by conducting the test with beer a day after the test with wine.

It is also possible that the order in which people take the tests influences the result (e.g. the subjects learn something on the first test that helps them on the second test, or perhaps taking the test the second time introduces a degree of boredom that lowers the score). One way to address these order effects is to have half the people drink wine on day 1 and beer on day 2, while for the other half the order is reversed (called counterbalancing).
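The counterbalancing scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `counterbalance`, the condition labels and the subject IDs are assumptions made for the example, not part of the original study design.

```python
import random

def counterbalance(subjects, conditions=("wine", "beer"), seed=None):
    """Randomly split subjects so that half receive each order of the two conditions."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = conditions
    schedule = {}
    for i, subject in enumerate(shuffled):
        # First half: (first, second) on days 1 and 2; second half: the reverse order.
        schedule[subject] = (first, second) if i < half else (second, first)
    return schedule

# Hypothetical subject IDs 1..20
schedule = counterbalance(range(1, 21), seed=0)
for subject in sorted(schedule):
    print(subject, schedule[subject])
```

Shuffling before splitting means that which subjects get which order is itself random, so the order is not tied to how the subjects were listed.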

The following table summarizes the advantages of paired samples versus independent samples:

Paired Samples | Independent Samples
Need fewer participants | Fewer problems with fatigue or practice effects
Greater control over confounding variables | Participants are less likely to figure out the purpose of the study

Figure 1 – Comparison of independent and paired samples

Not all experiments can use the paired sample design. For example, if you are testing differences between men and women, then independent samples will be necessary.

As you will see from the next example, the analysis of paired samples is made by looking at the difference between the two measurements. As a result, this case uses the same techniques as for the one sample case, although a type 1 TTEST or the paired sample data analysis tool can also be used.

Example 1: A clinic provides a program to help their clients lose weight and asks a consumer agency to investigate the effectiveness of the program. The agency takes a sample of 15 people, weighing each person in the sample before the program begins and 3 months later to produce the table in Figure 2. Determine whether the program is effective.

Figure 2 – Data for paired sample example

Let x = the difference in weight 3 months after the program starts. The null hypothesis is:

H0: μ = 0; i.e. any difference in weight is due to chance

We can make the following calculations using the difference column D:

s.e. = std dev / \sqrt{n} = 6.33 / \sqrt{15} = 1.6343534

tobs = (\bar{x} – μ) / s.e. = (10.93 – 0) / 1.63 = 6.6896995

tcrit = TINV(α, df) = TINV(.05, 14) = 2.1447867

Since tobs > tcrit we reject the null hypothesis and conclude with 95% confidence that the difference in weight before and after the program is not due solely to chance.
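For readers working outside Excel, the same calculation can be sketched in Python using only the standard library. The weights below are hypothetical values made up for illustration (they are not the data from Figure 2), and the function name `paired_t` is an assumption of this sketch.

```python
import math
import statistics

def paired_t(before, after, mu0=0.0):
    """Return (t statistic, degrees of freedom) for a paired sample t test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return (mean_diff - mu0) / std_err, n - 1

# Hypothetical weights, NOT the data from Figure 2
before = [210, 205, 193, 182, 259, 239, 164, 197, 222, 211]
after = [197, 195, 191, 174, 236, 226, 157, 196, 201, 196]

t_obs, df = paired_t(before, after)
print(f"t = {t_obs:.4f}, df = {df}")
```

The resulting statistic would then be compared with the critical value for the chosen α and df, exactly as in the worked example.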

Alternatively we can use a type 1 TTEST to perform the analysis as follows:

p-value = TTEST(B4:B18, C4:C18, 2, 1) = 1.028E-05 < .05 = α

and so once again we reject the null hypothesis.
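To see where such a p-value comes from, here is a rough standard-library sketch that integrates the density of the t distribution numerically with Simpson's rule. The function names and the step count are assumptions of this sketch; in practice a library routine (for example SciPy's `scipy.stats.ttest_rel`) would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|), using symmetry
    return 1 - central_area

# t statistic and degrees of freedom from Example 1
print(two_tailed_p(6.6896995, 14))
```

As a sanity check, feeding in the critical value 2.1447867 with 14 degrees of freedom should return a p-value very close to .05.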

As usual, for the results to be valid, we need to make sure that the assumptions for the t-test hold, namely that the difference measures are normally distributed or at least reasonably symmetric. From Figure 3 we see that this is the case:

Figure 3 – Box Plot for difference measures (column D of Figure 2)
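A numeric complement to the box plot is to look at the five-number summary and the sample skewness of the differences. The sketch below is an illustration only: the difference values are hypothetical (not column D of Figure 2), and the helper names are assumptions. The skewness formula is the adjusted sample skewness, which is also what Excel's SKEW function computes.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness (same formula as Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical paired differences
diffs = [13, 10, 2, 8, 23, 13, 7, 1, 21, 15]
print(five_number_summary(diffs))
print(sample_skewness(diffs))
```

A skewness near zero and a median roughly midway between the quartiles are consistent with the "reasonably symmetric" requirement; neither is a formal test of normality.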

We can also use either Excel’s t-Test: Paired Two Sample for Means data analysis tool or the T Test and Non-parametric Equivalents supplemental data analysis tool to get the same result. The output from the Excel data analysis tool is shown in Figure 4.

Figure 4 – Excel data analysis for paired samples

To use the data analysis version found in the Real Statistics Resource Pack, press Ctrl-m and select T Tests and Non-parametric Equivalents from the menu. A dialog box will appear (as in Figure 3 of Two Sample t Test: Unequal Variances). Enter the input range B3:C18, choose the Column headings included with the data, Paired Samples and T Test options, and press the OK button. The output is shown in Figure 5.

Figure 5 – Real Statistics data analysis for paired samples

We have seen all the items in the above table before with the exception of the Pearson Correlation. This is explored in Correlation.

Observation: Suppose we run the same analysis for the data in Example 1 from Two Sample t Test with Equal Variances using the t-test with independent samples and compare the results with those we obtained for paired samples:

Figure 6 – Excel data analysis for independent samples

We summarize the results from the two analyses as follows:

Figure 7 – Comparison of paired and independent sample t tests

Note that the mean differences are the same, but the standard deviation for the paired sample case is lower, which results in a higher t-stat and a lower p-value. This is generally true when the two measurements on each subject are positively correlated, as is typical for paired data.
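The effect can be reproduced on made-up numbers. The sketch below computes both statistics for the same hypothetical before/after weights (not the data from the figures); the function names are assumptions, and the independent-samples version uses the pooled (equal variances) formula.

```python
import math
import statistics

def paired_t(x, y):
    """t statistic for paired samples, based on the pairwise differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def pooled_t(x, y):
    """t statistic for two independent samples with equal variances assumed."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical weights; each person's two values are strongly correlated
before = [210, 205, 193, 182, 259, 239, 164, 197, 222, 211]
after = [197, 195, 191, 174, 236, 226, 157, 196, 201, 196]

print("paired t:     ", paired_t(before, after))
print("independent t:", pooled_t(before, after))
```

Both statistics share the same numerator (the difference in means); only the standard error differs, because the paired version removes the person-to-person variation.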

Observation: Although we have provided a supplemental data analysis tool for one sample tests, Excel doesn’t provide a standard data analysis tool for this case. The type 1 TTEST and paired samples data analysis tool can, however, be used for the one sample case by simply creating a null paired sample with all zero data.

Example 2: Repeat Example 1 of One Sample t-Test using the above observation.

Figure 8 – Use of paired sample data analysis for one sample test
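The equivalence behind this trick is easy to check numerically. In the sketch below (hypothetical data, assumed function names), pairing a sample with a column of zeros leaves the differences equal to the original values, so the paired statistic coincides with the one sample statistic for a hypothesized mean of zero.

```python
import math
import statistics

def one_sample_t(data, mu0=0.0):
    """t statistic for a one sample t test against the hypothesized mean mu0."""
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))

def paired_t(x, y):
    """t statistic for a paired sample t test (null hypothesis: mean difference is 0)."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical sample
data = [13, 10, 2, 8, 23, 13, 7, 1, 21, 15]
zeros = [0] * len(data)  # the "null" paired sample of all zeros

print(one_sample_t(data), paired_t(data, zeros))
```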

Observation: Since the two sample paired data case is equivalent to the one sample case, we can use the same approaches for calculating effect size and power as we used in One Sample t Test. In particular, Cohen’s effect size is


216 Responses to Paired Sample t Test

  1. Luiz Fabrizio Stoppiglia says:


  2. Bharath Reddy Vasireddy says:

    This was really nice!!!

  3. Statstudent says:

    I swear you should write a textbook, these are fantastic!

    • Charles says:

      Thanks for your support. I plan to issue a statistics book shortly. Stay tuned.

      • Colin says:

        I agree with them! You should publish a book about statistics and Excel.

        • Charles says:

          Thanks for your support. I plan to publish a couple of new statistics books this year. Stay tuned.

          • Colin says:

            It is great. I am looking forward to your books. I hope you can publish your books in kindle version. It will be more convenient for people who live outside the US to buy the e-version.

  4. Pavlos Malindretos says:

    great really!!!
    Sir, please if you could be so kind as to help with one question that troubles me. You said that paired samples can be protected more easily from confounding variables. What if (in my case) I can't get rid of a confounder, but I would like to perform some kind of correction to the paired samples t-test? To be more specific: in dialysis patients, as days from the dialysis session pass, their blood pressure rises and their arterial stiffness seems to rise also. How could I correct my findings regarding arterial stiffness, taking into account that blood pressure might be a confounder? Is it possible to perform that with SPSS? Or Excel? Is it possible at all?

    Thank you for your kindness
    With Respect
    Pavlos Malindretos

    • Charles says:

      I don’t have a precise answer for you, but I will provide some reflections:
      (1) If blood pressure is highly correlated with arterial stiffness, then maybe there is no need to adjust for blood pressure since they are essentially measuring the same thing.
      (2) Assuming that blood pressure is a confounding factor that needs to be corrected for, how you do the correction depends on what you are trying to test (e.g. there is a significant degradation in arterial stiffness as a patient undergoes more days of dialysis).
      (3) There is a technique called Repeated Measures ANCOVA which may be appropriate for your needs. It is supported in SPSS.

  5. bret says:

    I ran both descriptive statistics and t-tests on some data and noticed that the column of the t-test labeled “Std Dev” appears to be the variance not the standard deviation. Is this the case, or am I missing something?

    • Charles says:

      You are correct. I mislabeled the column Std Dev when it should have been labeled Variance. I will issue a new bug fix release shortly to fix this. Thanks for catching this error.

      Update: This change has now been made in Rel 2.16.1

  6. Rans says:

    I could not understand – p-value 1.028E-05 < .05 = α
    What is the meaning of E in it and how it is less than .05?

  7. Jeff says:

    Hello Charles,

    Regarding t-test mean comparisions in general; at 95% confidence, if the two-tailed value had come out 0.05, what would be the conclusion?

    Thank you.

  8. Jeff says:

    Sorry,. Question should have been-
    Regarding t-test mean comparisions in general; at 95% confidence, if the two-tailed value had come out 0.05, what would be the conclusion. Could this happen?

  9. Jeff says:

    comment not capturing second part of question- if p lt 0.05 two tailed and p gt 0.05 one tailed, what is conclusion?

    • Charles says:


      You are supposed to decide on the type of test (one-tail or two-tail) before you collect the data or run the test. If you believe very strongly that the lower or upper tail of the distribution is highly unlikely then you would use a one-tail test. Generally the default (at least for most tests) is a two-tailed test.

      For a symmetric distribution, if you get say p-value = .04 for a one-tailed test, then the p-value of the two-tailed test will be .08. It is certainly possible to have a significant result for a one-tailed test (i.e. you reject the null hypothesis) but a non-significant result for a two-tailed test (i.e. you cannot reject the null hypothesis). The reverse is not true (since the p-value for a two-tailed test is greater than or equal to the p-value of the one-tailed test).

      I would say that if you haven’t decided prior to running the test whether you are using a one-tailed or a two-tailed test, then you should use the result of the two-tailed test (unless it is clear that you should have seen that one tail of the distribution was highly unlikely or impossible).


  10. environmental engineer says:

    nice data very professional in your presentation

  11. khaula uzair says:

    i need your help for my research data analysis. i applied paired sample t-test for pre-test post test comparison. mean differences r very high i.e n=8 pre-test mean =41.25 post test mean= 71.25
    mean=-30 , t -value is= -13.09, df =7 sig (2 tailed )=0.000 at .05
    for another test
    mean =-17.750, t= -20.139, df=7, sig =0.00 at .05 level
    plz guide me what to do and how to defend my work

    • Charles says:

      When you say that for another test you get a different result. Is this for the same data using a different test? If so which test did you use for this other test? If it is for different data, then I don’t understand your question; please provide more information.

  12. Toby says:

    Is this technique suitable for determining whether a simulation of a ball trajectory is ‘close enough’ to a measured trajectory?

    Rather than columns A, B and C being “Person”, “Before” and “After” could they be replaced with “Time”, “Height (test)” and “Height (simulation)”? In your example the weight of each person is independent of the person before, but in my example the time is dependent upon the one before. I would really appreciate any advice!

  13. Toby says:

    Thank you Charles. Great site!

  14. Mandy says:

    Hi Charles, great tutorial!
    I am used to using SPSS so Excel is a bit of a change. I just ran a paired samples t-test with a large-ish sample size (1403) following your directions and it’s saying the p one-tail is P(T<=t) one-tail 1.77E-07 I'm not sure how the p value can be greater than 1 or how to interpret this. Any suggestions would be helpful and I apologize if this is a silly question.


    • Charles says:

      1.77E-07 = 0.000000177, which is a very small positive number less than 1.

      • Mandy says:

        Hi Charles,

        Thanks! But how did you calculate that? Does the E-07 mean to move the decimal over by 7? Sorry for all of the questions 🙂

        • Charles says:

          Yes. 1.4E-07 is the same as 1.4 times 10 raised to the -7 power (which means move the decimal over 7 places). E stands for exponent.

  15. Lana says:

    Hi, I’m doing a lab in order to see the effect of hand dominance on reaction time. I tested 15 people, each of them doing 5 trials. If I average out the reaction time trials for the dominant hand and non-dominant for each person, would I use a paired sample t-test to compare the means(for example, my average reaction time for my right hand is 0.553 s and 0.883 s for left hand, to compare these two values)? Or would a independent t-test be more suitable?

    • Charles says:

      If you are testing the right and left hands of the same people, then you should use the paired sample t-test.

      • Matt says:

        I’m looking at pretty much the same situation.
        Looking at the data of dominant versus non dominant hand from the same people.
        Seeing as it isn’t a before and after test and just a which hand is faster test wouldn’t it be independent?
        Also if the data is incredibly skewed will that change which test I would use?

        • Matt says:

          Just one other thing I know I need a one tail test but I’m still trying to figure out if it is upper or lower.

          • Charles says:

            If the mean of the differences between sample pairs is negative then you are typically dealing with a lower tail. If it is positive then you are dealing with an upper tail. The test is whether the population mean is zero.

        • Charles says:

          Since the right hand and left hand are owned by the same person, the pairs of values are not independent, but dependent, and so for this reason a paired test is used. The before and after types of situations are probably the most common paired tests, but others also occur (e.g. husbands vs. their wives).
          If the assumptions for the test are not met you might use the paired Wilcoxon Signed-Ranks test instead of the t test.

          • Matt says:

            Thanks so much for the reply.
            It has been incredibly useful.

            My sample size is 125 and the data is incredibly skewed to the right; however, the differences between the two samples are approximately normal. Will this change my choice of test?

          • Charles says:

            Since the test is on the differences between the two samples, you should be able to use a paired t test. If you use the Wilcoxon Signed Ranks non-parametric test the results should be similar.

      • Another Matt says:

        Great website!!

        I have a similar situation, seeing if a metric like reaction speed changes with a device ON vs. OFF. We have 37 subjects test ON five times and OFF five times. How would we use the repetitions of the test to increase our statistical power? One ON vs. OFF test one 37 different people would be a paired t-test, but if each person had multiple repeats can we use that somehow, like get a better feel for the st dev of the population? Thanks!

        • Charles says:

          Glad you like the website.

          You could perform a two factor repeated measures ANOVA where both factors are repeated measures (factor A = ON vs. OFF and factor B is trial 1, 2, 3, 4, 5). I don’t know whether this will improve power, but you can try some experiments yourself.


  16. Charles says:

    I love your website and it has been very useful for me. I have recently downloaded the real statistics download that you provide and have a question. I am comparing the blood values of test subjects in a clinical trial such as WBC before and after a drug. The tests subjects are the same. Under the t test and nonparametric equivalents which option would be best: one sample; two paired samples; or the two independent samples? I am looking to see if the difference is significant.

    Sorry if this is a novice question.

    Many Thanks

    • Charles says:

      If you are testing the same subjects before and after then you should use a paired samples test.

      • Charles says:

        Thanks for the quick reply. That is the one that I have been using. Awesome website. Once again Thank you

        BTW – Pretty cool first name.


  17. Charles says:

    Just out of curiosity. I have many different blood values to analyze. I have set up all the worksheets in a workbook exactly the same way. Is it possible to have a t test run on an entire workbook or do I have to do an individual t test for each worksheet?


  18. jhanu says:

    please help me…
    can we use paired t-test for 450 samples also…

  19. Stefano says:


    I would please have a simple question. I need to run a paired t-test between groups of different sample size. Do I need to extend my smaller array to the same size of the largest one? Or do I need to reduce the largest one, excluding some data?

    Thank you in advance,

    • Charles says:

      The first question you need to ask yourself is “why are the samples not equal in size?” For the paired t test both samples are made for the same subjects. E.g. if one sample is sight in right eye and the other sample is sight in the left eye (for the same people), then the only reason why the sample sizes wouldn’t be equal is that you didn’t take both measurements for all people or some data was lost. If this is not the sort of thing that has happened then perhaps you shouldn’t be using a paired test.
      If indeed this is the sort of problem that you are dealing with then the cause of the samples being unequal is that there is missing data. In this case, you have two choices: (1) eliminate any data for which there is no measurement in both samples and (2) impute the value of the missing data element. See Dealing with Missing Data for more information.

  20. SHIRLEY SELF says:

    I want to do pre and post test of anxiety and the effect of exercise on anxiety. Is this a good use of Paired Sample t-test? Or should I be using TWO SAMPLE T –TEST WITH PAIRED SAMPLES?

    S. Self

    • Charles says:

      One of these is the standard Excel data analysis tool and the other is the Real Statistics data analysis tool. They should yield the same results, although the Real Statistics tool provides some additional information.

  21. Brandon says:

    Hi I was wondering what test I should use if I am trying to compare the ages of people and their life expectancy.

    • Charles says:

      It depends on what you mean by compare. Generally this is like comparing “apples” with “oranges”. Perhaps you mean that you want to correlate ages and life expectancy, in which case you probably want to use one of the correlation tests. Or you might want to predict life expectancy based on a person’s age, in which case you would use regression.

  22. Grace says:

    Thank you for this webpage– very helpful!!
    Is there a way to do a repeated measures with 3 time points? I’m comparing means at pretreatment, posttreatment, and 3 months posttreatment.

  23. marry says:

    this was really nice

  24. StatStudent says:

    Very helpful and clear explanation. Thank you.

    I’m trying to interpret a data set with 187 pre and post test scores – the hypothesis being that scores will increase from beginning to the end of the year. The means are 42 and 66 for pre and post, but the paired sample t-test returns a negative value for the t-stat (-23.3591). What does this mean?

    • StatStudent says:

      I think I can see what that negative value means – it’s the difference between pre and post means. So the difference is not zero, as the null hypothesis states, and the negative number indicates that the post-test score increased.

      So I guess I answered my question.

  25. Wonderer says:

    Dear Charles,
    I have a question.
    What should I do for a case where two different numbers of participants (i.e. two subgroups with different Ns) from the same group took two tests. So, let’s say I have Group T (N=30), but 23 and 20 out of Group T took two different tests (Test A and Test B), and I want to test the difference between the scores from Test A (N=23) and Test B (N=20). I think I should do a (independent) two-sample t-test for this case. But I also want to make sure whether there is any way to use paired-t test for it (because Test A and Test B were actually from the same Group T). Thanks for your comments in advance,

    • Charles says:

      As long as the A and B groups were selected randomly from group T you can use the two independent samples t test. But no one in group A can also be in group B. If you have overlap between the two groups then neither t test is appropriate. In fact, I know of no test that would be appropriate.

  26. Henriette says:

    Dear Charles,

    I have a question. What should I do for a case when there are pre- and post test scores measured, but there are missing values for some of the pre- scores and some of the post- scores.
    First I wanted to remove the paires with missing values, but in this case it is difficult to remove them, because there are too little measurements. Is it possible to use a two-sample t-test in this case, whithout removing any data. Or is there another test which I can use?


  27. Yoga Salim says:

    Dear Sir,

    I have a question. If the Box-Plot is not normally distributed (severely one-sided based on BoxPlot), does it mean that paired t-test cannot be used / invalid (assumption for paired t-test: normal distribution)? What about Wilcoxon signed rank test? I tried Wilcoxon signed rank test and found that p > 0.2. This is very similar to paired t-test where p > 0.19.

    Thank you.


    • Charles says:

      These tests are fairly robust to violations of the assumptions, but if the data is severely skewed, then you should consider using the sign test.

    • Charles says:


      The paired t test requires that the paired differences be normally distributed. Two things to note: (1) the two samples don’t need to be normally distributed, just the difference values and (2) the test is fairly robust to violations of this assumption, especially for relatively symmetric data, but if the data is severely skewed then the test is not so robust.

      Wilcoxon signed ranks test also depends on the data being relatively symmetric.

      The sign test does not assume symmetry and so this might be your best choice, even though its power is not so high (and so you might need a large sample to avoid high type II error).

      You can also investigate using a transformation which will make the data more normal (t test) or at least symmetric (signed-ranks test).

      Finally, you should make sure that the data is not severely skewed due to outliers. In this case, you may still be able to use the Wilcoxon signed ranks test.


  28. biostats says:

    I am trying to understand how to interpret the critical values of t-distribution. When I read about the t-test it says if the value is higher than the critical value then you reject the null, but I have written in notes from a course (and see it written here) if it is lower you reject the null. With practice data I entered into excel, the value is coming out to .09e-6 which I understand the e means it is 10^-6. (with 10 samples). I am either misreading or misunderstanding the interpretation of that value or both?! Any suggestions?

    • Charles says:

      In a two tailed test, there are two critical values, one on the right side of the distribution and another on the left side of the distribution. You reject the null hypothesis if the statistic is less than the left critical value or greater than the right critical value.

      There are two versions of a one tailed test. In one you reject the null hypothesis if the statistic is less than the left critical value. In the other you reject the null hypothesis if the statistic is greater than the right critical value.

      See Hypothesis Testing for more details.


  29. Sam says:

    Thanks for this helpful article.

    In my experiments i have 3 measurements before treatment and 3 after treatment. However these are the same sample measured three times (triplicates).
    Would this paired t-test still work?
    And is the size (3) too small?

    Thank you!

    • Charles says:

      A sample of size 3 is quite small and so the power of such a test would be very low. Also, since all the measurements are on the same subject, you don’t have the required independence to even use the paired t test.

  30. aixax uddin says:

    sir, i am working in the privatization and want to find out the mean diffrence between public and private sector banks, should i use paired sample test or two smple test … plz help me

    • Charles says:

      It sounds like an independent samples test. Depending on the details, it could be a t test, ANOVA, MANOVA or some other test.

  31. Si says:

    Hello, I am doing a comparison of two portfolios by comparing the series of sharp-ratios, will this be a independent or dependent t-test?

    • Charles says:

      I am not that familiar with the Sharpe ratio, but a quick look on the Internet indicated that there are a lot of papers about how to compare sharpe ratios using a t test. From what I can tell, the test is neither the classical independent nor dependent t test. In any case, there are many references to how to conduct various tests based on the t distribution.

  32. carley says:

    I recently ran choice tests for birds, to see if they preferred either food that they have waited for or food that they have not waited for.

    Each of the 8 birds had 15 tests, so I have 120 choices in total. I am logging their latencies to choose either side, and my supervisor said to run a paired t-test.

    I have two columns, I am putting the first “wait” latencies in column A of spss and the no-wait in column B of spss. I have 36 values in column A and 84 in column B.

    My question is.. Can I analyse two different column sizes? Most birds went for the second choice of not waiting so the not-waiting column consequently has a higher number of values.. n is 36 in the first output so has it only analysed the first column/the first 36 values of both columns?

    I am horrible at stats and have tried doing this so many times with the same output.. any (simple) help would be so gratefully received… thank you very much!

    • Charles says:

      The columns must have the same size to use the paired t test. I don’t fully understand the situation you are describing, but if say each bird had 15 trials where they waited and 15 trials where they didn’t wait, you could create a score for the wait case and a score for the non-wait case and use these in the paired t test (in this case you would have 8 pairs of scores). Caution: this may not be your situation.

  33. Radhika says:

    Hey Charles!
    Great article! I just want to understand if the one-tailed critical value suggests the variable it’s under is significantly greater or lesser?

  34. Dave says:

    Hello Charles,

    I have a situation that seems to call for a paired t-test, and I was wondering what you’d recommend for a sample size. I want to see if 2 different valuation tools will return a different price for a house. Say I currently use Tool 1, but the assumption is that Tool 2 is more accurate, and thus if Tool 2 returns a significantly different value it would be worth the expense to switch. There are multiple types of houses being evaluated, but the paired t-test would seem to account for that.


  36. RC says:

    Hi Charles,

    I had a question with regard to matched sample data analysis. I have gathered matched data on a population. Each set of data is normally distributed. When I perform data analysis on the difference between the matched pairs, the data set for this (di) is not normally distributed. I can’t perform a t-test if this data is not normally distributed, right? If I can’t: I went ahead and removed the outliers from the data set, and when I run the numbers again, the skewness of (di) is under .05, so the data set is normally distributed. Is that the right way of thinking about this?

    • Charles says:

      If x and y are jointly normally distributed (e.g. independent normal variables), then x – y is also normally distributed, and so I would have expected the differences to be normally distributed. To use a t test, the data should be normally distributed, but the test is quite robust to violations of normality provided the data is reasonably symmetric. If the assumptions for the t test are not met you can usually use the Wilcoxon signed ranks nonparametric test. This is described on the website.

      You shouldn’t simply remove outliers just to make the assumptions hold. If the outlier is caused by an error or a very exceptional situation, then you might eliminate it. Otherwise you should try to find a suitable replacement test. If this is not possible, then you can eliminate the outlier, but be sure to report that you did this and interpret your results based on the removal of the outlier.


  37. DB says:

    Hi – great site and information – thank you so much for explaining. However, I have a question concerning my project data: I’ve already taken repeated measures of individuals every 30 seconds, giving 6 data points per individual (1-150 seconds). Now I want to see if there are significant changes between data points for the same individual – i.e. if there is a significant change in values from start to end, or between 30 sec and 120 sec.
    Do you recommend that I make paired t-tests for samples with unequal (AND equal for checking) variances, or paired t-tests (these are dependent samples – energy penetration is measured over time in human tissue)? Should I do this multiple times for each comparison?
    In my book (statistics) they write about ANOVA for two or more samples – I did not find any function like this in Excel…??

    After all, I want to see if there is any significant change of penetration for each individual. Because individual variations in tissue properties, the SD is big for the group, and the values measured differ very much among individuals – thus not of greatest interest here to compare group means above.

    Later, I will compare two different energy sources applied to the same individuals – I then do the same tests as above for each datapoint (i.e. 150 sec)???

    • Charles says:

      Excel does provide various ANOVA data analysis tools, but not a repeated measures ANOVA tool, which is the type that you require. The Real Statistics software does provide this capability along with the follow up tests that you require. See Repeated Measures ANOVA for more information.

  38. Jen says:

    Dear Sir, This was very helpful but I’m still confused as to what the best statistic would be for me to use to determine whether a wellness intervention was successful? I have data from 2012 (preintervention) for 60 people followed by annual data in 2013 and 2014 (i.e. blood pressure, weight, total cholesterol, LDL, HDL, smoking status, and activity level –the last two being categorical). I also have data from a number of other people who chose not to participate in the intervention that I would like to use as a control group. At first, I was thinking a matched pairs t-test…then I was pondering a time-series ANOVA. What would you recommend? How can I use all of this data to figure out whether the intervention was successful? Any help would be very much appreciated. Thank you!

    • Charles says:

      I would need more complete information about the scenario to give you a definitive answer, but you might find ANOVA with repeated measures or repeated measures MANOVA to be the appropriate test. These are described on the Real Statistics website.

      • Jen says:

        Thank you! I reviewed the repeated measures MANOVA and the ANOVA on your site. Your examples are helpful; however, I’m still trying to figure out which statistic(s) I should use. It’s almost as if I have too many factors to compare. For example, on each person I have the following information:

        Measure             2012  2013  2014
        Age                   47    48    49
        Sex                    1     1     1
        Highest Grade          6     6     6
        Smoking                1     2     3
        Exercise               2     2     2
        Height                68    68    68
        Weight               227   225   227
        BP systolic          120   130   120
        BP diastolic          78    82    80
        Glucose              307   251   100
        Triglycerides        236   219   206
        Total Cholesterol    154   158   142
        HDL                   25    30    33
        LDL                   82    84    68
        Overall Health         3     3     4

        Some of the factors I coded such as sex (1=male, 2=female), smoking status, highest grade, exercise habits, and overall health. Would I have to do a repeated measures MANOVA or ANOVA for each factor? I was thinking with the repeated measures ANOVA I could use three factors as you showed. For example, in Excel I tried…

        Pre BP Pre Chol Pre Wt/ Post BP Post Chol Post Weight
        …but then that would only compare two years. So, would I do a 2nd repeated measures ANOVA comparing the next two years…and then pick the next three factors and do the same calculation? If so, how would I make the calculations relate to one another? I truly appreciate your help! Thanks!

        • Charles says:

          I describe single factor MANOVA with repeated measures. It looks like you would need more factors than I currently support.
          You should be careful, however, when using such models since they may be quite difficult to interpret.
          Before you build such complicated models, you should make sure that you are clear about what hypotheses you are trying to test. Are you really trying to show that there is no difference in all these factors from year to year (in which case a multi-factor MANOVA may be necessary)?

  39. Kris says:

    Dear Mr Charles,

    I am currently writing a practical write-up and I have encountered several problems regarding the paired t test and outliers. In the experiment, we are comparing two sets of data (heart rate, blood pressure etc.) during (1) constant velocity exercise and (2) interval training in the same group of subjects, i.e. each of them did both exercises.

    When I was trying to use histograms to test the normality of my data, some of them were only normal when I removed the outliers and transformed the data. I was trying to use parametric and paired tests to test their significance. Should I include the outlying data in the significance calculation? If not, would my data be considered not paired because the outliers removed are not in pairs?

    Sorry for asking loads of questions but I could not think anymore. Thanks for your time.

    • Charles says:

      Dear Kris,

      It really depends on why these outliers exist. If they are true anomalies, then you can remove them and run the test. If instead they are legitimate data elements that are just higher or lower than the others, then it is best not to remove them. If you remove data, you should remove the pair (not just one of the elements in the pair).

      The paired t test is pretty robust to violations of normality, and so usually I would just run the t test. If the data is quite different from normal, then I would use a non-parametric test — e.g. Wilcoxon signed-ranks test.

      Regarding outliers, when in doubt, run the test both with and without the outliers and report both results.
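The advice above (drop the whole pair, and when in doubt report both results) can be sketched in a few lines of Python. This is only an illustration: the data are hypothetical, `paired_t` is a helper written for this example, and it returns just the t statistic and degrees of freedom, not a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """Return the paired t statistic and degrees of freedom for after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical heart-rate data; the last pair contains a suspiciously large value.
before = [72, 75, 70, 68, 74, 71, 69, 73]
after = [75, 79, 72, 72, 77, 75, 71, 110]

t_all, df_all = paired_t(before, after)

# Remove the whole pair (index 7), never just one of its two elements.
keep = [i for i in range(len(before)) if i != 7]
t_trim, df_trim = paired_t([before[i] for i in keep], [after[i] for i in keep])

print(f"with outlier:    t = {t_all:.2f}, df = {df_all}")
print(f"without outlier: t = {t_trim:.2f}, df = {df_trim}")
```

In this made-up data set the single extreme pair inflates the standard deviation of the differences so much that the t statistic is smaller with the outlier than without it, which is why reporting both runs is informative.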


  40. Yasmin says:

    Hello Charles,

    Thanks. This is of great help. I performed a paired t-test for a pre-post analysis. I got a negative t-statistic, -3.57, and it was statistically significant. What does the negative imply? That the post average was lower than pre?

    • Charles says:

      Yes. If you subtract post from pre you will get 3.57 instead.

      • Yasmin says:

        Charles, I am still confused. my post mean is greater than pre mean for all cases but for some the paired samples t-test is negative and others positive. Why? Does the sign matter or do i just check significance level and ignore the sign, and take it to mean post is greater than pre irrespective of the sign?

        Thanks a million!
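The sign question in this thread comes down to the order of subtraction. A minimal Python sketch with hypothetical scores (the helper `paired_t` is written for this example only) shows that swapping the two columns flips the sign of the t statistic while leaving its magnitude, and hence the two-tailed significance, unchanged.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Hypothetical scores in which every post value exceeds its pre value.
pre = [10, 12, 9, 11, 13, 10]
post = [12, 15, 10, 14, 15, 13]

t_pre_minus_post = paired_t(pre, post)   # negative, because pre < post
t_post_minus_pre = paired_t(post, pre)   # same magnitude, positive

print(t_pre_minus_post, t_post_minus_pre)
```

So a negative statistic together with a larger post mean usually just means the software computed pre minus post; it is worth checking which column was entered first.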

  41. jackie says:

    I have 2 questions, why would I get a different t value in SPSS and Excel?
    2- I have a student who did a pre and post test but messed up the ID numbers so I am assuming we can not do a paired ttest so what type of ttest is the best? Thanks

  42. What an awesome web site! I refer to it often because it has so much valuable information. I am working on a research project and I have two questions Sir. What values go into the TINV function to get the tcrit value? I am testing whether there is a difference between the responses (to the same questions) that business students give versus non-business majors. From your examples, it seems this would be a paired samples test. Does this seem reasonable? Thank you in advance for your time.

    • Charles says:


      I am very pleased that you are getting value from my site. This makes me very happy.

      1. The formula =TINV(alpha, df) gives the right critical value for a t distribution with df degrees of freedom at the alpha significance level assuming a two-tailed test (critical value at 1-alpha/2). Since TINV always returns the positive two-tailed value, the left critical value (critical value at alpha/2) is its negative, =-TINV(alpha,df).

      2. The paired t test is used when the two groups are related – e.g. (1) data on Monday vs. data for Tuesday for the same subjects or (2) data from two twins or (3) data for the right eye vs data for the left eye for the same people

      Here you have business majors vs non-business majors and so you should consider using the two independent sample t test.
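To see why the choice between the paired and the independent test matters, the following Python sketch computes both statistics on the same hypothetical "Monday vs. Tuesday" data for the same subjects. The helper names are made up for this example, and the independent-sample statistic shown is the unequal-variances (Welch) form.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic: mean of the differences over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(x, y):
    """Two independent samples t statistic, unequal variances (Welch)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Hypothetical scores for the same six subjects on two days.
monday = [50, 62, 71, 45, 80, 66]
tuesday = [53, 64, 75, 47, 84, 69]

t_paired = paired_t(monday, tuesday)
t_welch = welch_t(monday, tuesday)

print(f"paired: {t_paired:.2f}   independent (Welch): {t_welch:.2f}")
```

With related data like these, the paired statistic is far larger in magnitude because the subject-to-subject variation cancels out in the differences. When the groups are unrelated (business vs. non-business majors) there are no meaningful pairs to difference, so the independent test is the one that applies.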


  43. Badr says:

    Dear Charles,
    I am analyzing the difference between the financial performance of two kinds of mutual funds over a period of time (vertically for the same kind) pre and post financial crisis and (horizontally for the two kinds). I am analyzing the population of both so, the two types have different number of observations (n). Which basically means that I should use the independent T-test not the matched pair one, based on the fact that the two types have different number of observation along the timeline. Using SPSS and Excel didn’t really help because I have no cut point or grouping variable.
    Any help?.

    • Charles says:

      I don’t completely understand your scenario. How many mutual funds do you use in your study? For each, what sort of observations are you making (closing price, etc.)? For each mutual fund, are you looking at the observations over multiple time periods, both pre and post financial crisis?
      Not sure what sort of cut point or grouping variable you are referring to.

  44. CB001 says:

    Im comparing Qty of Transactions from different age-stage segments (Young, Adults, Mature, etc). Do you think the T-Test Two Paired Sample appropriate?

    • Charles says:

      You can run paired t-tests on pairs of samples (provided you take care of the inflated experiment-wise error rate), but with more than two samples, you need to use repeated measures ANOVA.
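A rough Python sketch of why the error rate becomes inflated when many pairwise t tests are run. The number of samples is hypothetical, the tests are treated as independent for simplicity (which pairwise tests on shared samples are not, so the figure is only indicative), and the Bonferroni adjustment is shown as one common correction, not necessarily the one intended in the reply.

```python
from math import comb

def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

groups = 5                      # hypothetical number of samples
m = comb(groups, 2)             # number of distinct pairwise comparisons
alpha = 0.05

fwer = familywise_error(alpha, m)
bonferroni_alpha = alpha / m    # per-test level that keeps the overall rate near alpha

print(f"{m} comparisons, familywise error about {fwer:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha}")
```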

  45. Luke Warner says:


    I am doing a project comparing pre and post test scores of outcome measures for a group therapy. I am limited to the use of Microsoft Excel to undertake statistical analysis. I am sure that I am meant to do a t-test, but I am unsure which one.

    My sample is composed of 29 clients who completed the pre and post measures. These were chosen as they completed both measures, as opposed to those that didn’t. So I end up with two columns of 29 scores to compare (29 pairs).

    I have run both a ‘t-Test: Two-Sample Assuming Unequal Variances’ and a ‘t-Test: Paired Two Sample for Means’ using the ‘Data Analysis’ function of Excel. The results end up different for both, so I am unsure which one is correct.

    After reading this guide, I feel that the ‘t-Test: Paired Two Sample for Means’ is the most appropriate but I don’t trust my understanding enough to decide which is the one to report.

    I would be incredibly appreciative if you could clear this up for me, as no one else has yet been able to.

    Thank you.

  46. Jon says:

    Hello Charles. I’m comparing water quality before 2010 and after 2010. I’m using the t-test from Excel to test it. I also used SPSS to run a Mann-Whitney U test on the same data. From SPSS, one of the results showed: “Exact Sig. [2*(1-tailed Sig.)]”. From my understanding, I think SPSS treats it as one-tailed and thus I need to divide the p-value by 2 in order to get the right sig. value. But how do I know whether the result from the t-test in Excel is one-tailed or two-tailed? If Mann-Whitney analyses it as one-tailed, should I take the one-tailed p-value for the t-test too (say I run it on the same set of data)?

    I’m still abit blur on is it one-tailed or two-tailed. My main purpose is to reject my null hypothesis and say that there IS a difference in the water quality after 2010. this is my scenario.

    • Charles says:

      You need to determine in advance of running the test whether you want to use a one-tailed or two-tailed test. This is true for SPSS, Excel or any other test environment. Generally, you should choose the two-tailed test. The one tailed test is chosen if, based on some theoretical reason, only one side of the test is possible — see Null and Alternative Hypothesis for more details.

      • Jon says:

        1. Thanks for the link. I think I’m doing two-tailed since I’m determining
        H0: data set before 2010 = data set after 2010
        H1: data set before 2010 ≠ data set after 2010
        is my approach correct?

        2. For the SPSS results, “Exact Sig. [2*(1-tailed Sig.)]” mean my data can only actually test for one-tailed?

  47. Dan says:

    I got Mean and standard deviation of a group of subjects before and after treatment respectively, but no individual data, how can I calculate the mean and SD of change?
    Thank you

  48. David says:

    Hi, I’ve conducted a test of 4 different exercises using the same people, and want to calculate the p-value between exercises to see if the results are significantly different. To put this into context, each exercise measured muscle activation from the gluteus medius and adductor magnus in working and opposite legs. Basically I would like to know what type of t-test to perform to find out if the gluteus medius results from one exercise are significantly different to the gluteus medius results of another exercise, and the same with adductor magnus. All of the data is in Excel.

    Your help would be greatly appreciated.

    • Charles says:

      Hi David,
      If you are using the t test, you should choose the paired sample t test.
      You might want to consider using ANOVA, in which case you should look at the repeated measures ANOVA.

  49. Jonathan Bechtel says:

    Hi Charles,

    When you compare the 2-Sample test w/ the paired sample test, you state the P-Value for a 2 tailed, 2 sampled test is 0.134576, but I believe that’s the value for a 1 tailed test, and the 2-tailed test is twice that amount.

    • Charles says:

      Thanks for catching this mistake. The correct values are shown in the Examples Workbook, but not on the webpage. I have now corrected the error. I appreciate your help in improving the website.

  50. KELVIN MUTITU says:

    the articles are wow and excellent. I have a question though
    If you have 5 groups in your study, why shouldn’t you just compute a t test of each group mean with each other group mean?
    I’d be glad if you answer please. Thanks in advance

  51. Tiff says:

    I am in a statistics class in college right now. Can someone help me with these problems?

  52. Anne says:

    Hi Charles,

    This was a lot of help! Just wondering, if i was analysing three groups- a control, a caffeine group, and an alcohol group, would I still use the paired sample test?

    Kindest regards,

    • Anne says:

      Also, just a little bit more information- I’m testing which group causes more impairment when driving.


      • Charles says:

        After performing ANOVA you have two possibilities: (1) there is no significant difference between the three groups, in which case the analysis ends and you conclude that there is no group that causes significantly more impairment than the others or (2) there is a significant difference between the groups and so you can perform a follow-up test (contrasts, Tukey HSD, etc.) to locate which group causes more impairment.

    • Charles says:

      In general, if you have three different groups, then you would use ANOVA and not a t test.

  53. Christine says:

    This website is so informative and useful.
    I have one participant whose behaviour I am trying to manipulate. I have collected 10 sets of data on the same participant (before and after treatment). Basically, I am using a single-case design whereby I measured the participant’s behaviour on 10 different occasions at baseline before treatment. I then measured the participant’s behaviour on 10 different occasions after treatment. Do I use a paired sample t-test for this? Measurements are in time (minute) intervals. Thanks for your help.

    • Charles says:

      You can’t use the paired t test because the 10 sets of data are not independent (since they are for the same subject).

      • frederick says:

        I have just carried out a survey on consumer satisfaction for 7 companies, and I only retain the responses where the respondent has made a purchase from each of the 7 companies. For each case, the respondent is asked to rate (1-10) 14 items representing consumer satisfaction and 4 items representing consumer convenience. So I will have an individual rating for each item as well as an aggregate score for satisfaction and convenience. Now, if I want to perform a statistical test on whether the individual/aggregate ratings for satisfaction and convenience are statistically different across the 7 cases, could you advise how I can do so?

        • Charles says:

          If you are using the aggregate score, then you can perform a one-way ANOVA on one factor with 7 levels (representing the companies). If there is a significant difference, you can pinpoint where the differences are by using a follow-up test (e.g. Tukey’s HSD).

          If you have multiple scores per person, then you probably want to consider using MANOVA.


  54. Georgios says:

    Greetings, Charles

    As always, your website comes up in various internet searches, and there is always useful information that cannot be (easily) found elsewhere.

    And of course, your generosity does not end there, given that you always provide help to those who ask.

    Here is a question, if I may.

    I’d like to run a paired t-test for a set of PRE POST variables, but would like a ONE tailed (positive difference for POST minus PRE) test and CONFIDENCE intervals.

    I notice that the output you provide has NO confidence intervals for the one tailed test.
    I suppose they should be given as either “lower bound” or “upper bound” only, the latter pertaining to my case.

    Here are some questions, if I may, please:
    1) Can one “compare” such outcomes? For a two-tailed test, if both confidence interval boundaries are on the same side of ZERO (not crossing it), the diagnosis of “difference” is easy to make… How does it work for ONE-tailed tests?

    2) Can one simply change the “alpha” and obtain a two-sided confidence interval, of which ONE of its boundaries will have the SAME value as the outer boundary of the one-sided test confidence interval?

    Once again, thank you very much for your generous help.

    • Charles says:


      1. If mu represents the population mean, x-bar is the sample mean, se is the sample standard error and crit = T.INV(1-alpha,df), then when H0: mu <= 0 and H1: mu > 0, then the confidence interval is (x-bar - se*crit, +infinity). If H0: mu >= 0 and H1: mu < 0, then the confidence interval is (-infinity, x-bar + se*crit). If 0 is in this interval, then the null hypothesis is accepted.

      2. No, because when calculating crit for the two-tailed test, crit = T.INV.2T(alpha, df) = T.INV(1-alpha/2,df)

      Charles
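The one-sided intervals in the reply above can be sketched in Python as follows. The difference scores are hypothetical, `one_sided_ci` is a helper invented for this example, and the critical value is passed in by hand (here 1.761, approximately what =T.INV(0.95,14) returns) because the Python standard library has no t distribution.

```python
import math
import statistics

def one_sided_ci(diffs, crit, alternative):
    """One-sided confidence interval for the mean difference.

    crit is the one-tailed critical value, e.g. from Excel's T.INV(1-alpha, df).
    alternative is "greater" (H1: mu > 0) or "less" (H1: mu < 0).
    """
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if alternative == "greater":
        return (mean - se * crit, math.inf)
    if alternative == "less":
        return (-math.inf, mean + se * crit)
    raise ValueError("alternative must be 'greater' or 'less'")

# Hypothetical POST minus PRE differences for 15 subjects (df = 14).
diffs = [2, 3, 1, 4, 2, 3, 0, 2, 3, 1, 2, 4, 1, 3, 2]
crit = 1.761  # assumed value of T.INV(0.95, 14), i.e. alpha = 0.05

lower, upper = one_sided_ci(diffs, crit, "greater")
print(f"95% one-sided interval: ({lower:.3f}, +infinity)")
print("0 inside interval:", lower <= 0)
```

For H1: mu > 0 only the lower bound is informative: if it lies above zero, zero is outside the interval and the null hypothesis is rejected.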

      • Georgios says:

        Many thanks, Charles. You provided 2 key points I could not find in books: “If 0 is in this interval, then the null hypothesis is accepted”! And that for positive one-tailed tests, the interval used is the one from the lower value to infinity. It now became so clear!
        I’m working on SPSS, but decided to work with your addins, especially for such issues as one-tailed analyses, etc…

        For the paired t-test, you provide effect sizes, Pearson’s “r” and Cohen’s “d”. I’d like to make a diagram such as the following
        in my dissertation that will show a score of d = 1.7

        I saw such diagrams in another t-test page of yours.

        Can a set of two curves be drawn using “real statistics”?

        Once again, thanks for everything.


        • Charles says:

          You can draw these curves in standard Excel. For curve 1, you create a list of (x,y) coordinates that correspond to a t distribution with the desired degrees of freedom, just as was done for Figure 1 of t Distribution Basic Concept.
          For curve 2, you do the exact same thing except that you need to use the same x values but displace the curve (i.e the y values) by the amount that corresponds to d, using the definition of d (essentially you are shifting the mean from zero).
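As a rough illustration of the two lists of coordinates described above, here is a Python sketch that builds both curves from the t density. The degrees of freedom and the grid of x values are hypothetical, and shifting the second curve horizontally by d is one reading of "displace the curve"; the resulting pairs could be pasted into Excel or any plotting tool.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

df = 14   # hypothetical degrees of freedom
d = 1.7   # effect size used to displace the second curve

# Common x values from -4.0 to 6.0 in steps of 0.1.
xs = [i / 10 for i in range(-40, 61)]

curve1 = [(x, t_pdf(x, df)) for x in xs]        # centered at zero
curve2 = [(x, t_pdf(x - d, df)) for x in xs]    # same shape, centered at d

for (x, y1), (_, y2) in list(zip(curve1, curve2))[::20]:
    print(f"{x:5.1f}  {y1:.4f}  {y2:.4f}")
```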

  55. Steven says:

    Hi Charles,
    I have a question: how do I find the critical value for a paired-sample sign test? Is there any function available in Excel for that? I know there is a table from which we can find what we want, however that is somewhat inconvenient.
    Would you please explain the question when available? Appreciate your help in advance.

    • Charles says:

      Just as for the paired t test, you first take differences between the pairs in the sample. Then you perform a single-sample test. For the sign test, you use the binomial distribution, as described in the following webpage:

      • Steven says:

        Hi Charles,
        Thanks for your explanations. Just want to clarify some key points of sign test.
        1)When n≤25, we can:
        a. use BINOMDIST.test to calculate P and then compare it with P0, or
        b. find the critical value from the sign test table and then compare it with the test statistic
        to decide to reject or fail to reject H0.
        2) When n>25, we can find the critical value z0 from the standard normal table and then compare it with the test statistic z to decide to reject or fail to reject H0.
        And, a paired-sample sign test is similar to a single sample sign test as above.
        Correct me if I got anything wrong, please.
        Best wishes

        • Charles says:

          What you said is basically correct, but I have the following observations:
          1. You can always use the binomial test, whether n is bigger or smaller than 25. For values of n sufficiently large (say n > 25) the normal distribution is a good approximation for the binomial distribution and so you can use it instead.
          2. The table of critical values is probably the table of critical values for the binomial distribution (essentially the values of CRITBINOM in Excel)
          3. I assume that by P0 you mean alpha.
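The procedure discussed in this thread (take the paired differences, then apply a binomial test to the signs) can be sketched in Python. The differences are hypothetical, `sign_test_p` is a helper written for this example, and it assumes a two-sided test in which zero differences are discarded, which is a common convention rather than something stated above.

```python
from math import comb

def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences (zero differences dropped).

    Returns (p_value, n, k), where n is the number of non-zero differences
    and k is the number of positive ones.
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    tail = min(k, n - k)
    # Under H0 the number of positive signs is Binomial(n, 0.5).
    p_one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p_one_tail), n, k

# Hypothetical paired differences (e.g. post minus pre).
diffs = [3, 1, -2, 4, 5, 2, 0, 6, 1, 3, -1, 2]

p_value, n, k = sign_test_p(diffs)
alpha = 0.05
print(f"n = {n}, positives = {k}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Because the p-value is computed exactly, no table of critical values is needed, and the same function works whether n is above or below 25.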

  56. Warren says:

    Hi Charles,

    I have conducted a survey of 2 teachers and 24 students based on a set of maths questions. I asked students to rank 10 questions on a scale of 1-10 and for their teachers to also rank the questions from their students perspective. Would you recommend a two sample T-Test with Equal or Unequal Variance? Also, if I was to just analyse the data between students or teachers, what type of test would you recommend?

    Thank you!

    • Charles says:

      It seems like you have too many factors to use a t test. Before I can make any recommendation I need to understand your objective better. What exactly are you trying to test? It wouldn’t be surprising if any two people have different rankings for 10 questions. How can you determine whether the rankings are close or far apart? What is the relationship between the students and the teachers?

  57. Rachel says:

    The information on this site is amazing – thank you.
    Question: If I have 30 participants, tested each participant with a performance TASK 1 and performance TASK 2. The data collected, for each participant, is how many correct for each TASK. I am confused – what is the IV? what is the DV? Would it be correct if you completed the paired-sample t-test for this data? Thanks in advance for your help.

    • Charles says:

      Glad that you find the site useful.
      Assuming that you are trying to test whether there is a significant difference between the mean TASK 1 scores and the mean TASK 2 scores, you would usually use a paired t test (assuming the assumptions hold, esp. normality). The reason for choosing the paired test is that the same participants performed both tasks. The IV is a binary categorical variable with values TASK 1 or TASK 2. The DV is the task score.

  58. benji says:

    Are we able to use the independent (unequal variances) test for paired samples? I get two very different results for the same data from the unequal variances test and the paired sample test.

  59. Lee says:


    Thank you for this informative tutorial.

    I wonder if you can help me with something. Using Excel 2010, I’m trying to create a box-plot of the difference scores for 13 clients’ pre- and post- scores on a measure in order to show a visual representation that the assumptions for the t-test hold, namely that the difference scores are normally distributed or at least reasonably symmetric as in Figure 3 above.

    Thank you in advance for any help with this.


    • Charles says:

      I show how to do this using the Real Statistics Boxplots tool on the webpage
      That page also has a link to another webpage which shows you how to create the box plot in Excel without using Real Statistics.

  60. Alex says:

    Hi, I have some question..I want to compare body characters measured between female and male of same species of insect..So, I am using independent t test…
    I also compare body characters measured between two different species of insect (same genus)..I am using independent t test too…but someone correct me to use pair t test…Is that possible to use pair t test? I need some help..Tqvm for answering me…

    From Alex

    • Charles says:

      If I was comparing husbands with their wives I could use a paired t test, but if I compare males in general with females in general, then I would use an independent t test. If you are comparing say 100 pairs of insects from 100 different species where each pair consists of a male and a female of the same species, then perhaps a paired t test could be justified.

  61. Takwa says:

    In the case of the occurrence of an outlier in the difference score for a paired t-test, how to control for its effect if any ?
    Thank you

    • Charles says:

      If the outlier is a real data element and not a typing mistake or error in measurement, then you should not delete it. In this case, provided the assumptions for the paired t test are met, I would simply use the paired t test. If the assumptions are not met, then I would use the Wilcoxon Signed-Ranks test.

  62. Rimaa Chand says:

    i am comparing difference between calories per slice for type 2 and type 3 pizza. i think its independent. can you help me to correct if im doing it right.

    • Charles says:

      I can’t tell whether you are doing it right, until you tell me what you are doing.

      • Ojuolape azeem olamilekan says:

        please I need explanation on how to analyse my data using z test or excel buh I don’t have the SPSS software this is the question
        H0:That ICT has no impact on accounting practices
        H1: That ICT has impact on accounting practices
        50 questionnaires were distributed and 38 were returned in which 20 chose YES and 16 chose NO and 2 chose Not sure

  63. Daniel says:

    please the difference between the two standard deviations is not equal to 6.32982,that is 28.56188-24.39262 gives please am a bit confused as to how that figure was arrived.

    • Charles says:

      You don’t want to use the difference between the two standard deviations. Instead you want to use the standard deviation of the differences, namely STDEV.S(D4:D18).
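The distinction in the reply above is easy to check numerically. The following Python sketch uses hypothetical paired data (not the values from the example on the page) and compares the standard deviation of the differences, which is what the paired t test needs, with the difference between the two standard deviations.

```python
import statistics

# Hypothetical paired measurements for six subjects.
x = [120, 135, 110, 150, 142, 128]
y = [118, 130, 112, 141, 139, 120]

diffs = [a - b for a, b in zip(x, y)]

# What the paired t test uses: the sample SD of the differences,
# the analogue of STDEV.S applied to the column of differences.
sd_of_diffs = statistics.stdev(diffs)

# What it does NOT use: the difference between the two sample SDs.
diff_of_sds = statistics.stdev(x) - statistics.stdev(y)

print(f"SD of differences: {sd_of_diffs:.4f}")
print(f"difference of SDs: {diff_of_sds:.4f}")
```

The two quantities generally differ because the SD of the differences also depends on how strongly the two columns are correlated.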

  64. pn says:

    Hi Charles,
    If I take a sample (of say 100 observations) of an index, divide it into two subsamples and calculate the mean for both subsamples, then can I still use the paired two sample test to compare the means?

  65. Lindsay says:

    Hi Charles,

    I have a very basic question
    I’m doing an unpaired t-test for meta-analysis and comparing the % sensitivity in two treatments.

    The two means and standard deviations are compiled from 9 and 12 experiments respectively, and each experiment has from 50-100 subjects.

    So what I can’t figure out is, what’s my sample size? Is it the number of experiments I’m running (9 and 12) or the total number of subjects I have (about 1000) in each group?

    Thank you in advance.

    • Charles says:

      If you are comparing the means of the 1,000 or so subjects in each group, then the sample size is about 1,000. If instead you are comparing the means of the means of the 9 to 12 experiments, then it is the 9 to 12 value. In either case you need to make sure that the assumptions of the test are met, especially regarding independence.

  66. Lukas says:

    Hi Charles,

    Thanks for your great resources here!

    I have a question for a data analysis task. I conducted a survey with the following parameters:
    >15 Questions
    >Each question has the same Likert Scale (1[Strongly Disagree]-5 [Strongly Agree])
    >10 respondents
    >I ran the same survey prior and after the implementation of a new project model.

    How can I apply the paired t-test to evaluate whether a significant improvement due to the new project model has occurred? Am I right if I compare the average values per question in the paired t-test?

    e.g. Q1: PRE=3.9 POST=4.3
    Q2: PRE=3.5 POST=4.1

    The questions would be the measurement variables and the averages the two nominal variables calculated as the average value per question from the 10 respondents.

    Or would that be methodologically incorrect?

    Thanks a lot!

    • Charles says:

      Glad you like the resources.
      Before I can answer your question, I need to understand what hypothesis you want to test. This needs to be a clear and concise statement.

      • Lukas says:

        Thanks Charles,

        Generally, I want to test, if the implementation of a new process has led to improvements in the team. To assess this, I constructed a questionnaire that covers different factors (e.g. collaboration, amount of work, leveraging individual skills).

        So as for my understanding of the paired t test for evaluating survey results, each question would be 1 hypothesis that I test?

        • Charles says:

          Possibly, but I would need more information to answer definitely. It might be better to start with Hotelling’s T-square or something else.

  67. dwight says:

    hi sir ,how to use figure 4 – excel data analysis of paired sample in excel ?

  68. Ramatu says:

    Please, I am trying to run a t-test to find out the effect on a policy on maternal mortality for the years before the policy and after the policy was implemented. I am not sure which test will give me the needed information.

  69. misty says:

    Hello there.

    I am in a graduate stats course and having some trouble. here is my scenario:

    Scenario: Dr. White is a geriatric nursing practitioner. He would like to know whether a new treatment compared with the regular treatment makes difference on patients’ memory. After his research proposal was approved by IRB at his institution, he randomly recruited patients from his clinics for the study. Patients who agreed to participate in his study were randomly assigned to either the treatment group or the control group. Patients who were in the treatment group received the new treatment, and patients who were in the control group received regular treatment. Both groups of patients’ memories were tested after four weeks of treatments.

    I’m having a hard time deciding if I should use an independent t-test or a paired t-test. These appears to be two independent groups to me however the re-measure of memory makes me want to used a paired t-test. I am using spss software just confused on which test best fits. any suggestions would help. thank you.

    • Charles says:

      If you are just comparing patients in the control group with those in the treatment group after 4 weeks, then an independent t test is a fit. If you want to compare the before treatment with treatment after 4 weeks, then the paired t test is correct. If you want to combine both of these tests, then you need a two factor repeated measures Anova. All of these tests are described on the Real Statistics website.

  70. Alex G says:


    I have conducted a pre-questionnaire (Likert scale) and a post-questionnaire (Likert scale) comparing confidence levels for a feedback session, both taken 4 times over 4 weeks. What analysis do I use in SPSS? The sample was the same 14 males each time. Many thanks.

    • Charles says:

      If you have 14 subjects with a score at time 0 and a score at time 1, then a paired t test could work. If the normality assumption is violated (which could be relevant with Likert data), then you could use a Wilcoxon signed-ranks test.
      If you have more than 2 times, then you can use Repeated Measures Anova or possibly Friedman’s test if the assumptions are violated.
      If your situation is different, then I would need more information to provide an answer.

  71. Bob Brown says:

    I am testing performance of a measurement device, then making a change to the device and repeating the tests on the same device.
    The goal is to improve the performance of the device as indicated by increased precision (lower standard deviation of the results on a series of tests after the change).
    Each test consists of 6 measurements; each test is repeated 3 times before and after the change.
    This results in 3 standard deviation values from the tests before the change and 3 standard deviation values from the tests after the change.

    My question is, which t-test should I use to evaluate the results:
    Paired sample t-Test since it is the same measurement device?
    Two Sample t-Test equal variances if the variances of the two data sets are similar?
    Two Sample t-Test unequal variances if the variances of the two data sets are not similar?



    • Charles says:

      What precisely do you mean by “evaluate the results”? In general, the specific test that you use depends on what you are trying to test.

      • Bob Brown says:


        I want to know if the change in performance is statistically significant based on the t-Test. Which t-Test is the correct one to use? By “evaluate the results” I mean calculate the p-value.


        • Charles says:

          The website explains this, but in any case you would need to provide additional information before I could determine which test to use.

  72. Rinna says:

    I am currently working on a questionnaire and I need to measure its reliability.
    I did a test-retest but I am somewhat confused about how to compute its reliability. I also have some previous undergrad theses and I saw that they used a t-test and then Wilcoxon. I hope you can answer my inquiry. If you need any additional info I am willing to give it to you. Thank you 🙂

  73. Alexia says:

    Hello, Charles!
    First I would like to thank you for all this great information!
    I have one question. I have recently had an assignment and I would like to know if I was right. The assignment was to find out whether there was a significant increase from 2014 to 2015 in the number of visitors in Museum X. As for the data, I had the number of visitors for each month in 2015 and for each month in 2016. I have conducted a t.test for independent samples and created a confidence interval, but now I am starting to think that I was wrong and they were in fact dependent samples.
    I would like to ask for your opinion. Do you think that it’s dependent or independent?
    And if they are dependent samples, can the t.test for independent samples also be correct? I read somewhere this “Since the pairing is explicitly defined and thus new information added to the data, paired data can always be analyzed with the independent sample t-test as well, but not vice versa.” and was wondering if it is true.
    Thank you!

  74. Vijay Rathod says:

    Dear Sir,
    Thanks for your wonderful website !!
    I would like a word of advice from you.

    Prawns are known to lose weight after being fished out of water.
    I want to check the claim of a firm that its product (a weight-increasing additive) increases the weight of prawns through better water retention.
    Following is the plan of the experiment. Please advise whether it is correct.
    All prawns for the experiment will be harvested from the same pond. All will be in the weight range of 28 g to 30 g per individual.
    Ho : μ (without treatment) = μ (with treatment)
    H1 : μ (without treatment) ≠ μ (with treatment)
    α : 0.05
    Without Treatment : In this group, weight increasing additive is NOT added. 1kg prawns and 1liter chilled water is kept in each of 15 tubs. Let us say, this is row A.
    With Treatment : In this group, weight increasing additive IS added. 1kg prawns, 1 liter chilled water and weight increasing additive is added (as per dose prescribed by manufacturer) in each of 15 tubs. Let us say this is row B.
    Both row A and row B are arranged simultaneously and kept undisturbed for 150 minutes in the same room at same temperature.
    After 150 minutes, water is drained and weight gain (in percentage) is noted in prawns in each tub.
    For this arrangement, can we consider A1 and B1 as one pair of tubs, A2 and B2 as a second pair, and so on up to A15 and B15? I.e., can we say we have arranged 15 pairs of tubs, with each pair having one tub without treatment and the other with treatment, all other conditions being the same?
    Thus, if we apply the t test for dependent samples, would it be correct for the aim of our experiment? Or should I apply the t test for independent samples?

    Thank you !
    Vijay Rathod

  75. Rhian says:


    Please would you be able to help me understand my undergraduate dissertation statistics? I am not sure which test I should be doing but have been told to do paired T test. My dissertation is using a blood sample but testing it for two microRNA markers. So blood sample 1 is tested for both marker 1 and marker 2.

    Would I be doing paired T test for this? As its not a before and after its two separate markers from the same blood sample

    Hope that was understandable.

    Kind regards.

    • Charles says:

      Yes, the paired t test is applicable in this case even though you don’t have a before and after situation. The key is that each subject is tested for both markers.

  76. Cecil Tse says:

    Hi Charles,

    I need your help over below case.

    H0: u1=u2 and H1: u1 ≠ u2

    And, we have below info.

    df = 24, t Stat = -1.5917, P(T<=t) two tail = 0.1245, t Critical two tail = 2.79694, at 10% significance level.

    Should we accept H0?! Because t Stat (-1.5917) is within t critical two tail (2.79694,-2.79694).

    Hope to seek your advice. I do not have any example with negative value over t Stat.

    • Charles says:

      1. For a 2 tailed test, due to symmetry of the t distribution, you can use the absolute value of the t statistic. Thus the p-value = T.DIST.2T(ABS(-1.5917),24) = 0.1245 > .10 = alpha, and so you can’t reject the null hypothesis. Note that this doesn’t mean that you “accept” the null hypothesis, just that the data don’t provide sufficient evidence against it at the 10% significance level.
      2. Note that the two-tailed critical value = T.INV.2T(.1,24) = 1.710, which is not the same as the value you calculated.
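The two Excel calculations in this reply can be checked without Excel. The following is a rough sketch in plain Python that integrates the t density numerically. The function names are hypothetical stand-ins for T.DIST.2T and T.INV.2T, not a real library API, and the accuracy is only what Simpson's rule and bisection provide.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, i.e. P(|T| >= |t|); steps must be even."""
    b = abs(t)
    h = b / steps
    # Simpson's rule for the integral of the density over [0, |t|].
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t| <= T <= |t|), by symmetry
    return 1 - central

def two_tailed_critical(alpha, df):
    """Value c with P(|T| >= c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

p = two_tailed_p(-1.5917, 24)
crit = two_tailed_critical(0.10, 24)
print(round(p, 4), round(crit, 3))
# Decision rule: reject H0 only if p < alpha (equivalently |t| > crit).
print("reject H0" if p < 0.10 else "cannot reject H0")
```

The printed values should be close to the 0.1245 and 1.710 quoted above. The p-value rule and the critical-value rule always lead to the same decision.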

      • Cecil Tse says:

        Sorry Charles, I am really a dummy over statistics. And need more clarification from you. Hope you can help me more. Appreciate first.

        The case said that “the scores in a maths capability test of 14 randomly chosen males and 14 randomly chosen females. The scores of males and females are normal. Some people suspect that the average scores of males and females are different. At 10% significance level, do the data provide sufficient evidence to support that argument?”
        And the related info shows:
        df = 24, t Stat = -1.5917, P(T aplha = 0.05?

        • Cecil Tse says:

          I guess my comment above was trimmed. I guess p-value = .1245 > .05 (= 0.1/2) = alpha? If p-value < alpha, accept H1? When do we accept H0? How many possible outcomes are there after we compare the p-value against alpha?

          Next, my understanding is that this case should follow the normal distribution. t critical (left) = -2.7969 < t stat < t critical (right) = 2.7969 is the safety range for H0. Correct?

          • Charles says:

            The p-value of .1245 is for the two-tailed test, and so you need to compare .1245 with .10, not .05. Since .1245 > .10, you don’t have evidence to reject the null hypothesis. You probably want to call this “accepting” the null hypothesis, but this is technically not correct since there is still some doubt about the result. This doesn’t mean that you should accept H1, far from it. It means only that the data are not sufficiently inconsistent with the null hypothesis at the 10% significance level.
            There are two outcomes: (1) reject the null hypothesis (i.e. you have evidence for H1) or (2) don’t reject the null hypothesis (i.e. you don’t have sufficient evidence for H1).
            As I said in my earlier response, I don’t believe that you have calculated the critical value correctly. If -crit < t < crit, then you can't reject H0.

        • Charles says:

          Based on the results you showed me in your earlier comment, the data does not provide sufficient evidence to support the argument that the average scores of the males and females are different.
          Note that this is a two independent samples t test (not a paired t test).

          • Cecil Tse says:

            The sample t test I provided before is not paired because df is not equal to 13. Right?
            How can the data provide sufficient evidence to support the argument that the average scores of the males and females are different?
            A paired test usually repeats the same sample over a before/after test. Here, we have two samples: male and female scores. How do we link up a pair of male/female scores?

          • Charles says:

            Your comment refers to the paired t test even though you are not performing a paired t test.
            You would use a paired t test if, for example, each male were the husband of one of the females.

  77. Berlian says:

    Dear Charles,
    I have a question regarding the paired t-test. If patients and their spouses will be compared for quality of life, anxiety, and depression, can a paired t-test be used? If it can, why? Can you give me an explanation and a citation related to my question? I am still new to statistics. Thank you.

    • Charles says:

      Yes, you should use a paired t test to compare any trait between patients and their spouse.
      But the t test can only be used to compare one trait. If you have multiple traits, then you could use Hotelling’s T-square test.
      The paired t test is used since patients and their spouse are not independent of each other. This is cited in numerous books and websites.

  78. Reef says:

    Thank you very much for the information. if you don’t mind, I have a question. I am using an inter-rater method to check the reliability (the data of one participant of my sample was analyzed by me, then, the same one was analyzed by a colleague. The table, in the findings, includes five categories and the number of occurrences- with the percentage). The thing is that I don’t know which test to use! I’ve read three books on research and didn’t find what I am looking for. Please help me.

  79. Bruno Sfogliarini says:

    My congratulations on your website and your professionalism.

    I have one question for you: I’d like to test whether a change of exposure of a product (say, an innovative shelf) over a test sample of stores vs. a matched control sample will be significant. By “matched” I mean that each store in the test sample has a “twin” store in the control sample, according to a set of measures (e.g. volume sales of the product and their change over time). I need to estimate the size of the matched samples (equal sizes). I have historical data available for all the stores of a given chain, hence I can compute the mean and standard deviation of product sales and also of their trend (i.e. percentage change between last year and the previous year). So, is this a paired t-test, and if yes, should I somehow factor the correlation between the two matched samples into the sample size estimation?

    Many, many thanks for your help!!!

    • Charles says:

      Provided you have accounted for all the relevant variability factors in selecting your matched pairs, you should be able to use a paired t test.
      The sample size will be based on the differences in whatever you are measuring (presumably product sales) and so you don’t need to look at the correlation between the matched samples.
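As a rough illustration of sizing a paired design from the differences, here is a sketch using the common normal-approximation formula n = ((z_{1-alpha/2} + z_{power}) * sd_diff / delta)^2. The function name and the example numbers are hypothetical. A t-based calculation would give a slightly larger n, and `sd_diff` must be the standard deviation of the pairwise differences, not of either sample.

```python
import math
from statistics import NormalDist

def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.80):
    """Approximate number of pairs needed to detect a mean difference delta.

    Two-sided test, normal approximation. sd_diff is the standard
    deviation of the pairwise differences (an assumed planning value).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)

# Hypothetical planning values: detect a mean difference of 5 units
# when the differences have a standard deviation of 10 units.
print(paired_sample_size(delta=5, sd_diff=10))
```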

  80. vivekanand Chandrakumar says:

    Hello Charles,
    Firstly, thank you for the wonderful article. I have a small question for you. I’m conducting a study where I’m comparing the percentage change of a variable within the same group after an intervention. I have percentage P1 prior to the intervention and P2 after the intervention. I can calculate the percentage change, but how do I know whether the percentage change calculated is significant or not?
    Please do let me know at the earliest.

    • Charles says:

      You can use the paired t test for this sort of situation (provided the assumptions are satisfied). See referenced webpage for more details.

  81. Y Davidson says:

    Thanks Charles for the article!

    I want to check if the amount of fluid a person needs predicted by a formula and the amount a person has actually had are similar. What test should I use to see if the formula works?

    • Charles says:

      I am not able to give a definitive answer without more information. Possible approaches are the paired t test, Wilcoxon signed-ranks test or even goodness of fit approaches (Anderson-Darling test).

  82. Momo says:

    Hi, I was wondering how you would do a power calculation on an unpaired t-test assuming unequal variances?

  83. annelienpeeters says:


    I have a question about the use of the paired t-test.
    I’m conducting research on the sustainability efforts of a certain company.
    I used a survey with 80 respondents (employees of that company).
    I used a scale from 1 (not important) to 5 (very important).
    I want to see if there’s a difference between the perceived importance and the desired importance.
    Perceived importance: which level of importance does the company currently award to each item.
    Desired importance: which level of importance do YOU wish the company would award to each item.
    Examples of items:
    – Natural sources protection
    – Investment into environmental technologies
    – ….
    (there are 21 items).

    I would like to see if there is a ‘gap’ between the perceived and the desired importance. Can I use the paired t-test to see if there’s a significant difference?

    Thank you for your article

    • Charles says:

      If each respondent gives one rating for all 21 items (together), then the paired t test seems appropriate (provided the assumptions for the test are met). If each of the 21 items is rated separately, then the multivariate version of the paired t test, namely the paired Hotelling’s T-square test, is used. See the following webpage
      Paired Hotelling’s T-square Test

  84. Haim says:

    Hello Charles;

    Thank you for making stats more understandable
    I have a need to check the following:

    Our engineering team designed a new test stand for a specific test we have (leak test), which is supposed to be better. They have asked us (Quality) to run each tested unit through both tests and record the data.

    We have collected data for 230 units and now they have asked us to tell them whether we think the new test is better.

    Any suggestions on how to approach?

  85. Naser says:

    Dear Sir,

    I use the TOPSIS method for ranking a set of patients, but I got two different rankings, one from internal and the other from external aggregation.
    How do I use the t test in this case? And what is the suitable formula in Excel?


    • Charles says:

      Sorry, but I am not familiar with the TOPSIS method, except for what I just read on Wikipedia, and so I can’t comment on why you got two different rankings.
      Regarding the t test, what hypothesis are you trying to test? Excel provides the T.TEST (or TTEST) formula which might be useful. This is described on the Real Statistics website.

  86. Stavros says:

    I am working on my Master’s thesis and I have a question about this test, if you can help me. With the example that you use in this article, let’s say that I have a group of 20 people and have them try 3 different drinks (beer, wine, whiskey). I used a Bonferroni post hoc test to find out whether the differences between the 3 values are significant. I could do the same and compare every value with each other using the paired t test. This test gives me much different p-values than Bonferroni. Would the paired t test be valid in this case?

    • Charles says:

      When you say that you performed a Bonferroni post-hoc test, did you use the post hoc test for ANOVA with repeated measures? The Bonferroni test after ordinary one-way ANOVA will give a different result, and this result is not correct. The default alpha value for a Bonferroni post-hoc test with three variables is alpha/2. If you perform 3 tests then you need to change this to alpha/3; if you only perform 1 test, then you can use alpha.
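To make the alpha adjustment concrete, here is a small sketch of a Bonferroni correction applied to pairwise paired t tests among three drinks. The p-values are hypothetical placeholders, not results from this thread, and the function name is illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which comparisons remain significant at alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}

# Hypothetical p-values from three pairwise paired t tests.
p_values = {
    "beer vs wine": 0.010,
    "beer vs whiskey": 0.030,
    "wine vs whiskey": 0.200,
}
for name, significant in bonferroni(p_values).items():
    print(name, "significant" if significant else "not significant")
```

With three tests, each p-value is compared with 0.05 / 3, so a comparison with p = 0.030 that would pass an unadjusted test no longer does.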

  87. As a project for class, I am conducting a t-test on data retrieved from a survey. My hypothesis is ” Does all of the information and support latent with E-commerce outweigh the ability to tangibly sense the product you are buying in terms of customer satisfaction”. I used a t-test assuming equal variances in excel with a hypothesized mean difference of 0. The test itself was easy to administer, deciphering the output is another story. Namely the following:
    df 14
    t Stat 1.534834562
    P(T<=t) one-tail 0.073555106
    t Critical one-tail 1.761310136
    P(T<=t) two-tail 0.147110212
    t Critical two-tail 2.144786688
    It seems to me that my educated guess was proven- that E-commerce is superior.
    My question is does the data prove that?

    • Charles says:

      For the two-tailed test, the results show a p-value = 0.147110212. Since this value is larger than alpha = .05 (the value generally employed), you cannot reject the null hypothesis, and so there isn’t a significant difference between the two populations. This is different from what your intuition is suggesting.
      I see that you have conducted a two sample t test, although you are referencing a paired t test. It is important to choose the correct test.

      • I believe I understand, I referenced this test:

        t-Test: Paired Two Sample for Means

        E-comm Physical
        Mean 7.961904762 5.321428571
        Variance 13.20571429 15.38988095
        Observations 7 7
        Pearson Correlation -0.522168373
        Hypothesized Mean Difference 2
        df 6
        t Stat 0.256973779
        P(T<=t) one-tail 0.402892083
        t Critical one-tail 1.943180281
        P(T<=t) two-tail 0.805784166
        t Critical two-tail 2.446911851
        This is quite a bit different from the one I posted, I'm glad you caught that. This is the first one I used, the one approved was a paired sample. I was under the impression that because my hypothesis did not assume that E-commerce was greater or less than that it would be a one-tailed test (as I was only testing one direction- only greater). Am I wrong on this?
        This is the test I am to use in my paper, "paired sample" was approved and the three available under data analysis in excel threw me a curve.
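The paired output above can be reproduced from its own summary rows, which shows how the pieces fit together. This sketch assumes the standard identity var(d) = var1 + var2 - 2 * r * sqrt(var1 * var2) for the variance of the pairwise differences. The function name is hypothetical.

```python
import math

def paired_t_from_summary(mean1, mean2, var1, var2, r, n, hypothesized_diff=0.0):
    """Paired t statistic and df from means, variances and Pearson correlation."""
    # Variance of the pairwise differences from the two variances and r.
    var_diff = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
    se = math.sqrt(var_diff / n)  # standard error of the mean difference
    t = (mean1 - mean2 - hypothesized_diff) / se
    return t, n - 1

# Values copied from the "t-Test: Paired Two Sample for Means" output above.
t_stat, df = paired_t_from_summary(
    mean1=7.961904762, mean2=5.321428571,
    var1=13.20571429, var2=15.38988095,
    r=-0.522168373, n=7, hypothesized_diff=2,
)
print(round(t_stat, 4), df)
```

The result should land close to the reported t Stat with df = 6. Note that the hypothesized mean difference of 2 is subtracted in the numerator, so this output tests whether the mean difference equals 2 rather than 0.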
