Correlation in Relationship to t-test

A two-sample comparison of means, such as that in Example 2 of Two Sample t Test with Equal Variances, can be turned into a correlation problem by combining the two samples into one (random variable x) and setting the random variable y (the dichotomous variable) to 0 for elements in one sample and to 1 for elements in the other sample. It turns out that the two-sample analysis using the t-test is equivalent to the analysis of the correlation coefficient using the t-test.

Example 1: Calculate the correlation coefficient r for x and y as above using the data in Example 2 of Two Sample t Test with Equal Variances, and then test the null hypothesis H0: ρ = 0.

Figure 1 – Using correlation testing to solve Example 1

The p-value and the value of t are exactly the same as those that result from the t-test in Example 2 of Two Sample t Test with Equal Variances. Again we conclude that the hay fever drug did not offer any significant improvement in driving results as compared to the control.
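
The equivalence can be checked numerically. The following is a minimal sketch, not the worksheet from Figure 1: the two samples are made-up numbers (not the data from Example 2), and the helper names `pooled_t` and `pearson_r` are introduced here only for illustration. It computes the pooled two-sample t statistic directly, then recomputes it from the correlation between the combined sample x and the 0/1 variable y via t = r√df / √(1 − r²).

```python
import math
from statistics import mean, variance

# Hypothetical scores for two independent groups (NOT the data from Example 2).
control = [12.1, 9.8, 11.4, 10.7, 13.0, 9.5]
treated = [11.6, 10.9, 12.8, 9.9, 12.3, 11.1]

def pooled_t(a, b):
    """Two-sample t statistic (mean of a minus mean of b), equal variances assumed."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Combine the samples into x; y is 0 for the control group and 1 for the treated group.
x = control + treated
y = [0] * len(control) + [1] * len(treated)

# "treated minus control" so that the sign of t matches the y = 1 coding.
t_two_sample, df = pooled_t(treated, control)
r = pearson_r(x, y)
t_from_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(t_two_sample, t_from_r)  # the two values agree up to rounding error
```

Since both statistics are referred to the same T(df) distribution, equal t values imply equal p-values.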

Definition 1: A variable is dichotomous if it only takes two values (usually set to 0 and 1).

The point-biserial correlation coefficient is simply the Pearson’s product-moment correlation coefficient where one or both of the variables are dichotomous.

Property 1:
$$r^2 = \frac{t^2}{t^2 + df}$$

where t is the test statistic for two means hypothesis testing of variables x1 and x2 with t ~ T(df), x is a combination of x1 and x2 and y is the dichotomous variable as in Example 1.

Observation: The value for t from Example 2 of Two Sample t Test with Equal Variances is .1004. By Property 1,

$$r^2 = \frac{t^2}{t^2 + df} = \frac{(0.1004)^2}{(0.1004)^2 + df} \approx 0.00046$$

and so r = .0214, which agrees with the value we get using CORREL (as we can see in cell AB3 in Figure 1).
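
As a quick check of this arithmetic, here is a small sketch of the conversion in both directions. The function names are illustrative only. The value df = 22 is an assumption: it is not stated on this page, but it is the only whole number of degrees of freedom that reproduces r = .0214 from t = .1004.

```python
import math

def r_from_t(t, df):
    """Magnitude of r implied by a t statistic with df degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

def t_from_r(r, df):
    """Inverse conversion: t statistic implied by a correlation r."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

t = 0.1004  # value of t quoted in the text
df = 22     # ASSUMPTION: inferred from the reported t and r, not stated on this page

r = r_from_t(t, df)
print(round(r, 4))                # 0.0214
print(round(t_from_r(r, df), 4))  # 0.1004
```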

Observation: The effect size for the comparison of two means (see Two Sample t Test with Equal Variances) is given by

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

The sample version of this measure of effect size is

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s}$$

where s is the pooled sample standard deviation.

Using the formula from Theorem 1 of Correlation Testing via the t Test, we can convert this to an expression based on r, namely:

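$$d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \frac{r}{\sqrt{1 - r^2}}\sqrt{df\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where n1 and n2 are the sizes of the two samples and df = n1 + n2 − 2.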

E.g., for the data in Example 1:

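$$d = \frac{r}{\sqrt{1 - r^2}}\sqrt{df\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \approx 0.041 \quad \text{with } r = 0.0214$$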

This means that the difference between the mean driving score of the control group and that of the group taking the hay fever drug is only about 4.1% of a standard deviation. Note that this is the same effect size that was calculated in Example 2 of Two Sample t Test with Equal Variances.
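
The conversion from r to Cohen's d can be sketched as follows. The function name is hypothetical, and the group sizes n1 = n2 = 12 are an assumption: they are not given on this page, but they are consistent with the reported values t = .1004, r = .0214 and d ≈ .041.

```python
import math

def cohens_d_from_r(r, n1, n2):
    """Cohen's d implied by a point-biserial correlation r and group sizes n1, n2."""
    df = n1 + n2 - 2
    return r / math.sqrt(1 - r ** 2) * math.sqrt(df * (1 / n1 + 1 / n2))

# r is the value quoted in the text; the group sizes are ASSUMED (12 per group).
d = cohens_d_from_r(0.0214, 12, 12)
print(round(d, 3))  # 0.041
```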

Alternatively, we can use φ (phi) as a measure of effect size. Phi is nothing more than r. For this example φ = r = 0.0214. Since r² = 0.00046, only about 0.046% of the variation in the driving scores is explained by whether or not the drug was taken.

As a rough guide, r = .5 represents a large effect size (explains 25% of the variance), r = .3 represents a medium effect size (explains 9% of the variance), and r = .1 represents a small effect size (explains 1% of the variance).

Property 3: If {y1, …, yn} is a sample for the dichotomous random variable y and {x1, …, xn} is a sample for the random variable x, the point-biserial correlation coefficient between these samples is given by the formula

$$r = \frac{m_1 - m_0}{s_x}\sqrt{\frac{n_0 n_1}{n(n-1)}}$$

where m0 is the mean of the n0 data elements xi whose corresponding y value is yi = 0, m1 is the mean of the n1 data elements xi whose corresponding y value is yi = 1 and sx is the (sample) standard deviation of {x1, …, xn}.

If {x1, …, xn} and {y1, …, yn} are populations, then the point-biserial correlation coefficient is

$$r = \frac{m_1 - m_0}{\sigma_x}\sqrt{\frac{n_0 n_1}{n^2}}$$

where σx is the (population) standard deviation of {x1, …, xn}.
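
The two formulas can be compared against the ordinary Pearson coefficient with a short sketch. The data below are made up for illustration, and the function names are not part of Excel or the Real Statistics software. Both forms of the formula give the same value, since the change in the standard deviation is offset by the change in the factor under the square root.

```python
import math
from statistics import mean, stdev, pstdev

def point_biserial(x, y, population=False):
    """Point-biserial correlation from group means; y must contain only 0s and 1s."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    n0, n1, n = len(x0), len(x1), len(x)
    m0, m1 = mean(x0), mean(x1)
    if population:
        return (m1 - m0) / pstdev(x) * math.sqrt(n0 * n1 / n ** 2)
    return (m1 - m0) / stdev(x) * math.sqrt(n0 * n1 / (n * (n - 1)))

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, for comparison."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is continuous, y is the dichotomous variable.
x = [3.1, 4.5, 2.8, 5.0, 4.2, 3.9, 5.6]
y = [0, 1, 0, 1, 0, 1, 1]

r_sample = point_biserial(x, y)
r_population = point_biserial(x, y, population=True)
r_pearson = pearson_r(x, y)
print(r_sample, r_population, r_pearson)  # all three agree up to rounding error
```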

Observation: Based on Property 3, the correlation coefficient shown in cell AB3 of Figure 1 can be calculated as shown in Figure 2.

Figure 2 – Calculation of the point-biserial correlation coefficient

22 Responses to Correlation in Relationship to t-test

  1. Imcafee says:

    I am doing a study that looks at the relationship between father participation in school activities (0, 1) and change in their children’s test scores (spring to spring). If I use a t-test to calculate the difference of the means of the change in scores by group and I find that the difference is significant, should I also talk about the slope of the regression line or the correlation coefficient to get a sense of the nature of the relationship or R squared? How would I get that number?

    • Charles says:

      Imcafee,
      I am not sure what extra information you would get by doing this, but as the referenced webpage explains, you can turn a t test into a correlation by using a dummy dichotomous variable. Once you calculate the correlation coefficient in this way, R-square is just the square of the correlation coefficient.
      Charles

  2. Marinela Salas says:

    r(x, y) t p
    -0.78 -3.29 0.01
    -0.28 -0.86 0.41
    -1.00
    pls help me in interpreting this result…correlation of two variables.thanks

    • Charles says:

      The first row calculates a sample correlation coefficient of -.78 and shows that the population correlation coefficient is significantly different from zero with 99% confidence.
      The second row calculates a sample correlation coefficient of -.28 and cannot reject the null hypothesis that the population correlation coefficient is zero.
      The third row calculates a sample correlation coefficient of -1, which means that the two samples are 100% negatively correlated.
      Charles

  3. JOMS says:

    r (Correlation) t comp Tabled t Comparison Decision
    0.12 0.99 1.996 less than ?

    How to interpret this table? Kindly please help me. Thank you

  4. Dhruv Pandya says:

    I have a question.

    I have two variables out of which one is continuous and the other is (artificially) dichotomous with an underlying property being continuous and normally distributed.
    I want to find the correlation coefficient between these two variables. which will be better Point Biserial or Biserial coeff?

    • Charles says:

      Dhruv,
      As always the answer depends on what you want to do with the result, but based on your description it sounds like you should use the biserial coefficient.
      Charles

      • Dhruv Pandya says:

        Thank you Charles.

        My purpose is to study the correlation between the variables. Both variables have Physical meaning and a correlation will help me understand the physical meaning between the two.

        Could you tell me how to compute Biserial correlation with your tool (the excel add in) ?

        I have to specifically say that the tool has been very helpful to me for my work.

        Best,

        Dhruv

        • Charles says:

          Dhruv,
          Excel’s CORREL function can be used to compute the point biserial correlation coefficient. I plan to add the biserial correlation coefficient to the Real Statistics software in the next day or two. I will then update the website to explain how to calculate the biserial correlation coefficient manually (using Excel). Stay tuned.
          Charles

  5. cayyum says:

    May i ask a qn? Who is Charles Zaiontz?

  6. Charisa Candelario says:

    Can I ask for help? Here is the data given aside from there means.
    ΣX2 = Sum of square First Scores
    ΣY2 = Sum of square Second Scores.

  7. Charisa Candelario says:

    Can I ask for help? Suppose I have the data for means of x and y then their summation of squared value only, how can i compute if they are significantly different at 5% level of significance. Thank you.

    • Charles says:

      Charisa,
      When you say that you want to “compute if they are significantly different”, are you referring to the means of x and y or something related to the correlation between x and y (in which case, the usual test is whether the correlation is significantly different from zero)?
      Charles

  8. Pingback: Gibt es einen Zusammenhang zwischen Parteien/Koalitionen und Arbeitslosigkeit? | Der Burtchen

  9. Christian Burtchen says:

    Thank you so much for this helpful explanation and the worksheet. However, in the downloadable worksheet, instead of ‘sig’ the cell value will be called ‘reject’, which to me sounds like the exact opposite. Am I missing something?

    Kind regards and many thanks, Christian.

    • Charles says:

      Christian,
      “Reject” in this context means “reject the null hypothesis,” which is equivalent to a significant result. Also, it seems that you are referring to an old version of the examples worksheet. The latest version uses “sig” instead of “reject”.
      Charles

  10. Usman says:

    So, I can use any one I wish since they are the same

  11. Arturo says:

    Dr. Zaiontz,
    What is the formula for calculating correlation coefficient from paired continuous data i.e., crossover trial?

    Thanks very much in advance

    Best regards

    Arturo
