A two-sample comparison of means, such as the one in Example 2 of Two Sample t Test with Equal Variances, can be turned into a correlation problem by combining the two samples into one (random variable *x*) and setting the random variable y (the dichotomous variable) to 0 for elements in one sample and to 1 for elements in the other sample. It turns out that the two-sample analysis using the t-test is equivalent to the analysis of the correlation coefficient using the t-test.

**Example 1**: Calculate the correlation coefficient *r* for *x* and y as above using the data in Example 2 of Two Sample t Test with Equal Variances, and then test the null hypothesis H_{0}: *ρ* = 0.

**Figure 1 – Using correlation testing to solve Example 1**

The values for p-value and *t* are exactly the same as those that result from the t-test in Example 2 of Two Sample t Test with Equal Variances. Again we conclude that the hay fever drug did not offer any significant improvement in driving results as compared to the control.
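The equivalence can be checked numerically. The following is a minimal sketch in plain Python, not the worksheet from Figure 1: the two score lists are made-up illustration data (not the data from Example 2), and the helper names `pooled_t` and `pearson_r` are hypothetical. It computes the equal-variance two-sample *t* statistic directly and then again from the correlation coefficient via *t* = *r*·sqrt(*df*)/sqrt(1 − *r*²) with *df* = *n* − 2.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (mean of b minus mean of a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((v - ma) ** 2 for v in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for illustration only
control = [62, 71, 58, 66, 74, 69]
treated = [65, 70, 61, 72, 68, 75]

# Combine the samples into x and code group membership as y = 0 or 1
x = control + treated
y = [0] * len(control) + [1] * len(treated)

r = pearson_r(x, y)
df = len(x) - 2
t_from_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
t_two_sample = pooled_t(control, treated)

print(t_two_sample, t_from_r)  # the two values agree up to rounding error
```

Since both tests use the same *t* statistic and the same degrees of freedom, they also produce the same p-value.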

**Definition 1**: A variable is **dichotomous** if it only takes two values (usually set to 0 and 1).

The **point-biserial correlation coefficient** is simply the Pearson’s product-moment correlation coefficient where one or both of the variables are dichotomous.

**Property 1**:

$$r = \sqrt{\frac{t^2}{t^2 + df}}$$

where *t* is the test statistic for two means hypothesis testing of variables *x*_{1} and *x*_{2} with *t* ~ *T*(*df*), *x* is a combination of *x*_{1} and *x*_{2} and y is the dichotomous variable as in Example 1.

**Observation**: The value for *t* from Example 2 of Two Sample t Test with Equal Variances is .1004. By Property 1, with *df* = 22,

$$r = \sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{(.1004)^2}{(.1004)^2 + 22}} = \sqrt{.000458}$$

and so *r* = .0214, which agrees with the value we get using CORREL (as we can see in cell AB3 in Figure 1).
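The same arithmetic can be reproduced outside Excel. This is a small sketch that assumes the relation *r*² = *t*²/(*t*² + *df*) and takes *df* = 22, a value inferred here from the reported *t* and *r* rather than quoted from the example; the function name `r_from_t` is made up for illustration.

```python
import math

def r_from_t(t, df):
    """Correlation coefficient implied by a t statistic with df degrees of freedom."""
    return math.sqrt(t ** 2 / (t ** 2 + df))

t = 0.1004   # t statistic reported for Example 2
df = 22      # assumption: inferred from the reported t and r

r = r_from_t(t, df)
print(round(r, 4))       # 0.0214
print(round(r ** 2, 5))  # 0.00046
```

Note that the square root returns the magnitude of *r* only; the sign depends on which sample is coded as 1.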

**Observation**: The effect size for the comparison of two means (see Two Sample t Test with Equal Variances) is given by

The sample version of this measure of effect size is

Using the formula from Theorem 1 of Correlation Testing via the t Test, we can convert this to an expression based on *r*, namely:

E.g., for the data in Example 1:

This means that the difference between the average driving scores of the control group and the group taking the hay fever drug is only about 4.1% of a standard deviation. Note that this is the same effect size that was calculated in Example 2 of Two Sample t Test with Equal Variances.
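As a rough check of the 4.1% figure, the conversion from *r* to the effect size can be sketched in Python. The sketch assumes the effect size is Cohen's *d* based on the pooled standard deviation, so that *d* = *t*·sqrt(1/*n*_{1} + 1/*n*_{2}) with *t* = *r*·sqrt(*df*)/sqrt(1 − *r*²). The group sizes *n*_{1} = *n*_{2} = 12 are an assumption chosen to be consistent with the reported *t* and *r*, and `d_from_r` is a hypothetical helper name.

```python
import math

def d_from_r(r, n1, n2):
    """Cohen's d (pooled standard deviation) implied by a point-biserial r."""
    df = n1 + n2 - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # t statistic recovered from r
    return t * math.sqrt(1 / n1 + 1 / n2)          # d = t * sqrt(1/n1 + 1/n2)

# Assumption: two groups of 12, giving df = 22
d = d_from_r(0.0214, 12, 12)
print(round(d, 3))  # 0.041
```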

Alternatively, we can use *φ* (**phi**) as a measure of effect size. Phi is nothing more than *r*. For this example *φ* = *r* = 0.0214. Since *r*^{2} = 0.00046, we know that only about 0.046% of the variation in the scores (*x*) is explained by group membership (y).

A rough estimate of effect size is that *r* = .5 represents a large effect size (explains 25% of the variance), *r* = .3 represents a medium effect size (explains 9% of the variance), and *r* = .1 represents a small effect size (explains 1% of the variance).
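The rule of thumb can be written as a small helper. This is only an illustrative sketch: the text gives .1, .3 and .5 as reference points, and treating them as lower cutoffs (and calling anything below .1 "negligible") is an assumption made here. The function names are hypothetical.

```python
def variance_explained(r):
    """Percentage of variance explained by a correlation of size r."""
    return 100 * r ** 2

def rough_effect_label(r):
    """Rough effect size label for r (cutoffs are an assumption, see above)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

for r in (0.5, 0.3, 0.1, 0.0214):
    print(r, rough_effect_label(r), round(variance_explained(r), 3))
```

For the value from Example 1, *r* = 0.0214, this reports about 0.046% of the variance explained, well below the small reference point.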

**Property 3**: If {y_{1}, …, y_{n}} is a sample for the dichotomous random variable y and {*x*_{1}, …, *x*_{n}} is a sample for the random variable *x*, the point-biserial correlation coefficient between these samples is given by the formula

$$r = \frac{m_1 - m_0}{s_x} \sqrt{\frac{n_0 n_1}{n(n-1)}}$$

where *m*_{0} is the mean of the *n*_{0} data elements *x*_{i} whose corresponding y value is y_{i} = 0, *m*_{1} is the mean of the *n*_{1} data elements *x*_{i} whose corresponding y value is y_{i} = 1 and *s*_{x} is the (sample) standard deviation of {*x*_{1}, …, *x*_{n}}.

If {*x*_{1}, …, *x*_{n}} and {y_{1}, …, y_{n}} are populations, then the point-biserial correlation coefficient is

$$r = \frac{m_1 - m_0}{\sigma_x} \sqrt{\frac{n_0 n_1}{n^2}}$$

where *σ*_{x} is the (population) standard deviation of {*x*_{1}, …, *x*_{n}}.
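Property 3 can be checked against the ordinary Pearson formula. The sketch below assumes the standard forms of the point-biserial formula, *r* = (*m*_{1} − *m*_{0})/*s*_{x} · sqrt(*n*_{0}*n*_{1}/(*n*(*n* − 1))) for a sample and *r* = (*m*_{1} − *m*_{0})/*σ*_{x} · sqrt(*n*_{0}*n*_{1}/*n*²) for a population. The data are hypothetical and the function names are made up for illustration.

```python
import math

def point_biserial(x, y, population=False):
    """Point-biserial correlation from group means and the standard deviation of x."""
    n = len(x)
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    n0, n1 = len(g0), len(g1)
    m0, m1 = sum(g0) / n0, sum(g1) / n1
    mx = sum(x) / n
    ss = sum((xi - mx) ** 2 for xi in x)
    if population:
        sd = math.sqrt(ss / n)                                  # population sd
        return (m1 - m0) / sd * math.sqrt(n0 * n1 / n ** 2)
    sd = math.sqrt(ss / (n - 1))                                # sample sd
    return (m1 - m0) / sd * math.sqrt(n0 * n1 / (n * (n - 1)))

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient, for comparison."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is the measurement, y is the 0/1 group code
x = [62, 71, 58, 66, 74, 69, 65, 70, 61, 72, 68, 75]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

r_sample = point_biserial(x, y)
r_population = point_biserial(x, y, population=True)
r_pearson = pearson_r(x, y)
print(r_sample, r_population, r_pearson)  # all three agree up to rounding error
```

The sample and population versions return the same number for the same data, because the change in the standard deviation is offset by the change in the factor under the square root.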

**Observation**: Based on Property 3, the correlation coefficient shown in cell AB3 of Figure 1 can be calculated as shown in Figure 2.

**Figure 2 – Calculation of point-biserial correlation coefficient**

I am doing a study that looks at the relationship between father participation in school activities (0, 1) and change in their children’s test scores (spring to spring). If I use a t-test to calculate the difference of the means of the change in scores by group and I find that the difference is significant, should I also talk about the slope of the regression line or the correlation coefficient to get a sense of the nature of the relationship or R squared? How would I get that number?

Imcafee,

I am not sure what extra information you would get by doing this, but as the referenced webpage explains, you can turn a t test into a correlation by using a dummy dichotomous variable. Once you calculate the correlation coefficient in this way, R-square is just the square of the correlation coefficient.

Charles

| r(x, y) | t | p |
|---|---|---|
| -0.78 | -3.29 | 0.01 |
| -0.28 | -0.86 | 0.41 |
| -1.00 | | |

Please help me interpret this result (correlation of two variables). Thanks.

The first row calculates a sample correlation coefficient of -.78 and shows that the population correlation coefficient is significantly different from zero with 99% confidence.

The second row calculates a sample correlation coefficient of -.28 and cannot reject the null hypothesis that the population correlation coefficient is zero.

The third row calculates a sample correlation coefficient of -1, which means that the two samples are 100% negatively correlated.

Charles

| r (Correlation) | t comp | Tabled t | Comparison | Decision |
|---|---|---|---|---|
| 0.12 | 0.99 | 1.996 | less than | ? |

How to interpret this table? Kindly please help me. Thank you

Sorry, but I don’t know what you are referring to.

Charles

I have a question.

I have two variables out of which one is continuous and the other is (artificially) dichotomous with an underlying property being continuous and normally distributed.

I want to find the correlation coefficient between these two variables. Which would be better, the point-biserial or the biserial coefficient?

Dhruv,

As always the answer depends on what you want to do with the result, but based on your description it sounds like you should use the biserial coefficient.

Charles

Thank you Charles.

My purpose is to study the correlation between the variables. Both variables have Physical meaning and a correlation will help me understand the physical meaning between the two.

Could you tell me how to compute the biserial correlation with your tool (the Excel add-in)?

I have to specifically say that the tool has been very helpful to me for my work.

Best,

Dhruv

Dhruv,

Excel’s CORREL function can be used to compute the point biserial correlation coefficient. I plan to add the biserial correlation coefficient to the Real Statistics software in the next day or two. I will then update the website to explain how to calculate the biserial correlation coefficient manually (using Excel). Stay tuned.

Charles

May I ask a question? Who is Charles Zaiontz?

See Author.

Charles

Can I ask for help? Here is the data given aside from their means.

ΣX² = Sum of squares of the first scores

ΣY² = Sum of squares of the second scores.

Can I ask for help? Suppose I have the means of x and y and only the sums of their squared values. How can I test whether they are significantly different at the 5% level of significance? Thank you.

Charisa,

When you say that you want to “compute if they are significantly different”, are you referring to the means of x and y or something related to the correlation between x and y (in which case, the usual test is whether the correlation is significantly different from zero)?

Charles


Thank you so much for this helpful explanation and the worksheet. However, in the downloadable worksheet, the cell is labeled ‘reject’ instead of ‘sig’, which to me sounds like the exact opposite. Am I missing something?

Kind regards and many thanks, Christian.

Christian,

“Reject” in this context means “reject the null hypothesis,” which is equivalent to a significant result. Also, it seems that you are referring to an old version of the examples worksheet. The latest version uses “sig” instead of “reject”.

Charles

So, I can use whichever one I wish since they are the same?

Yes, you can use either one since they are equivalent tests.

Charles

Dr. Zaiontz,

What is the formula for calculating the correlation coefficient from paired continuous data, e.g., from a crossover trial?

Thanks very much in advance

Best regards

Arturo

Arturo,

The formula for the correlation coefficient is the square root of t^2/(t^2+df) where t is the t statistic and df is the degrees of freedom. E.g. for Example 1 in http://www.real-statistics.com/students-t-distribution/paired-sample-t-test/ t = 6.6897 and df = 14. Thus the correlation coefficient is .87276. You can use the Real Statistics T-Test data analysis tool to get this answer (see Figure 5 of http://www.real-statistics.com/students-t-distribution/paired-sample-t-test/).

Charles