# Biserial Correlation

In Relationship between Correlation and t Test and Relationship between Correlation and Chi-square Test we introduced the point-biserial correlation coefficient, which is simply Pearson’s correlation coefficient when one of the samples is dichotomous.

The biserial correlation coefficient is also a correlation coefficient where one of the samples is measured as dichotomous, but where the variable underlying that sample is really continuous and normally distributed. In such cases, the point-biserial correlation generally underestimates the true strength of the association. The biserial correlation coefficient provides a better estimate in this case.
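To illustrate that the point-biserial coefficient is just Pearson’s correlation applied to a 0/1 variable, here is a minimal Python sketch. The function names and the small data set are made up for illustration, and the group-means form assumes that the standard deviation of Y is the population version.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson's correlation coefficient computed from its definition."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(x, y):
    """Group-means form: (m1 - m0) * sqrt(p0 * p1) / s, with s the population SD of y."""
    group0 = [b for a, b in zip(x, y) if a == 0]
    group1 = [b for a, b in zip(x, y) if a == 1]
    n = len(y)
    p0, p1 = len(group0) / n, len(group1) / n
    return (mean(group1) - mean(group0)) * sqrt(p0 * p1) / pstdev(y)


# Hypothetical data: x is dichotomous, y is continuous
x = [0, 0, 0, 1, 1]
y = [2.0, 3.0, 4.0, 5.0, 8.0]

print(pearson(x, y), point_biserial(x, y))  # the two values agree up to rounding
```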

Suppose that we have two samples X = {x1, …, xn} and Y = {y1, …, yn}, where each xi is 0 or 1. The biserial correlation coefficient, denoted rb, is calculated as follows:

$$r_b = \frac{(m_1 - m_0)\, p_0\, p_1}{s\, y}$$

where n0 = the number of elements in X which are 0, n1 = the number of elements in X which are 1 (and so n = n0 + n1), p0 = n0/n, p1 = n1/n, m0 = the mean of {yi: xi = 0}, m1 = the mean of {yi: xi = 1}, s is the standard deviation of Y, and y is the height of the standard normal density at the z-value whose cumulative probability is p0. In Excel,

y = NORM.S.DIST(NORM.S.INV(p0),FALSE)
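For readers working outside Excel, the calculation can be sketched in Python using only the standard library. This is an illustration, not the Real Statistics implementation: it takes rb = (m1 − m0)·p0·p1/(s·y), which is the form that reproduces the figures quoted in the comments, and it assumes that s is the population standard deviation. The function name `biserial` and the sample data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial(x, y):
    """Biserial correlation (m1 - m0) * p0 * p1 / (s * height) for 0/1 data x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    group0 = [yi for xi, yi in zip(x, y) if xi == 0]
    group1 = [yi for xi, yi in zip(x, y) if xi == 1]
    if len(group0) + len(group1) != len(x):
        raise ValueError("x must contain only 0's and 1's")
    if not group0 or not group1:
        raise ValueError("x must contain at least one 0 and one 1")

    n = len(x)
    p0, p1 = len(group0) / n, len(group1) / n
    m0, m1 = mean(group0), mean(group1)
    s = pstdev(y)  # assumption: population standard deviation of Y

    # Same quantity as NORM.S.DIST(NORM.S.INV(p0), FALSE)
    std_normal = NormalDist()
    height = std_normal.pdf(std_normal.inv_cdf(p0))

    return (m1 - m0) * p0 * p1 / (s * height)


# Hypothetical data, not the data of Figure 1
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 3, 4, 3, 4, 5, 6]
print(round(biserial(x, y), 4))  # 0.8355
```

Here m0 = 2.5, m1 = 4.5, p0 = p1 = 0.5 and s = 1.5, so the result is 0.5/(1.5 × 0.39894) ≈ 0.8355.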

Example 1: Calculate the biserial correlation coefficient for the data in columns A and B of Figure 1.

Figure 1 – Biserial Correlation Coefficient

The biserial correlation of -.06821 (cell J15) is calculated as shown in column L. Note that the value is a little more negative than the point-biserial correlation (cell C4).
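The size of the gap between the two coefficients can be made explicit. If we take rb = (m1 − m0)·p0·p1/(s·y) and write the point-biserial coefficient as (m1 − m0)·√(p0·p1)/s, with the same population standard deviation s in both (an assumption), then rb equals the point-biserial coefficient multiplied by √(p0·p1)/y. The sketch below evaluates that multiplier for a few splits; the function name `biserial_factor` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_factor(p0):
    """Multiplier sqrt(p0 * p1) / y that converts point-biserial to biserial.

    Assumes both coefficients are computed with the same (population) SD of Y.
    """
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p1 = 1 - p0
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p0))  # normal density height at the cut point
    return sqrt(p0 * p1) / y


for p0 in (0.5, 0.7, 0.9):
    print(p0, round(biserial_factor(p0), 4))
```

For an even split (p0 = 0.5) the multiplier is about 1.2533, and it grows as the split becomes more lopsided, which is why the biserial value is larger in magnitude than the point-biserial one.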

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack.

BCORREL(R1, R2) = the biserial correlation coefficient corresponding to the data in column ranges R1 and R2, where R1 is assumed to contain only 0’s and 1’s.

The biserial correlation coefficient for Example 1 can be calculated using the BCORREL function, as shown in cell G6 of Figure 1.

### 8 Responses to Biserial Correlation

1. anitha says:

how to calculate y?

• Charles says:

Anitha,
The calculation is shown on the referenced webpage
y = NORM.S.DIST(NORM.S.INV(p0),FALSE) where p0 is as described on the webpage.
Charles

2. Tony says:

Thanks for the great toolkit! It has saved me a lot of time!

I am getting some strange values from the BCORREL function. e.g. one of the biserial correlations has come out as 17.232, which I checked and is correct against the formula supplied above. However, shouldn’t the value for r be between 0 and 1?

This is the data input into the formula:

m1 900.000
m0 0.035
n1 2.000
n0 8501.000
n 8503.000
s 13.929
p1 0.000
p0 1.000
z 3.497
y 0.001
r 17.232

• Tony says:

Sorry, I noticed the precision has caused some inaccuracies in the numbers I supplied. Here they are to five places:

m1 900.00000
m0 0.03529
n1 2.00000
n0 8501.00000
n 8503.00000
s 13.92878
p1 0.00024
p0 0.99976
z 3.49706
y 0.00088
r 17.23214

• Charles says:

Tony,
Yes, I thought that r should be between -1 and 1, although I have never checked to see whether this is always true, especially in extreme situations.
You should check the values for m0, m1, s.
You have a very extreme situation since you only have two ones out of 8,503 data elements. According to the following source, you shouldn’t use the biserial correlation when p0 > .9.
http://changingminds.org/explanations/research/analysis/biserial.htm
Charles

• Tony says:

Thanks for the response, Charles.

Yes, it is an extreme dataset. I appreciate the source. I will investigate further.

Tony
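As a cross-check of the figures in this thread, the following Python sketch recomputes z, y and r from the summary values Tony supplied (n0, n1, m0, m1 and s, as quoted to five places). It assumes the formula r = (m1 − m0)·p0·p1/(s·y); the variable names are chosen only for illustration.

```python
from statistics import NormalDist

# Summary values quoted in the comment above
n0, n1 = 8501, 2
m0, m1 = 0.03529, 900.0
s = 13.92878

n = n0 + n1
p0, p1 = n0 / n, n1 / n

std_normal = NormalDist()
z = std_normal.inv_cdf(p0)  # NORM.S.INV(p0)
y = std_normal.pdf(z)       # NORM.S.DIST(z, FALSE)

r_b = (m1 - m0) * p0 * p1 / (s * y)
print(f"p0 = {p0:.5f}, z = {z:.5f}, y = {y:.5f}, r = {r_b:.3f}")
```

The result is about 17.23, in line with the value reported. With p0 this close to 1, y becomes very small, so the quotient is not confined to the interval from -1 to 1.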

3. Sylvia says:

Hello Charles,

First of all, thank you for sharing all the material on Statistics, it has been very useful to me.

My question is, is there a way to use point-biserial correlation for multiple independent and dependent variables in Excel? (Like a “Multivariate multiple point-biserial correlation”) I have been looking for information, but I have only found “Multiple point-biserial correlation” using SPSS.

Thank you!

• Charles says:

Sylvia,
Point-biserial correlation is just a special case of the usual Pearson’s correlation. You can calculate Pearson’s correlation (and therefore point-biserial correlation) when there are multiple independent variables using regression. You can also calculate this value by using the Real Statistics function RSquare(Rx,Ry) where Rx is a range that contains the data for the independent variables and Ry is a range that contains the data for the dependent variable.
Charles