Biserial Correlation

In Relationship between Correlation and t Test and Relationship between Correlation and Chi-square Test we introduced the point-biserial correlation coefficient, which is simply Pearson’s correlation coefficient when one of the samples is dichotomous.

The biserial correlation coefficient is also a correlation coefficient where one of the samples is measured as dichotomous, but where the underlying variable is actually continuous and normally distributed. In such cases, the point-biserial correlation generally underestimates the strength of the association. The biserial correlation coefficient provides a better estimate in this case.

Suppose we have two samples X = {x1, …, xn} and Y = {y1, …, yn}, where each xi is 0 or 1. Then the biserial correlation coefficient, denoted rb, is calculated as follows:

$$r_b = \frac{(m_1 - m_0)\, p_0\, p_1}{s\, y}$$

where n0 = the number of elements in X that are 0, n1 = the number of elements in X that are 1 (so n = n0 + n1), p0 = n0/n, p1 = n1/n, m0 = the mean of {yi: xi = 0}, m1 = the mean of {yi: xi = 1}, s is the standard deviation of Y, and y is the height of the standard normal density curve at the point z = NORM.S.INV(p0), i.e.

y = NORM.S.DIST(NORM.S.INV(p0),FALSE)
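The quantity y can also be computed outside Excel. The following is a minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `normal_ordinate` is a hypothetical name chosen for illustration. It mirrors the Excel formula step by step: first find z with cumulative probability p0, then evaluate the standard normal density at z.

```python
from statistics import NormalDist


def normal_ordinate(p0: float) -> float:
    """Height of the standard normal density at z = NORM.S.INV(p0).

    Hypothetical helper mirroring NORM.S.DIST(NORM.S.INV(p0), FALSE).
    """
    std_normal = NormalDist()        # mean 0, standard deviation 1
    z = std_normal.inv_cdf(p0)       # plays the role of NORM.S.INV(p0)
    return std_normal.pdf(z)         # plays the role of NORM.S.DIST(z, FALSE)


# An even split (p0 = 0.5) puts z at 0, where the density peaks at 1/sqrt(2*pi).
print(round(normal_ordinate(0.5), 5))  # → 0.39894
```

Because the normal curve is symmetric, using p1 in place of p0 gives the same value of y.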

Example 1: Calculate the biserial correlation coefficient for the data in columns A and B of Figure 1.


Figure 1 – Biserial Correlation Coefficient

The biserial correlation of -0.06821 (cell J15) is calculated as shown in column L. Note that this value is slightly more negative (i.e. larger in magnitude) than the point-biserial correlation (cell C4).
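Why is the biserial value larger in magnitude? Since the point-biserial correlation is Pearson’s r, it equals (m1 − m0)√(p0p1)/s when s is the population standard deviation of Y, so under that assumption rb = rpb · √(p0p1)/y. The following Python sketch (the function name `biserial_from_point_biserial` is hypothetical) applies this conversion and prints the scale factor √(p0p1)/y for a few splits. The factor is smallest for an even split, where it equals √(π/2) ≈ 1.2533, and grows as the split becomes more lopsided.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_point_biserial(r_pb: float, p0: float) -> float:
    """Convert a point-biserial r into a biserial r (hypothetical helper).

    Assumes rb = r_pb * sqrt(p0 * p1) / y, which holds when s in the
    biserial formula is the population standard deviation of Y.
    """
    p1 = 1.0 - p0
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p0))  # ordinate at NORM.S.INV(p0)
    return r_pb * sqrt(p0 * p1) / y


# With r_pb = 1 the result is just the scale factor sqrt(p0 * p1) / y.
for p0 in (0.5, 0.7, 0.9, 0.99):
    print(p0, round(biserial_from_point_biserial(1.0, p0), 4))
```

Since the factor is always greater than 1, the biserial coefficient is always larger in magnitude than the point-biserial coefficient and has the same sign.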

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack.

BCORREL(R1, R2) = the biserial correlation coefficient corresponding to the data in column ranges R1 and R2, where R1 is assumed to contain only 0’s and 1’s.

The biserial correlation coefficient for Example 1 can also be calculated using the BCORREL function, as shown in cell G6 of Figure 1.
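BCORREL runs inside Excel. As a cross-check outside Excel, here is a hypothetical Python function `bcorrel` (our own name, not part of the Real Statistics Resource Pack) that applies the formula for rb directly to raw data. It is a sketch under one stated assumption: s is taken to be the population standard deviation of Y (`statistics.pstdev`). If BCORREL uses the sample standard deviation instead, its results will differ slightly.

```python
from statistics import NormalDist, fmean, pstdev


def bcorrel(r1, r2):
    """Hypothetical Python analogue of BCORREL(R1, R2).

    r1 must hold only 0s and 1s; r2 holds the paired values of Y.
    ASSUMPTION: s is the population standard deviation of r2.
    """
    if len(r1) != len(r2):
        raise ValueError("r1 and r2 must have the same length")
    if any(x not in (0, 1) for x in r1):
        raise ValueError("r1 must contain only 0s and 1s")
    group0 = [y for x, y in zip(r1, r2) if x == 0]
    group1 = [y for x, y in zip(r1, r2) if x == 1]
    if not group0 or not group1:
        raise ValueError("r1 must contain at least one 0 and one 1")

    n = len(r1)
    p0, p1 = len(group0) / n, len(group1) / n
    m0, m1 = fmean(group0), fmean(group1)
    s = pstdev(r2)                                 # assumed population SD
    std_normal = NormalDist()
    y = std_normal.pdf(std_normal.inv_cdf(p0))     # ordinate at NORM.S.INV(p0)
    return (m1 - m0) * p0 * p1 / (s * y)


xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [1, 2, 3, 4, 5, 6, 7, 8]
print(bcorrel(xs, ys))
```

In this toy example the two groups are perfectly separated and Y is uniform rather than normal, so rb comes out at about 1.09. Unlike Pearson’s r, the biserial coefficient is not confined to the interval [−1, 1], which may help explain values such as the 17.232 reported in the comments.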

6 Responses to Biserial Correlation

  1. Tony says:

    Thanks for the great toolkit! It has saved me a lot of time!

    I am getting some strange values from the BCORREL function. e.g. one of the biserial correlations has come out as 17.232, which I checked and is correct against the formula supplied above. However, shouldn’t the value for r be between 0 and 1?

    This is the data input into the formula:

    m1 900.000
    m0 0.035
    n1 2.000
    n0 8501.000
    n 8503.000
    s 13.929
    p1 0.000
    p0 1.000
    z 3.497
    y 0.001
    r 17.232

    • Tony says:

      Sorry, I noticed the precision has caused some inaccuracies in the numbers I supplied. Here they are to five places:

      m1 900.00000
      m0 0.03529
      n1 2.00000
      n0 8501.00000
      n 8503.00000
      s 13.92878
      p1 0.00024
      p0 0.99976
      z 3.49706
      y 0.00088
      r 17.23214

  2. anitha says:

    how to calculate y?
