In Multiple Correlation we explored correlations among three random variables. We now extend some of these results to more than three variables.
Correlation Coefficient and Coefficient of Determination
Here we summarize some of the results from Multiple Regression Analysis about the correlation coefficient and coefficient of determination for any number of variables. In particular, the coefficient of determination can be calculated by

R2 = 1 - SSRes / SSTot

where SSRes is the residual sum of squares and SSTot is the total sum of squares for the regression.
The Real Statistics Resource Pack provides the array function RSquare(R1, R2) which computes this value where R1 is an n × k array containing the X sample data and R2 is an n × 1 array containing the Y sample data.
Thus the correlation coefficient can be calculated by the formula =SQRT(RSquare(R1, R2)).
Alternatively, the correlation coefficient and coefficient of determination can be calculated using either Excel’s Regression data analysis tool or the Real Statistics Linear Regression data analysis tool.
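Outside of Excel, the same two quantities can be computed directly from an ordinary least-squares fit. The following is a minimal sketch in Python with NumPy (not part of Real Statistics; the data here are randomly generated purely for illustration):

```python
import numpy as np

# Hypothetical sample data: n = 15 observations of two independent
# variables (columns of X) and one dependent variable y.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=15)

# Ordinary least-squares fit of y on X, including an intercept column.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta

# Coefficient of determination: R^2 = 1 - SS_Res / SS_Tot.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_square = 1 - ss_res / ss_tot

# Multiple correlation coefficient, the analog of =SQRT(RSquare(R1, R2)).
r = np.sqrt(r_square)
```

As a consistency check, the multiple correlation coefficient also equals the ordinary correlation between y and the fitted values.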
Observation: As mentioned in Multiple Regression Analysis, there is also a second form of the RSquare function in which RSquare(R1, j) = R2 where the X data consist of all the columns in R1 except the jth column and the Y data consist of the jth column of R1.
Thus if R1 is an n × k array consisting of all the X data followed by a final column of Y data, then the coefficient of determination is given by =RSquare(R1, k).
In particular, if R1, R2 and R3 are the three columns of the n × 3 data range R, with R1 and R2 containing the samples for the independent variables x1 and x2 and R3 containing the sample data for dependent variable y, then =MCORREL(R3, R1, R2) yields the same result as =SQRT(RSquare(R, 3)).
Observation: It turns out that R2 = RSquare(R1, j) can also be calculated by first finding the correlation matrix for R1, then taking its inverse and then selecting the jth element on the diagonal. If this has the value c then the desired value of R2 is 1 - 1/c.
Thus R2 can also be calculated by the formula:

=1-1/INDEX(MINVERSE(CORR(R1)),j,j)
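This identity is easy to verify numerically. Below is a short Python/NumPy sketch (not Real Statistics; the data are randomly generated, with column 0 playing the role of the jth column) comparing the regression R2 with 1 - 1/c:

```python
import numpy as np

# Hypothetical data: 20 observations of 4 variables; column 0 is
# treated as the dependent (jth) column.
rng = np.random.default_rng(1)
data = rng.normal(size=(20, 4))
data[:, 0] = data[:, 1] + 0.5 * data[:, 2] + rng.normal(size=20)

# R^2 from an ordinary least-squares fit of column 0 on the rest.
y, X = data[:, 0], data[:, 1:]
X1 = np.column_stack([np.ones(len(X)), X])
y_hat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
r2_ols = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# R^2 from the inverse correlation matrix: c is the jth diagonal
# element of the inverse, and R^2 = 1 - 1/c.
c = np.linalg.inv(np.corrcoef(data, rowvar=False))[0, 0]
r2_inv = 1 - 1 / c
```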
Partial Correlation Coefficient
Definition 1: Suppose that we have random variables x1, …, xk and for each xj we have a sample of size n. Now suppose that Z consists of all the random variables x1,…, xk excluding xi and xj where i ≠ j. Then the partial correlation coefficient between variables xi and xj is the correlation coefficient between xi and xj controlling for all the other variables (i.e. keeping all the variables in Z constant).
Observation: This is consistent with Definition 3 of Multiple Correlation where there are only three variables (i.e. k = 3). It turns out that the partial correlation coefficient can be calculated as described in the following property.
Property 1: Suppose that x1, …, xk and Z are as in Definition 1 and let X be the n × k matrix X = [xij]. The partial correlation coefficient between variables xi and xj where i ≠ j, controlling for all the other variables, is given by the formula

rij·Z = -pij / SQRT(pii · pjj)

where the inverse of the correlation matrix R of X is R-1 = [pij].
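Property 1 can be checked against Definition 1 directly: the matrix-inverse formula should agree with correlating the residuals of xi and xj after regressing each on the variables in Z. A Python/NumPy sketch (not part of Real Statistics; the data and function names are illustrative):

```python
import numpy as np

def partial_corr(data, i, j):
    """Partial correlation of columns i and j controlling for the rest,
    via the inverse correlation matrix (Property 1)."""
    P = np.linalg.inv(np.corrcoef(data, rowvar=False))
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

def partial_corr_residuals(data, i, j):
    """Same quantity via Definition 1: correlate the residuals of x_i
    and x_j after regressing each on the remaining variables Z."""
    rest = [c for c in range(data.shape[1]) if c not in (i, j)]
    Z = np.column_stack([np.ones(len(data)), data[:, rest]])
    res_i = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
    res_j = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
    return np.corrcoef(res_i, res_j)[0, 1]

# Randomly generated sample with 4 variables, matching the layout of Figure 1.
rng = np.random.default_rng(2)
data = rng.normal(size=(30, 4))
```

Both routes give the same number, which is what Property 1 asserts.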
Example 1: Calculate the partial correlation coefficient between Crime and Doctor controlling for Traffic Deaths and University based on the data in Figure 1 (which is a subset of the data for Example 1 of Multiple Correlation).
Figure 1 – US State Data
We now calculate the correlation matrix and its inverse for the data in Figure 1.
Figure 2 – Inverse of Correlation Matrix
The correlation matrix (range H4:K7) can be calculated as described in Multiple Regression Least Squares. Thus the inverse of the correlation matrix (range H11:K14) can be calculated via the formula =MINVERSE(CORR(B4:E18)).
Based on Property 1, the partial correlation coefficient between Crime and Doctor controlling for Traffic Deaths and University can be calculated by =-I11/SQRT(H11*I12), yielding a value of 0.0919913.
In general, the partial correlation coefficient between the ith and jth variables is given by the formula
=INDEX(-H11:K14/MMULT(M11:M14,TRANSPOSE(M11:M14)), i, j)
Or alternatively the following formula which is less resource intensive to calculate:
=INDEX(-H11:K14/(M11:M14*TRANSPOSE(M11:M14)), i, j)
Here the range M11:M14 represents the square roots of the elements on the diagonal of the inverse of the correlation matrix. This can be calculated using the array formula =SQRT(DIAG(H11:K14)), where DIAG is the Real Statistics array function that extracts the main diagonal of a square matrix.
Observation: As described above, we can calculate R2C,DTU, the coefficient of determination of Crime with Doctor, Traffic Deaths and University, using any of the following formulas to get the value .103292:

=RSquare(C4:E18, B4:B18)
=RSquare(B4:E18, 1)
=1-1/H11
Definition 2: Suppose that x1, …, xk and X are as described in Definition 1 and Property 1. Then the partial correlation matrix of X is the k × k matrix S = [sij] where for all i ≠ j

sij = -pij / SQRT(pii · pjj)

with R-1 = [pij] the inverse of the correlation matrix of X, as in Property 1.
Example 2: Calculate the partial correlation matrix for the data in Figure 1.
The result is shown in Figure 3.
Figure 3 – Partial Correlation Matrix
The partial correlation matrix in range H19:K22 is calculated using the array formula

=-H11:K14/MMULT(M11:M14,TRANSPOSE(M11:M14))
Observation: Note that Definition 2 does not define the values on the diagonal of the partial correlation matrix (i.e. where i = j). The formula above results in a matrix whose main diagonal consists of minus ones (since -pii/SQRT(pii·pii) = -1). If we prefer to have the main diagonal consist of all ones, we can modify the formula by adding 2 to each element of the main diagonal (i.e. by adding 2 times the k × k identity matrix to the result).
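The whole construction — invert the correlation matrix, divide by the outer product of the square roots of its diagonal, negate, and set the diagonal to ones — can be sketched in Python/NumPy as follows (illustrative, not part of Real Statistics; the data are randomly generated):

```python
import numpy as np

def partial_corr_matrix(data):
    """k x k partial correlation matrix with ones on the main diagonal,
    mirroring the modified Excel array formula."""
    P = np.linalg.inv(np.corrcoef(data, rowvar=False))
    d = np.sqrt(np.diag(P))       # square roots of the diagonal (as in M11:M14)
    S = -P / np.outer(d, d)       # s_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(S, 1.0)      # same effect as adding 2 on the diagonal
    return S

rng = np.random.default_rng(3)
S = partial_corr_matrix(rng.normal(size=(25, 4)))
```

The result is symmetric, has ones on the diagonal, and every entry lies in [-1, 1].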
Real Statistics Function: The Real Statistics Resource Pack provides the following functions where R1 is an n × k matrix representing the samples of size n for the random variables x1, …, xk.
PCORREL(R1, i, j) = the partial correlation coefficient of xi with xj, controlling for all the other variables, based on the data in R1
PCORR(R1) is an array function which outputs the partial correlation matrix for the data in R1
Observation: For the data in Figure 1, PCORREL(B4:E18, 1, 2) = 0.0919913 and PCORR(B4:E18) is the range H19:K22 of Figure 3 (except that the main diagonal consists only of ones).