When creating a multiple regression model, we would sometimes like to determine how much each independent variable contributes to the model. One way to do this is to decompose R-square using the Shapley-Owen Decomposition.

If *x*_{1},* x*_{2}, …, *x _{k}* represent the independent variable and y is the dependent variable, then the partial R-square for variable

*x*can be calculated by

_{j}where *V* = {*x*_{1},* x*_{2}, …, *x _{k}*} and |

*T*| = the number of elements in some subset

*T*of

*V*. Also

*R*

^{2}(

*T*) = the R-square value for the regression of the independent variables in

*T*on y. We assume that

*R*

^{2}(Ø) = 0.

To calculate the when *k* = 16, we need to calculate *R*^{2} for 2^{16} = 65,536 regression models. This number goes up to 1,048,576 if *k* = 20. Thus the approach is practical only when the number of independent variables doesn’t get too large.

**Example 1**: Find the Shapley-Owen decomposition for the linear regression for the data in range A3:D8 of Figure 1.

**Figure 1 – Shapley-Owen Decomposition – part 1**

We first calculate the *R*^{2} values of all subsets of {*x*_{1}, *x*_{2}, *x*_{3}} on y, using the Real Statistics RSquare function. These values are shown in range G4:G11. We now apply the formula shown above for calculating for* j* = 1, 2, 3, as displayed in Figure 2.

**Figure 2 – Shapley-Owen Decomposition – part 2**

E.g. to calculate , we first note that the subsets of *V* – {*x*_{1}} are ø, {*x*2}, {*x*_{3}}, {*x*2,*x*3}. as shown in range K4:K7). When *x*_{1} is included, these become {*x*1}, {*x*1,*x*2}, {*x*1,*x*3}, {*x*1,*x*2,*x*3}, as shown in range L4:L7. The R-square values corresponding to these subsets of independent variables are shown in ranges M4:M7 and N4:N7. The pairwise differences between these values are shown in the range O4:O7. The weights in range P4:P7 correspond to the denominator terms in the Shapley-Owen formula, namely *k* · *C*(*k*–1,|*T*|), where* k* = 3. Finally, the values in range Q4:Q7 are the pairwise products of the two terms to the left (e.g. Q4 contains the formula =O4*P4) and the value of is then displayed in cell Q8 using the formula =SUM(Q4:Q7).

We see that , , are .404979, .291195, .28995, which sum to .986125 (cell Q22), which, as we see from cell G11 of Figure 1, is the R-square value of the original regression, thus we have the desired decomposition. We see that variable* x*_{1} contributes the most to the R-square value (.404979/.986125 = 41.1%) and* x*_{3} contributes the least.

**Real Statistics Functions**: The Real Statistics Resource Pack contains the following array function. Here R1 is an *n** *×* **k* array containing the *X* sample data and R2 is an *n* × 1 array containing the *Y* sample data.

**SHAPLEY**(R1, R2): outputs an *k* × 1 column range containing the , , …, values

For Example 1, the output from the formula SHAPLEY(A4:C8,D4:D8) is shown in range G13:G15 of Figure 1.

Hi Charles,

When I use the SHAPLEY function I recive as output only one number in one cell but not ther full range (in your example, only G13, but not G14 and G15).

There is something that I do wrong?

Thanks

Ruben,

This is because SHAPLEY is an array function. To see how to get all the output, see the following webpage:

Array Formulas and Functions

Charles