Shapley-Owen Decomposition

When creating a multiple regression model, we would sometimes like to determine how much each independent variable contributes to the model. One way to do this is to decompose R-square using the Shapley-Owen Decomposition.

If x1, x2, …, xk represent the independent variable and y is the dependent variable, then the partial R-square for variable xj can be calculated by

Shapley-Owen

where V = {x1, x2, …, xk} and |T| = the number of elements in some subset T of V. Also R2(T) = the R-square value for the regression of the independent variables in T on y. We assume that R2(Ø) = 0.

To calculate the R_j^2 when k = 16, we need to calculate R2 for 216 = 65,536 regression models. This number goes up to 1,048,576 if k = 20. Thus the approach is practical only when the number of independent variables doesn’t get too large.

Example 1: Find the Shapley-Owen decomposition for the linear regression for the data in range A3:D8 of Figure 1.

Shapley-Owen decomposition

Figure 1 – Shapley-Owen Decomposition – part 1

We first calculate the R2 values of all subsets of {x1, x2, x3} on y, using the Real Statistics RSquare function. These values are shown in range G4:G11. We now apply the formula shown above for calculating R_j^2 for j = 1, 2, 3, as displayed in Figure 2.

Shapley-Owen decomposition 2

Figure 2 – Shapley-Owen Decomposition – part 2

E.g. to calculate R_1^2, we first note that the subsets of V – {x1} are ø, {x2}, {x3}, {x2,x3}. as shown in range K4:K7). When x1 is included, these become {x1}, {x1,x2}, {x1,x3}, {x1,x2,x3}, as shown in range L4:L7. The R-square values corresponding to these subsets of independent variables are shown in ranges M4:M7 and N4:N7. The pairwise differences between these values are shown in the range O4:O7. The weights in range P4:P7 correspond to the denominator terms in the Shapley-Owen formula, namely k · C(k–1,|T|), where k = 3. Finally, the values in range Q4:Q7 are the pairwise products of the two terms to the left (e.g. Q4 contains the formula =O4*P4) and the value of R_1^2 is then displayed in cell Q8 using the formula =SUM(Q4:Q7).

We see that R_1^2, R_2^2, R_3^2 are .404979, .291195, .28995, which sum to .986125 (cell Q22), which, as we see from cell G11 of Figure 1, is the R-square value of the original regression, thus we have the desired decomposition. We see that variable x1 contributes the most to the R-square value (.404979/.986125 = 41.1%) and x3 contributes the least.

Real Statistics Functions: The Real Statistics Resource Pack contains the following array function. Here R1 is an n × k  array containing the X sample data and R2 is an n × 1 array containing the Y sample data.

SHAPLEY(R1, R2): outputs an k × 1 column range containing the R_1^2, R_2^2, …, R_3^k values

For Example 1, the output from the formula SHAPLEY(A4:C8,D4:D8) is shown in range G13:G15 of Figure 1.

Leave a Reply

Your email address will not be published. Required fields are marked *