Two-way contingency tables
We now define an algorithm which lets us compare two-way tables which have different row and/or column totals. We explain the algorithm using the following example.
Example 1: Revise the table in range A5:D7 of the following figure so that the sum of each of the rows is the indicated row total and the sum of each of the columns is the indicated column total.
Figure 1 – Start of iterative proportional fitting procedure
In Figure 1 the range A5:D7 contains the initial table values. The range E5:E7 contains the target row totals and the range A8:D8 contains the target column totals.
In order for the procedure to work the sum of the target row totals must equal the sum of the target column totals. This is indeed the case for Example 1. The grand total (cell E8) contains the value 60, which is the sum of the target row totals as well as the target column totals; i.e. SUM(E5:E7) = SUM(A8:D8) = 60. We also assume that all table entries as well as target row/column totals are positive.
We begin the procedure (step 0) by putting the initial table values in the range G5:J7, but this time we include the real row and column totals (this is called the Seed in Figure 1).
Step 1 of the procedure consists of two parts. In the first part the values in the table are adjusted proportionally so that the row totals equal the targeted values (Row Adjustment). E.g. the formula in cell A12 is =G5*E5/K5.
In the second part the values in the table are adjusted proportionally so that the column totals equal the targeted values (Column Adjustment). E.g. the formula in cell G12 is =A12*A8/A15.
This same procedure is now repeated again and again until the sum and column totals are sufficiently close to the target values. For Example 1 this is achieved in 8 steps, as can be seen in Figure 2. In fact after 8 steps the row and column totals are exactly equal to the target values and so any further iterations will produce exactly the same table values.
Figure 2 – Iterative proportional fitting procedure for Example 1
The final result of the procedure is found in the lower right-hand corner of Figure 2.
Example 2: Use the iterative proportional fitting procedure to find the expected values for the data in Example 1 of Independence Testing assuming that a person’s level of schooling is independent of their parents’ wealth.
It turns out that if the row and column totals of the observed data are used as the targets and 1’s are used as the seed, the IPF procedure converges in 1 step to the expected values. This is depicted in the following figure.
Figure 3 – Iterative proportional fitting procedure for Example 2
These results agree with those found in Figure 1 of Independence Testing.
Three-way contingency tables
We now turn our attention to three-way tables. The procedure is similar. The main challenge is how to represent a three dimensional table in two dimensional space. We will use the representation shown in Figure 4.
Figure 4 – Three-way contingency table
Figure 4 contains a 3 × 2 × 4 table; i.e. 3 rows, 2 columns and 4 layers (we use the term layer for the third dimension). The area in blue (range C5:F10) contain the data for the table.
The areas in grey contain the marginal totals for any two dimensions: G5:G10 contain the totals for Rows × Columns, C13:F15 contain the totals for Rows × Layers and C17:F18 contain the totals for Columns x Layers. Thus, for example, cell E14 contains the total for the 2nd row and 3rd layer, and so has the value 10. It can be calculated in Excel using the formula SUMIF(A5:A10,A14,E5:E10).
The areas in pink contain the marginal totals for any single dimension: C11:F11 contains the totals for the layers, G13:G15 contains the totals for the rows and G17:G18 contains the totals for the columns. E.g. F11 contains the totals for the 4th layer and can be calculated via the formula SUM(F5:F10). Similarly, G14 contains the totals for the 2nd row and can be calculated via the formula SUM(C14:F14).
Finally cell G11 (shaded in green) contains the grand total, i.e. the sum of all the values in the table. This can be specified in Excel using any one of the following formulas: SUM(C5:F10), SUM(G5:G10) or SUM(C11:F11).
With two-way tables the only restriction on the target row and column totals is that the sum of the target row totals must equal the sum of the target column totals. The situation is more complicated with three-way totals. For these reason we will only consider targets that already come from a three-way table. This is not a real limitation since this is the only case we care about anyway. Once again we will demonstrate the key concepts by means of an example.
Example 3: Revise the 3-way table in Figure 4 (repeated on the left side of Figure 5) so that its row, column and layer totals are as described in the 3-way table on the right side of Figure 5.
Figure 5 – Specification of target marginal totals
Essentially we are attempting to insert the data from the table on the left into the table on the right while retaining all the totals for the table on the right. We now show the IPF procedure for accomplishing this (see Figure 6).
Figure 6 – Steps 0 and 1 of IPF procedure for Example 3
Step 1 of the procedure consists of three parts. In the first part the values in the table are adjusted proportionally so that the row × column totals meet the targeted values (Row/Column Adjustment). E.g. the formula in cell F25 is =V7*G7/W7.
In the second part the values in the table are adjusted proportionally so that the row × layer totals meet the targeted values (Row/Layer Adjustment). E.g. the formula in cell L28 is =D28*D16/D34.
In the third part the values in the table are adjusted proportionally so that the column × layer totals meet the targeted values (Column/Layer Adjustment). E.g. the formula in cell U29 is =M29*E19/M37.
This same procedure is now repeated again and again until all the marginal totals are sufficiently close to the target values. After 19 iterations we arrive at the solution shown in Figure 7.
Figure 7 – Solution to Example 3
Real Statistics Functions
Real Statistics Functions: The Real Statistics Resource Pack supplies the following array functions for implementing the IPF procedures.
IPFP2(R1): outputs the results of the IPFP algorithm for two-way contingency tables. R1 contains the input data and the target row and column totals, formatted as in range A5:E8 of Figure 1. If R1 is an m+1 × n+1 range then the output is an m × n range.
IPFP3(R1, R2): outputs the results of the IPFP algorithm for three-way contingency tables. R1 contains the input data, formatted as in range A5:F10 of Figure 5. R2 contains the targets, formatted as in range K5:N10 of that figure. The format of the output is the same as that of range R1.
Observation: For Example 1, the output from the formula =IPFP2(B5:F8) is shown in range H5:K7 of Figure 8.
Figure 8 – IPFP2 function
For Example 3, the output from the formula =IPFP3(A5:F10, K5:N10) is shown in range R5:W10 of Figure 9. None of the totals are included in the input or output ranges.
Figure 9 – IPFP3 function