Best fit model for three-way contingency tables

Example 1: Now let us return to the problem we posed in Example 1 of Three-way Contingency Tables, namely to find the most parsimonious model that fits the data. Figure 1 summarizes the key results for all 18 models.

Figure 1 – Summary of all log-linear regression models

From Figure 1, we see that (CGT), (CG, GT, CT), (CG, GT), (CG, CT) and (T, CG) are the only models that fit the data, i.e. provide expected cell frequencies that are not significantly different from the observed cell frequencies. We now describe how to determine which of these models is the most parsimonious.

Observation: Suppose model M2 is nested in model M1, i.e. M2 is obtained from M1 by dropping terms. If M1 has chi-square statistic (maximum likelihood) χ₁² with df₁ degrees of freedom and M2 has chi-square statistic (maximum likelihood) χ₂² with df₂ degrees of freedom, then χ₂² – χ₁² ~ χ²(df₂ – df₁).

Suppose M1 and M2 both fit the observed data and there is no significant difference between χ₁² and χ₂². Then, if M2 is a simpler model than M1, we can just as well use M2 instead of M1 to model our data.
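The comparison described in this observation can be sketched in Python. This is a minimal illustration rather than part of the Excel workflow: the function names `chi2_sf_even_df` and `compare_nested` are hypothetical, and the p-value uses the closed-form upper tail of the chi-square distribution, which is valid only when the difference in degrees of freedom is a positive even number (as it is for the comparisons in this example).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term            # add (x/2)^i / i!
        term *= half / (i + 1)   # advance to the next term of the series
    return math.exp(-half) * total

def compare_nested(chi2_complex, df_complex, chi2_simple, df_simple):
    """Return (difference, df difference, p-value) for two nested models."""
    diff = chi2_simple - chi2_complex
    ddf = df_simple - df_complex
    return diff, ddf, chi2_sf_even_df(diff, ddf)

# Saturated model (statistic 0 on 0 df) versus a simpler model with
# statistic 1.11 on 2 df: the p-value is about 0.574.
print(compare_nested(0.0, 0, 1.11, 2))
```

A p-value above the chosen significance level (.05 here) means the simpler model is not significantly worse than the more complex one.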

Example (continued): Clearly the saturated model (CGT) is a perfect fit for our data, but, using the above observation, we would like to find a simpler model (one with fewer terms) that is not significantly different from the saturated model.

Of the five models that fit the data, the next most complex model is (CG, GT, CT). The difference between the chi-square statistics for the two models is 1.11 – 0 = 1.11. Since the p-value for 1.11 with 2 – 0 = 2 degrees of freedom is .57 > .05, we conclude there is no significant difference between the two models. This indicates that the three-way interaction among all the variables does not make a significant contribution.

We next compare (CG, GT, CT) with (CG, GT). The difference in the statistics is 7.86 – 1.11 = 6.75, which yields a p-value of .034 on 4 – 2 = 2 degrees of freedom. This is a significant difference, and so we don’t consider (CG, GT) further.

We next compare (CG, GT, CT) with (CG, CT). The difference in the statistics is approximately 4.49, which yields a p-value of .106 on 4 – 2 = 2 degrees of freedom. This is not a significant difference, and so we now adopt the simpler model (CG, CT). This indicates that the interaction between G (gender) and T (therapy) does not make a significant contribution.

There is only one other qualifying model to consider, namely (T, CG), which we compare with (CG, CT). The difference between its statistic of 12.04 and that of (CG, CT) is approximately 6.45, which yields a p-value of .0398 on 6 – 4 = 2 degrees of freedom. This is a significant difference, and so we reject the (T, CG) model. This indicates that the interaction between C (cure) and T (therapy) does make a significant contribution.

We conclude that (CG, CT) is the simplest (i.e. most parsimonious) model that fits the data.

This indicates that there is an interaction between cure and gender as well as between cure and therapy. This can be seen by looking at the odds ratios of the observed data (see Figure 2).

Figure 2 – Odds ratios

That there is an interaction between Cure and Therapy can be seen from the fact that the odds of a cure for therapy 1 are 91/25 = 3.64, while those for therapy 2 are 79/45 = 1.76. The odds ratio is 2.07, i.e. therapy 1 seems to be about twice as effective as therapy 2.

That there is an interaction between Cure and Gender can be seen from the fact that the odds of a cure for males are 221/38 = 5.82, while those for females are 136/105 = 1.30. The odds ratio is 4.49, i.e. the therapies seem to be much more effective for men than for women.
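The odds and odds ratios quoted above can be checked with a few lines of Python. This is only an arithmetic sketch; the helper names `odds` and `odds_ratio` are hypothetical, and the counts are the cured and not-cured frequencies quoted in the text.

```python
def odds(cured, not_cured):
    """Odds of a cure: number cured divided by number not cured."""
    return cured / not_cured

def odds_ratio(cured_a, not_cured_a, cured_b, not_cured_b):
    """Ratio of the odds for group A to the odds for group B."""
    return odds(cured_a, not_cured_a) / odds(cured_b, not_cured_b)

# Therapy 1 versus therapy 2
print(round(odds(91, 25), 2), round(odds(79, 45), 2))
print(round(odds_ratio(91, 25, 79, 45), 2))

# Males versus females
print(round(odds(221, 38), 2), round(odds(136, 105), 2))
print(round(odds_ratio(221, 38, 136, 105), 2))
```

An odds ratio greater than 1 means the first group has the higher odds of a cure.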

Finally, we calculate the coefficients of the log-linear model for (CG, CT).

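ln(e) = b0 + bC·tC + bG·tG + bT1·tT1 + bT2·tT2 + bCG·tC·tG + bCT1·tC·tT1 + bCT2·tC·tT2

where e is the expected cell frequency,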

using the following coding of the categorical variables:

tC = 1 if cured and = 0 otherwise
tG = 1 if male and = 0 otherwise
tT1 = 1 if therapy 1 and = 0 otherwise
tT2 = 1 if therapy 2 and = 0 otherwise
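To make the coding concrete, here is a minimal Python sketch that builds one row of predictors per cell of the 2 × 2 × 3 table. It assumes that the interaction columns for (CG, CT) are the products tC·tG, tC·tT1 and tC·tT2, and that therapy 3 is the reference category (both tT1 and tT2 equal 0). The function name `design_row` and the order of the cells are arbitrary choices made for illustration.

```python
from itertools import product

def design_row(cured, male, therapy):
    """Return [1, tC, tG, tT1, tT2, tC*tG, tC*tT1, tC*tT2] for one cell."""
    tC = 1 if cured else 0
    tG = 1 if male else 0
    tT1 = 1 if therapy == 1 else 0
    tT2 = 1 if therapy == 2 else 0
    # The leading 1 is the constant term; the last three entries are the
    # assumed CG and CT interaction columns.
    return [1, tC, tG, tT1, tT2, tC * tG, tC * tT1, tC * tT2]

# One row per cell: cure status x gender x therapy (12 cells in all).
cells = list(product([True, False], [True, False], [1, 2, 3]))
design = [design_row(c, g, t) for c, g, t in cells]

for cell, row in zip(cells, design):
    print(cell, row)
```

Leaving out the constant column gives seven predictor columns for the twelve cells.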

It is relatively easy to calculate the coefficients for this model, as described in Figure 3.

Figure 3 – Coefficients for (CG, CT) model

Alternatively, we can use Excel’s regression data analysis tool with L5:L17 as the Y range and D5:J17 as the X range. The coefficients it outputs are the same as those given in Figure 3. The rest of the output from the data analysis tool should be ignored.
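One reason the coefficients are easy to obtain is that (CG, CT) is a conditional independence model (G and T are independent given C), so its expected frequencies have the closed form n(c,g,+) · n(c,+,t) / n(c,+,+). The Python sketch below illustrates this and then reads the dummy-coded coefficients off the logs of the expected frequencies. It is not a reproduction of the worksheet in Figure 3: the counts in `observed` are hypothetical, not the data of Example 1, and the function names and the choice of baseline cell (not cured, female, therapy 3) are assumptions that follow the coding above.

```python
import math

# HYPOTHETICAL counts keyed by (cured, male, therapy); illustrative only,
# NOT the data of Example 1. cured/male are 1 or 0, therapy is 1, 2 or 3.
observed = {
    (1, 1, 1): 40, (1, 1, 2): 30, (1, 1, 3): 50,
    (1, 0, 1): 20, (1, 0, 2): 25, (1, 0, 3): 35,
    (0, 1, 1): 10, (0, 1, 2): 15, (0, 1, 3): 5,
    (0, 0, 1): 15, (0, 0, 2): 30, (0, 0, 3): 25,
}

def expected_cg_ct(obs):
    """Expected counts under (CG, CT): n(c,g,+) * n(c,+,t) / n(c,+,+)."""
    cg, ct, c_total = {}, {}, {}
    for (c, g, t), n in obs.items():
        cg[(c, g)] = cg.get((c, g), 0) + n
        ct[(c, t)] = ct.get((c, t), 0) + n
        c_total[c] = c_total.get(c, 0) + n
    return {(c, g, t): cg[(c, g)] * ct[(c, t)] / c_total[c]
            for (c, g, t) in obs}

def coefficients(expected):
    """Dummy-coded coefficients taken from logs of the expected counts."""
    ln = {cell: math.log(e) for cell, e in expected.items()}
    b = {"const": ln[(0, 0, 3)]}                  # baseline cell
    b["C"] = ln[(1, 0, 3)] - b["const"]
    b["G"] = ln[(0, 1, 3)] - b["const"]
    b["T1"] = ln[(0, 0, 1)] - b["const"]
    b["T2"] = ln[(0, 0, 2)] - b["const"]
    b["CG"] = ln[(1, 1, 3)] - b["const"] - b["C"] - b["G"]
    b["CT1"] = ln[(1, 0, 1)] - b["const"] - b["C"] - b["T1"]
    b["CT2"] = ln[(1, 0, 2)] - b["const"] - b["C"] - b["T2"]
    return b

def predicted_log(b, c, g, t):
    """Log of the expected count implied by the coefficients for one cell."""
    t1, t2 = int(t == 1), int(t == 2)
    return (b["const"] + b["C"] * c + b["G"] * g + b["T1"] * t1 + b["T2"] * t2
            + b["CG"] * c * g + b["CT1"] * c * t1 + b["CT2"] * c * t2)

expected = expected_cg_ct(observed)
coef = coefficients(expected)
for name, value in coef.items():
    print(name, round(value, 4))
```

Because the logs of these expected frequencies satisfy the model exactly, the eight coefficients reproduce all twelve expected cell frequencies, which is also why an ordinary regression of the logged expected frequencies on the coded variables returns the same coefficients.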

Thus the log-linear model is

[Equation image: the (CG, CT) log-linear model with the coefficient values from Figure 3]
