Example 1: Repeat the study from Example 3 of Finding Logistic Regression Coefficients using Newton’s Method based on the summary data shown in Figure 1.
Figure 1 – Data for Example 1
Using the Logistic Regression supplemental data analysis tool, selecting the Newton Method option, we obtain the output displayed in Figure 2.
Figure 2 – Base model for Example 1
We know from the above analysis that the presence of Temp and Water makes a significant difference (over the initial model where only the intercept is used), but do we need both of these independent variables? The value 1 = exp(0), which corresponds to a coefficient of zero, doesn’t lie in the 95% confidence interval for Temp, but it does lie in the 95% confidence interval for Water. We conclude that Temp makes a significant contribution to the model, but Water doesn’t. Since this analysis relies on the Wald statistic, which is not completely reliable, we would prefer to use an approach similar to that used in Testing Fit of the Logistic Regression Model.
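The confidence interval check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the coefficients and standard errors are made-up values, not the ones shown in Figure 2.

```python
import math

def odds_ratio_ci(coef, se, z=1.959964):
    """Wald confidence interval for exp(coef); z = 1.959964 gives a 95% interval."""
    return math.exp(coef - z * se), math.exp(coef + z * se)

def contains_one(ci):
    """True if the interval includes 1 = exp(0), i.e. a zero coefficient is plausible."""
    lower, upper = ci
    return lower <= 1.0 <= upper

# Hypothetical coefficient estimates and standard errors (NOT taken from Figure 2)
temp_ci = odds_ratio_ci(coef=-0.20, se=0.05)
water_ci = odds_ratio_ci(coef=0.30, se=0.40)

print("Temp  CI:", temp_ci, "contains 1:", contains_one(temp_ci))
print("Water CI:", water_ci, "contains 1:", contains_one(water_ci))
```

With these made-up inputs the first interval excludes 1 and the second includes it, which mirrors the pattern described for Temp and Water.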
Example 2: Do the Temp and Water variables make a significant difference in the model of Example 1?
We first create summary tables for the Temp-only and Water-only models and then use the Logistic Regression data analysis tool (with the Newton’s Method option) to build the two models. Also see below for a simpler approach for creating the Temp-only summary table.
The summary table for the Temp model is shown in range B28:D34 of Figure 3. The values in columns C and D can be calculated from the summary table of the base model (as shown in Figure 2) using SUMIF. For example, the number of samples where Temp = 20 and the reptile was born Male (cell C29) is given by the formula
By filling right (Ctrl-R) and down (Ctrl-D), you can copy this formula into the other cells in the range C29:D34. You now use the Logistic Regression tool to obtain the output shown in Figure 3.
Figure 3 – Output for Temp-only model
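The SUMIF aggregation used to build the Temp-only summary table can be mimicked outside Excel. The following is a minimal sketch, assuming the base-model summary data is held in parallel lists; the `sumif` helper is a hypothetical stand-in for the worksheet function and the counts are invented for illustration, not the data of Figure 1.

```python
def sumif(criteria_range, criterion, sum_range):
    """Sum the entries of sum_range whose matching criteria_range entry equals criterion."""
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criterion)

# Hypothetical base-model summary data: one row per (Temp, Water) combination
temp   = [20, 20, 24, 24]
water  = [0, 1, 0, 1]
male   = [5, 7, 4, 6]
female = [3, 2, 6, 5]

# Collapse over Water: one row per distinct Temp value with summed counts
temp_only = [(t, sumif(temp, t, male), sumif(temp, t, female))
             for t in sorted(set(temp))]

for row in temp_only:
    print(row)
```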
We observe that the Temp variable makes a significant contribution (cell U35) over the constant-only model. Here we are comparing LL1 (Temp model) with LL0 (constant-only model).
We can also compare the Temp model with the base model (Temp + Water). To do this, we copy the range T28:U35 to another location in the worksheet, replace LL1 with the LL1 value from the base model and replace LL0 with the LL1 value from the Temp model. We also need to change df to 1, since the difference between the df of the two models is 2 – 1 = 1. This is shown in Figure 4.
Figure 4 – Comparing the Temp and base models
We see that there is no significant difference between the models (cell X44). This confirms our earlier conclusion that the Water variable does not make a significant contribution, and so it can be dropped.
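The comparison of two nested models amounts to a likelihood-ratio test: the statistic is twice the difference in log-likelihoods, referred to a chi-square distribution whose df is the difference in the number of coefficients. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, the log-likelihood values are invented rather than taken from Figures 2 to 4, and the chi-square tail probability is computed from a standard recurrence instead of calling a statistics package.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd) and df = 2 (even)
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lr_test(ll_reduced, ll_full, df):
    """Return the likelihood-ratio statistic 2 * (LL_full - LL_reduced) and its p-value."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods: the reduced model plays the role of LL0, the full model of LL1
stat, p_value = lr_test(ll_reduced=-120.5, ll_full=-119.8, df=1)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```

A p-value above the chosen significance level (e.g. 0.05) would indicate that the extra variable in the larger model does not significantly improve the fit.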
We create the Water-only model in a similar way to obtain the output shown in Figure 5.
Figure 5 – Output for Water-only model
This time we see that there is no significant difference between the Water model and the constant-only model. If we repeat the analysis of Figure 4 for the Water model, we see that there is a significant difference between the Water model and the base model.
Finally, we can look at further refinements of the model, such as the full interaction model, where we include the interaction between Temp and Water. We show this analysis in Figure 6.
Figure 6 – Logistic regression – Interaction model
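As a rough illustration of what including the interaction means, the sketch below adds a product column to a summary table, assuming the interaction between two variables is represented in the usual way as the product of their values. The `add_interaction` helper and the data rows are hypothetical and are not taken from Figure 6.

```python
def add_interaction(rows, a, b):
    """Return copies of the rows with an extra column holding the product of columns a and b."""
    name = f"{a}*{b}"
    return [{**row, name: row[a] * row[b]} for row in rows]

# Hypothetical summary rows (invented counts)
rows = [
    {"Temp": 20, "Water": 0, "Male": 5, "Female": 3},
    {"Temp": 20, "Water": 1, "Male": 7, "Female": 2},
    {"Temp": 24, "Water": 1, "Male": 6, "Female": 5},
]

for row in add_interaction(rows, "Temp", "Water"):
    print(row)
```

The interaction model then has three coefficients besides the intercept (Temp, Water and Temp*Water), one more than the base model.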
If we compare this model with the base model using the approach described above (as in Figure 4), we get the output shown in Figure 7.
Figure 7 – Comparing the interaction and base models
This shows that there is a significant difference between the full interaction model and the base model, with the interaction model providing a better fit.
Observation: As mentioned above, there is a simpler way to create the Temp-only and Water-only summary data tables. To create the Temp-only table, press Ctrl-m, select the Logistic Regression data analysis tool and then enter the following information into the dialog box that appears.
Figure 8 – Creating reduced models
Here we have entered the Water independent variable into the List of variables to exclude field. This produces the output in Figure 3.
Observation: The List of variables to exclude field can be used to create a reduced model whenever the Input Format is set to Summary data and the Headings included with data field is checked. The variables to exclude are entered into this field separated by commas.
E.g. if we have a summary data table with Nationality, Age, Education, Gender and Occupation as independent variables and want to create a reduced model with only Nationality, Education and Occupation, we would simply enter Age, Gender into the List of variables to exclude field.
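To make the effect of excluding variables concrete, here is a minimal sketch of how a summary table could be collapsed once a comma-separated list of variables is dropped. It is an illustration only and not the Real Statistics implementation: the `reduce_summary` function, the `Success`/`Failure` column names and the data rows are all assumptions.

```python
from collections import defaultdict

def reduce_summary(rows, predictors, counts, exclude):
    """Drop the predictors named in `exclude` (comma-separated) and sum the count columns."""
    drop = {name.strip() for name in exclude.split(",") if name.strip()}
    unknown = drop - set(predictors)
    if unknown:
        raise ValueError(f"unknown variables: {sorted(unknown)}")
    keep = [p for p in predictors if p not in drop]
    totals = defaultdict(lambda: [0] * len(counts))
    for row in rows:
        key = tuple(row[p] for p in keep)   # rows that agree on the kept variables merge
        for i, c in enumerate(counts):
            totals[key][i] += row[c]
    return keep, {key: tuple(vals) for key, vals in totals.items()}

# Hypothetical summary rows (invented values)
rows = [
    {"Nationality": "A", "Age": 30, "Education": "HS", "Gender": "F",
     "Occupation": "X", "Success": 2, "Failure": 1},
    {"Nationality": "A", "Age": 40, "Education": "HS", "Gender": "M",
     "Occupation": "X", "Success": 3, "Failure": 4},
    {"Nationality": "B", "Age": 30, "Education": "BA", "Gender": "F",
     "Occupation": "Y", "Success": 5, "Failure": 0},
]
predictors = ["Nationality", "Age", "Education", "Gender", "Occupation"]

keep, table = reduce_summary(rows, predictors, ["Success", "Failure"], "Age, Gender")
print(keep)
for key, vals in table.items():
    print(key, vals)
```

Rows that differ only in the excluded variables are merged, so the reduced table can have fewer rows than the original one.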