Unfortunately, for large values of the coefficient b, the standard error becomes inflated, which deflates the Wald statistic and increases the probability that b is judged not to make a significant contribution to the model even when it does (i.e. a type II error).
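To overcome this problem it is better to test on the basis of the log-likelihood statistic since

$$\chi^2 = 2(LL_1 - LL_0) \sim \chi^2(df)$$

where df = k – k0 (k and k0 being the number of coefficients in the full and reduced models respectively), LL1 is the log-likelihood of the full model and LL0 is the log-likelihood of a model with fewer coefficients (especially the model with only the intercept b0 and no other coefficients). This is equivalent to

$$\chi^2 = -2 \ln \frac{L_0}{L_1}$$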
Observation: For ordinary regression the coefficient of determination is

$$R^2 = \frac{SS_{Reg}}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

Thus R² measures the percentage of variance explained by the regression model. We need a similar statistic for logistic regression. We define the following three pseudo-R² statistics for logistic regression.
Definition 1: The log-likelihood ratio R² (aka McFadden’s R²) is defined as follows:

$$R_L^2 = 1 - \frac{LL_1}{LL_0}$$

where LL1 is the log-likelihood of the full model and LL0 is the log-likelihood of a model with fewer coefficients (especially the model with only the intercept b0 and no other coefficients).
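Cox and Snell’s R² is defined as

$$R_{CS}^2 = 1 - \left(\frac{L_0}{L_1}\right)^{2/n} = 1 - \exp\left(-\frac{2(LL_1 - LL_0)}{n}\right)$$

where n = the sample size.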
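Nagelkerke’s R² is defined as

$$R_N^2 = \frac{R_{CS}^2}{1 - L_0^{2/n}} = \frac{R_{CS}^2}{1 - \exp(2\,LL_0/n)}$$

Observation: Since Cox and Snell’s R² cannot achieve a value of 1 (its maximum is 1 – L0^(2/n)), Nagelkerke’s R² rescales it by that maximum so that it has properties more similar to the R² statistic used in ordinary regression.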
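The three definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the Real Statistics implementation: the function name `pseudo_r2` and the input values in the usage lines are hypothetical, and the formulas assume the standard log-likelihood forms of the McFadden, Cox and Snell, and Nagelkerke statistics.

```python
import math

def pseudo_r2(ll0, ll1, n):
    """Return (McFadden, Cox-Snell, Nagelkerke) pseudo-R2 values.

    ll0 -- log-likelihood of the reduced (e.g. intercept-only) model
    ll1 -- log-likelihood of the full model
    n   -- sample size
    """
    mcfadden = 1.0 - ll1 / ll0
    # 1 - (L0/L1)^(2/n), written in terms of log-likelihoods
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll1) / n)
    # Rescale by the largest value Cox-Snell can reach: 1 - L0^(2/n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll0 / n))
    return mcfadden, cox_snell, nagelkerke

# Hypothetical inputs, chosen only to illustrate the call
mcf, cs, nag = pseudo_r2(ll0=-100.0, ll1=-80.0, n=200)
print(round(mcf, 4), round(cs, 4), round(nag, 4))
```

Working from log-likelihoods rather than raw likelihoods avoids numerical underflow, since L0 and L1 are typically far too small to represent directly for samples of realistic size.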
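Observation: The initial value L0 of the likelihood L, i.e. for the model that includes only the intercept b0, is given by

$$L_0 = \left(\frac{n_0}{n}\right)^{n_0} \left(\frac{n_1}{n}\right)^{n_1}$$

or equivalently, in log-likelihood form,

$$LL_0 = n_0 \ln\frac{n_0}{n} + n_1 \ln\frac{n_1}{n}$$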
where n0 = number of observations with value 0, n1 = number of observations with value 1 and n = n0 + n1.
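As a rough sketch of this calculation, the log-likelihood of the intercept-only model can be computed directly from the two counts, on the assumption that this model predicts the same probability n1/n for every observation. The function name `null_log_likelihood` and the counts in the usage line are hypothetical.

```python
import math

def null_log_likelihood(n0, n1):
    """Log-likelihood LL0 of the intercept-only logistic model.

    n0 -- number of observations with outcome 0
    n1 -- number of observations with outcome 1
    """
    n = n0 + n1
    if n == 0:
        raise ValueError("at least one observation is required")
    ll0 = 0.0
    for count in (n0, n1):
        # A category with zero observations contributes nothing
        if count > 0:
            ll0 += count * math.log(count / n)
    return ll0

# Hypothetical counts: 50 zeros and 50 ones
print(null_log_likelihood(50, 50))
```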
As described above, the likelihood-ratio test statistic equals:

$$\chi^2 = -2 \ln \frac{L_0}{L_1} = 2(LL_1 - LL_0)$$

where L1 is the maximized value of the likelihood function for the full model, while L0 is the maximized value of the likelihood function for the reduced model. The test statistic has a chi-square distribution with df = k1 – k0, i.e. the number of parameters in the full model minus the number of parameters in the reduced model.
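The test just described can be sketched in Python using only the standard library. This is an illustrative sketch, not the method used by the Real Statistics tools: the helper `chi2_sf` (a right-tail chi-square probability for integer df, built from the recurrence for the regularized incomplete gamma function) and the function `likelihood_ratio_test` are hypothetical names, as are the inputs in the usage lines.

```python
import math

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for chi-square with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(ll_full, ll_reduced, k_full, k_reduced):
    """Return (chi-square statistic, df, p-value) for nested models."""
    stat = 2.0 * (ll_full - ll_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods and parameter counts
stat, df, p = likelihood_ratio_test(-80.0, -100.0, k_full=3, k_reduced=1)
print(stat, df, p)
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` would be a more general replacement for the hand-written tail function, since it also handles non-integer df.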
Example 1: Determine whether there is a significant difference in survival rate between the different values of rem in Example 1 of Basic Concepts of Logistic Regression. Also calculate the various pseudo-R² statistics.
We are essentially comparing the logistic regression model with coefficient b to the model without coefficient b. We begin by calculating L1 (for the full model with b) and L0 (for the reduced model without b).
Here L1 is found in cell M16 or T6 of Figure 6 of Finding Logistic Coefficients using Solver.
We now use the following test:

$$\chi^2 = -2 \ln \frac{L_0}{L_1} = 2(LL_1 - LL_0) = 280.246$$

where df = 1. Since p-value = CHIDIST(280.246,1) = 6.7E-63 < .05 = α, we conclude that differences in rem yield a significant difference in survival.
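The CHIDIST calculation can be cross-checked outside Excel. The sketch below relies on the fact that, for df = 1, the right-tail chi-square probability equals erfc(√(x/2)); it takes the rounded statistic 280.246 from the text, so the result may differ slightly in the last digit from a value computed from the unrounded statistic.

```python
import math

chi_sq = 280.246   # test statistic reported in the example
alpha = 0.05

# For df = 1, P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2.0))

print(f"p-value = {p_value:.2e}")
print("significant" if p_value < alpha else "not significant")
```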
The pseudo-R² statistics are as follows:
All these values are reported by the Logistic Regression data analysis tool (see range S5:T16 of Figure 6 of Finding Logistic Coefficients using Solver).