# Testing the Fit of the Logistic Regression Model

Unfortunately, for large values of the coefficient b, the standard error becomes inflated, which shrinks the Wald statistic. This increases the probability that b is viewed as not making a significant contribution to the model even when it does (i.e. a type II error).

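To overcome this problem it is better to test on the basis of the log-likelihood statistic, since

$$\chi^2 = 2(LL_1 - LL_0) \sim \chi^2(df)$$

where df = k – k0 (the difference in the number of coefficients between the two models), LL1 refers to the log-likelihood of the full model and LL0 refers to the log-likelihood of a model with fewer coefficients (especially the model with only the intercept b0 and no other coefficients). This is equivalent to

$$\chi^2 = -2 \ln \frac{L_0}{L_1}$$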

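Observation: For ordinary regression the coefficient of determination is

$$R^2 = \frac{SS_{Reg}}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

where $SS_{Reg}$, $SS_{Res}$ and $SS_T$ are the regression, residual and total sums of squares.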

Thus R2 measures the percentage of variance explained by the regression model. We need a similar statistic for logistic regression. We define the following three pseudo-R2 statistics for logistic regression.

Definition 1: The log-linear ratio R2 (aka McFadden’s R2) is defined as follows:

$$R_L^2 = 1 - \frac{LL_1}{LL_0}$$

where LL1 refers to the full log-likelihood model and LL0 refers to a model with fewer coefficients (especially the model with only the intercept b0 and no other coefficients).

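Cox and Snell’s R2 is defined as

$$R_{CS}^2 = 1 - \left(\frac{L_0}{L_1}\right)^{2/n} = 1 - \exp\left(-\frac{2(LL_1 - LL_0)}{n}\right)$$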

where n = the sample size.

Nagelkerke’s R2 is defined as

$$R_N^2 = \frac{R_{CS}^2}{1 - L_0^{2/n}}$$

Observation: Since $R_{CS}^2$ cannot achieve a value of 1, Nagelkerke’s R2 was developed to have properties more similar to the R2 statistic used in ordinary regression.
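The three pseudo-R2 definitions can be sketched in a few lines of Python. This is only an illustration of the standard formulas: the function names are made up here, and the sample values (100 observations, half of them 1s, fitted perfectly) are hypothetical rather than taken from any example on this site.

```python
import math

def mcfadden_r2(ll1, ll0):
    """McFadden's R2: 1 - LL1/LL0."""
    return 1.0 - ll1 / ll0

def cox_snell_r2(ll1, ll0, n):
    """Cox and Snell's R2: 1 - (L0/L1)^(2/n), computed from log-likelihoods."""
    return 1.0 - math.exp(-2.0 * (ll1 - ll0) / n)

def nagelkerke_r2(ll1, ll0, n):
    """Nagelkerke's R2: Cox and Snell's R2 divided by its maximum, 1 - L0^(2/n)."""
    max_cs = 1.0 - math.exp(2.0 * ll0 / n)
    return cox_snell_r2(ll1, ll0, n) / max_cs

# Hypothetical data: n = 100 with 50 zeros and 50 ones, so LL0 = 100*ln(0.5).
# A perfect fit has likelihood 1, i.e. LL1 = 0.
n = 100
ll0 = n * math.log(0.5)
ll1 = 0.0

r2_l = mcfadden_r2(ll1, ll0)
r2_cs = cox_snell_r2(ll1, ll0, n)
r2_n = nagelkerke_r2(ll1, ll0, n)
print(r2_l, r2_cs, r2_n)
```

Even with a perfect fit, Cox and Snell's statistic stops at about 0.75 for these counts, while McFadden's and Nagelkerke's reach 1. This is the behavior that the rescaling is meant to correct.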

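Observation: The initial value L0 of L, i.e. where we only include the intercept value b0, is given by

$$L_0 = \left(\frac{n_0}{n}\right)^{n_0} \left(\frac{n_1}{n}\right)^{n_1}, \qquad \text{i.e.} \qquad LL_0 = n_0 \ln\frac{n_0}{n} + n_1 \ln\frac{n_1}{n}$$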

where n0 = number of observations with value 0, n1 = number of observations with value 1 and n = n0 + n1.
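As a quick check of this closed form, the following Python sketch compares it with a direct sum of log-probabilities over the observations, where every observation is assigned the same fitted probability n1/n. The function name and the counts (30 zeros, 70 ones) are hypothetical and chosen only for illustration.

```python
import math

def null_log_likelihood(n0, n1):
    """LL0 = n0*ln(n0/n) + n1*ln(n1/n), treating 0*ln(0) as 0."""
    n = n0 + n1
    ll0 = 0.0
    for count in (n0, n1):
        if count > 0:
            ll0 += count * math.log(count / n)
    return ll0

# Hypothetical outcomes: 30 zeros and 70 ones
y = [0] * 30 + [1] * 70
p = sum(y) / len(y)  # intercept-only model predicts p = n1/n for every row

# Direct evaluation of the log-likelihood, one observation at a time
direct = sum(math.log(p) if yi == 1 else math.log(1.0 - p) for yi in y)
closed = null_log_likelihood(30, 70)
print(direct, closed)
```

The two values should agree up to floating-point rounding.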

As described above, the likelihood-ratio test statistic equals:

$$\chi^2 = -2 \ln \frac{L_0}{L_1} = 2(LL_1 - LL_0)$$

where L1 is the maximized value of the likelihood function for the full model, while L0 is the maximized value of the likelihood function for the reduced model. The test statistic has a chi-square distribution with df = k1 – k0, i.e. the number of parameters in the full model minus the number of parameters in the reduced model.

Example 1: Determine whether there is a significant difference in survival rate between the different values of rem in Example 1 of Basic Concepts of Logistic Regression. Also calculate the various pseudo-R2 statistics.

We are essentially comparing the logistic regression model with coefficient b to that of the model without coefficient b. We begin by calculating the L1 (the full model with b) and L0 (the reduced model without b).

Here L1 is found in cell M16 or T6 of Figure 6 of Finding Logistic Coefficients using Solver.

We now use the following test:

$$\chi^2 = 2(LL_1 - LL_0) = 280.246$$

where df = 1. Since p-value = CHIDIST(280.246,1) = 6.7E-63 < .05 = α, we conclude that differences in rems yield a significant difference in survival.
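The p-value can be cross-checked outside Excel. The sketch below uses only the Python standard library and relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its right-tail probability is erfc(sqrt(x/2)). The function name is made up for this illustration, and the shortcut applies to df = 1 only. For other degrees of freedom a statistics library would be the usual route (e.g. `scipy.stats.chi2.sf`, assuming SciPy is installed).

```python
import math

def chi2_sf_df1(x):
    """Right-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

chi_sq = 280.246  # likelihood-ratio statistic from the example above
alpha = 0.05

p_value = chi2_sf_df1(chi_sq)
print(f"p-value = {p_value:.2e}, significant at alpha = {alpha}: {p_value < alpha}")
```

The result should agree with the CHIDIST figure quoted above to within rounding.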

The pseudo-R2 statistics are then obtained by substituting LL1, LL0 and the sample size n into the three definitions above.

All these values are reported by the Logistic Regression data analysis tool (see range S5:T16 of Figure 6 of Finding Logistic Coefficients using Solver).

### 9 Responses to Testing the Fit of the Logistic Regression Model

1. Mike says:

One typo:
CHITEST(280.246,1) = 6.7E-63 => CHIDIST(280.246,1) = 6.7E-63

• Charles says:

Mike,
Thanks for catching this. I have now made the correction.
Charles

2. Wytek Szymanski says:

Hi Charles,

The R-squared in linear regression is defined like so:
R^2 = (var(Y)-var(err))/var(Y) = 1 – var(err)/var(Y)
where var(err) is derived from the absolute difference between Y and Yhat.

Why can’t we apply this definition to logistic regression where Y is the observed probability and Yhat is the estimated probability?

• Charles says:

Wytek,
Sorry, but I have not tried to evaluate this version of R-square for logistic regression. From what I can see no one uses it. Instead they use pseudo-R-square statistics, some of which are described on my website.
Charles

3. shri says:

Hi Charles,

Is there any post where the binary logistic regression output has been interpreted, i.e. what the output means and what conclusions or actions can be derived from it?

Shri

4. Amy says:

Given your figure 6 output are the following statements a correct interpretation?

The results of the likelihood ratio test suggest there was statistically significant relationship between the input variable and the outcome variable at the 0.05 level of significance (chi sq (1, N=760)= 280.2421, p=6.65E-63).

The odds ratio of the input was .9928 (= exp(-0.00722)) with a 95% confidence interval = (.9917, .9939). This indicated that for every unit …. increase/decrease in the input variable the odds of the output variable increased/decreased by 0.9928

My understanding of your data set is weak so I’m not sure how to interpret that.

My data is pretest score and the output is pass/fail for the class. The logistic regression ran nicely and my model is significant.

• Charles says:

Amy,
Yes, this seems correct.
Charles

5. Basil Chyke Okoye says:

De

Thank you very much for the good work you are doing.
Please sir, how can I run logistic regression without an intercept using the Real Statistics Resource Pack add-in?

Thanks

Chyke

• Charles says:

Basil,
The Real Statistics Resource Pack currently doesn’t support a no intercept option for Logistic Regression. I can suggest the following workaround though. Run the Logistic Regression data analysis tool and choose the Solver option. Now manually insert 0 in the intercept cell; i.e. the first coefficient under the heading Coeff. Note this is the cell that previously contained a constant value (not a formula). Next select Data > Analysis|Solver. The values in the dialog box will be those created by the Logistic Regression data analysis tool. You need to make one change and then press the OK button, namely change the range in the field called “By Changing Variable Cells” by removing the intercept cell (e.g. if the range was R4:R7, then change this range to R5:R7; here cell R4 was the cell that you changed to zero previously). I hope this helps.
Charles