Classification Table

Another way of evaluating the fit of a given logistic regression model is via a Classification Table. The Real Statistics Logistic Regression data analysis tool produces this table. For Example 1 of Comparing Logistic Regression Models, the table produced is displayed on the right side of Figure 1.


Figure 1 – Classification Table

The table compares the number of successes (y = 1) predicted by the logistic regression model with the number actually observed, and similarly compares the number of failures (y = 0) predicted by the model with the number actually observed.

We have four possible outcomes:

True Positives (TP) = the number of cases which were correctly classified to be positive, i.e. were predicted to be a success and were actually observed to be a success

False Positives (FP) = the number of cases which were incorrectly classified as positive, i.e. were predicted to be a success but were actually observed to be a failure

True Negatives (TN) = the number of cases which were correctly classified to be negative, i.e. were predicted to be a failure and were actually observed to be a failure

False Negatives (FN) = the number of cases which were incorrectly classified as negative, i.e. were predicted to be a failure but were actually observed to be a success

The Classification Table takes the form

                Obs Positive   Obs Negative   Total
Pred Positive        TP             FP          PP
Pred Negative        FN             TN          PN
Total                OP             ON         Tot

where PP = predicted positive = TP + FP, PN = predicted negative = FN + TN, OP = observed positive = TP + FN, ON = observed negative = FP + TN and Tot = the total sample size = TP + FP + FN + TN.
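As a quick sketch in Python, the marginal totals follow directly from the four counts (using the Example 1 values from Figure 1):

```python
# Classification table margins from the four outcome counts
# (Example 1 values; TP, FP, FN, TN as defined above).
TP, FP, FN, TN = 483, 201, 39, 137

PP = TP + FP             # predicted positive
PN = FN + TN             # predicted negative
OP = TP + FN             # observed positive
ON = FP + TN             # observed negative
Tot = TP + FP + FN + TN  # total sample size

print(PP, PN, OP, ON, Tot)  # → 684 176 522 338 860
```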

For the data in Figure 1, we have

TP = 483 (cell AE6), which can be calculated by the formula =SUMIF(L6:L17,">="&AE12,H6:H17)

FP = 201 (cell AF6), which can be calculated by the formula =SUMIF(L6:L17,">="&AE12,I6:I17)

FN = 39 (cell AE7), which can be calculated by the formula =H18-AE6

TN = 137 (cell AF7), which can be calculated by the formula =I18-AF6

Here cell AE12 contains the cutoff value of .5. Predicted values (in column L) greater than or equal to this value are classified as positive (i.e. predicted to be a success), while those less than this value are classified as negative (i.e. predicted to be a failure). TP is simply the sum of all the values in column H whose predicted probabilities in column L are ≥ .5.
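The same cutoff logic can be sketched in Python, mirroring the SUMIF formulas above. Each row holds (observed successes, observed failures, predicted probability) for one covariate pattern; the numbers below are illustrative, not the actual Example 1 data.

```python
# Classify grouped data with a cutoff, mirroring the SUMIF formulas.
# Rows are illustrative: (successes, failures, predicted probability).
cutoff = 0.5
rows = [
    (15, 1, 0.94),
    (12, 6, 0.67),
    (3, 10, 0.23),
    (1, 14, 0.08),
]

# =SUMIF(L6:L17,">="&AE12,H6:H17): successes in rows predicted positive
TP = sum(s for s, f, p in rows if p >= cutoff)
# =SUMIF(L6:L17,">="&AE12,I6:I17): failures in rows predicted positive
FP = sum(f for s, f, p in rows if p >= cutoff)
FN = sum(s for s, f, p in rows) - TP   # =H18-AE6
TN = sum(f for s, f, p in rows) - FP   # =I18-AF6
print(TP, FP, FN, TN)  # → 27 7 4 24
```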

The cutoff value is specified in the Logistic Regression dialog box (see for example Figure 4 of Finding Logistic Regression Coefficients using Excel’s Solver).

Note that FP corresponds to the type I error and FN to the type II error described in Hypothesis Testing.

We now can define the following:

True Positive Rate (TPR), aka Sensitivity = TP/OP = 483/522 = .925287 (cell AE10)

True Negative Rate (TNR), aka Specificity = TN/ON = 137/338 = .405325 (cell AF10)

Accuracy (ACC) = (TP + TN)/Tot = (483 + 137) / 860 = .720930 (cell AG10)

False Positive Rate (FPR) = 1 – TNR = FP/ON = 201/338 = .594675

Positive Predictive Value (PPV) = TP/PP = 483/684 = .70614

Negative Predictive Value (NPV) = TN/PN = 137/176 = .77841
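These six measures can be reproduced in a few lines of Python from the Example 1 counts:

```python
# The six summary measures computed from the Example 1 counts.
TP, FP, FN, TN = 483, 201, 39, 137
OP, ON = TP + FN, FP + TN   # observed positive / negative
PP, PN = TP + FP, FN + TN   # predicted positive / negative
Tot = TP + FP + FN + TN

TPR = TP / OP          # sensitivity
TNR = TN / ON          # specificity
ACC = (TP + TN) / Tot  # accuracy
FPR = FP / ON          # = 1 - TNR
PPV = TP / PP          # positive predictive value
NPV = TN / PN          # negative predictive value
print(round(TPR, 6), round(TNR, 6), round(ACC, 6))  # → 0.925287 0.405325 0.72093
```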

The overall accuracy of the logistic regression model is a measure of the fit of the model. For Example 1 this is .720930, which means that the model is estimated to give an accurate prediction 72% of the time.

Note that the accuracy of each outcome is given in column P of Figure 1. E.g. the accuracy for the group with Temp = 21 and Water = 0 is 93.75% (cell P7), which can be calculated by the formula

=100*IF(L7>=$AE$12,H7/J7,I7/J7)

The total accuracy of the model (cell P18) can then be calculated by the formula

=SUMPRODUCT(P6:P17,J6:J17)/J18

The value of cell P18 is .720930, which is the same value we obtained in the classification table (cell AG10).
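The SUMPRODUCT check can be sketched as follows: each covariate pattern's accuracy (column P) weighted by its group size (column J) reproduces the overall accuracy from the classification table. The rows here are illustrative, not the actual Example 1 data.

```python
# Weighted per-pattern accuracy equals overall accuracy.
# Rows are illustrative: (successes, failures, predicted probability).
cutoff = 0.5
rows = [(15, 1, 0.94), (12, 6, 0.67), (3, 10, 0.23), (1, 14, 0.08)]

n = sum(s + f for s, f, p in rows)  # total sample size (cell J18)
# =100*IF(L7>=$AE$12,H7/J7,I7/J7), as a fraction rather than a percent
acc = [s / (s + f) if p >= cutoff else f / (s + f) for s, f, p in rows]
# =SUMPRODUCT(P6:P17,J6:J17)/J18
overall = sum(a * (s + f) for a, (s, f, p) in zip(acc, rows)) / n

TP = sum(s for s, f, p in rows if p >= cutoff)
TN = sum(f for s, f, p in rows if p < cutoff)
assert abs(overall - (TP + TN) / n) < 1e-12  # matches (TP + TN)/Tot
```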

2 Responses to Classification Table

  1. Gilly says:

    Thank you so much for your help here! I’ve performed the logistic regression, but unfortunately the accuracy of my classification table/regression is very low for my false positives and false negatives. What can I do to improve this?

    Classification Table

                 Suc-Obs   Fail-Obs   Total
    Suc-Pred         590        267     857
    Fail-Pred         82        157     239
    Total            672        424    1096

    Accuracy    0.877976   0.370283   0.681569

    Cutoff 0.5

    • Charles says:

      Gilly,
      The accuracy is not that bad, but in any case there really isn’t anything you can do, except use some other statistical model.
      Charles
