Classification Table | Real Statistics Using Excel

Introduction

Another way of evaluating the fit of a given logistic regression model is via a Classification Table. The Real Statistics Logistic Regression data analysis tool produces this table. For Example 1 of Comparing Logistic Regression Models, the table produced is displayed on the right side of Figure 1.

Figure 1 – Classification Table

The table compares the number of successes (y = 1) predicted by the logistic regression model with the number actually observed, and similarly the number of failures (y = 0) predicted by the model with the number actually observed.

Elements of the Table

We have four possible outcomes:

True Positives (TP) = the number of cases that were correctly classified as positive, i.e. were predicted to be a success and were actually observed to be a success

False Positives (FP) = the number of cases that were incorrectly classified as positive, i.e. were predicted to be a success but were actually observed to be a failure

True Negatives (TN) = the number of cases that were correctly classified as negative, i.e. were predicted to be a failure and were actually observed to be a failure

False Negatives (FN) = the number of cases that were incorrectly classified as negative, i.e. were predicted to be a failure but were actually observed to be a success

The Classification Table takes the form

             Suc-Obs  Fail-Obs
Suc-Pred          TP        FP        PP
Fail-Pred         FN        TN        PN
                  OP        ON       Tot

where PP = predicted positive = TP + FP, PN = predicted negative = FN + TN, OP = observed positive = TP + FN, ON = observed negative = FP + TN and Tot = the total sample size = TP + FP + FN + TN.
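For readers who want to reproduce the table outside Excel, here is a minimal Python sketch (not part of the Real Statistics tool) that tallies the four cells and the margins from case-by-case 0/1 observed and predicted labels. The class `ClassificationTable`, the function `classify` and the seven labelled cases are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClassificationTable:
    """2x2 classification table (hypothetical helper, not a library API)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def pp(self) -> int:
        return self.tp + self.fp  # predicted positive

    @property
    def pn(self) -> int:
        return self.fn + self.tn  # predicted negative

    @property
    def op(self) -> int:
        return self.tp + self.fn  # observed positive

    @property
    def on(self) -> int:
        return self.fp + self.tn  # observed negative

    @property
    def tot(self) -> int:
        return self.tp + self.fp + self.fn + self.tn  # total sample size


def classify(observed, predicted):
    """Count TP, FP, FN, TN from paired labels (assumed to be 0 or 1)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    tp = fp = fn = tn = 0
    for y, y_hat in zip(observed, predicted):
        if y_hat == 1 and y == 1:
            tp += 1  # predicted success, observed success
        elif y_hat == 1 and y == 0:
            fp += 1  # predicted success, observed failure
        elif y_hat == 0 and y == 1:
            fn += 1  # predicted failure, observed success
        else:
            tn += 1  # predicted failure, observed failure
    return ClassificationTable(tp, fp, fn, tn)


# Hypothetical observed and predicted labels for seven cases
table = classify([1, 1, 0, 1, 0, 0, 1], [1, 0, 0, 1, 1, 0, 1])
print(table, table.pp, table.pn, table.op, table.on, table.tot)
```

Plugging in the counts from the example below, `ClassificationTable(483, 201, 39, 137)` gives OP = 522, ON = 338 and Tot = 860.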

Example

For the data in Figure 1, we have

TP = 483 (cell AE6), which can be calculated by the formula =SUMIF(L6:L17,">="&AE12,H6:H17)

FP = 201 (cell AF6), which can be calculated by the formula =SUMIF(L6:L17,">="&AE12,I6:I17)

FN = 39 (cell AE7), which can be calculated by the formula =H18-AE6

TN = 137 (cell AF7), which can be calculated by the formula =I18-AF6

Here cell AE12 contains the cutoff value of .5. Rows whose predicted probability (column L) is greater than or equal to this value are classified as positive (i.e. predicted to be a success); rows whose predicted probability is less than this value are classified as negative (i.e. predicted to be a failure). TP is thus the sum of the observed successes in column H over those rows whose predicted probability in column L is ≥ .5, and FP is the corresponding sum of the observed failures in column I.
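The SUMIF formulas work on grouped data: each row of Figure 1 holds the number of observed successes (column H), the number of observed failures (column I) and the model's predicted probability (column L). A minimal Python sketch of the same calculation follows; the function name `grouped_counts` and the four data rows are hypothetical and are not taken from Figure 1.

```python
def grouped_counts(rows, cutoff=0.5):
    """Return (TP, FP, FN, TN) for grouped data.

    rows: a list of (successes, failures, p_hat) tuples, one per row,
    playing the roles of columns H, I and L in Figure 1.
    """
    # Rows with p_hat >= cutoff are treated as predicted successes.
    tp = sum(s for s, f, p in rows if p >= cutoff)  # like =SUMIF(L6:L17,">="&AE12,H6:H17)
    fp = sum(f for s, f, p in rows if p >= cutoff)  # like =SUMIF(L6:L17,">="&AE12,I6:I17)
    fn = sum(s for s, f, p in rows) - tp  # like =H18-AE6
    tn = sum(f for s, f, p in rows) - fp  # like =I18-AF6
    return tp, fp, fn, tn


# Hypothetical grouped data: (successes, failures, predicted probability)
rows = [(15, 1, 0.91), (10, 6, 0.62), (4, 9, 0.35), (2, 12, 0.18)]
print(grouped_counts(rows))              # cutoff .5
print(grouped_counts(rows, cutoff=0.3))  # a lower cutoff
```

Changing `cutoff` moves whole rows between the predicted-success and predicted-failure groups, which mirrors changing the value in cell AE12.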

The cutoff value is specified in the Logistic Regression dialog box (see for example Figure 4 of Finding Logistic Regression Coefficients using Excel’s Solver).

Note that FP counts the type I errors and FN counts the type II errors, as described in Hypothesis Testing.

Other Concepts

We can now define the following:

True Positive Rate (TPR), aka Sensitivity = TP/OP = 483/522 = .925287 (cell AE10)

True Negative Rate (TNR), aka Specificity = TN/ON = 137/338 = .405325 (cell AF10)

Accuracy (ACC) = (TP + TN)/Tot = (483 + 137) / 860 = .720930 (cell AG10)

False Positive Rate (FPR) = 1 – TNR = FP/ON = 201/338 = .594675

Positive Predictive Value (PPV) = TP/PP = 483/684 = .70614

Negative Predictive Value (NPV) = TN/PN = 137/176 = .77841
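Each of these measures is a simple ratio of the four counts. The Python sketch below (the function name `classification_measures` is hypothetical) recomputes them from TP = 483, FP = 201, FN = 39 and TN = 137 and should agree with the values listed above to the displayed precision.

```python
def classification_measures(tp, fp, fn, tn):
    """Return the rates defined above as a dict (hypothetical helper)."""
    op, on = tp + fn, fp + tn  # observed positive / observed negative
    pp, pn = tp + fp, fn + tn  # predicted positive / predicted negative
    tot = tp + fp + fn + tn
    return {
        "TPR": tp / op,          # sensitivity
        "TNR": tn / on,          # specificity
        "ACC": (tp + tn) / tot,  # accuracy
        "FPR": fp / on,          # equals 1 - TNR
        "PPV": tp / pp,          # positive predictive value
        "NPV": tn / pn,          # negative predictive value
    }


measures = classification_measures(tp=483, fp=201, fn=39, tn=137)
for name, value in measures.items():
    print(f"{name}: {value:.6f}")
```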

The overall accuracy of the logistic regression model is a measure of the fit of the model. For Example 1 this is .720930, which means that the model is estimated to give a correct prediction about 72% of the time.

Note that the accuracy for each row is given in column P of Figure 1. For example, the accuracy for Temp = 21 and Water = 0 is 93.75% (cell P7), which can be calculated by the formula

=100*IF(L7>=$AE$12,H7/J7,I7/J7)
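The IF formula picks, for each row, the share of cases that the row's classification gets right: if the row is predicted to be a success, the correct cases are its observed successes; otherwise they are its observed failures. A minimal Python version follows, assuming column J holds the row total H + I; the name `row_accuracy` and the counts are hypothetical.

```python
def row_accuracy(successes, failures, p_hat, cutoff=0.5):
    """Percentage of one row's cases classified correctly.

    Mirrors =100*IF(L7>=$AE$12,H7/J7,I7/J7), assuming J7 = H7 + I7.
    """
    total = successes + failures  # plays the role of column J (assumed)
    if p_hat >= cutoff:
        return 100 * successes / total  # row predicted as success
    return 100 * failures / total  # row predicted as failure


print(row_accuracy(10, 6, 0.62))  # predicted success: 10 of 16 correct
print(row_accuracy(4, 9, 0.35))   # predicted failure: 9 of 13 correct
```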

The total accuracy of the model (cell P18) can then be calculated by the formula

=SUMPRODUCT(P6:P17,J6:J17)/J18

Since the values in column P are percentages, cell P18 contains 72.0930 (i.e. 72.0930%), which is the same accuracy of .720930 that we obtained in the classification table (cell AG10).
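The two values agree because weighting a row's accuracy by its row total simply counts the correctly classified cases in that row, so the weighted sum over all rows equals TP + TN. The Python sketch below checks this on hypothetical grouped data (the function names and rows are illustrative assumptions); it works with fractions rather than the percentages stored in column P.

```python
def overall_accuracy(rows, cutoff=0.5):
    """Weighted average of per-row accuracies, like =SUMPRODUCT(P6:P17,J6:J17)/J18.

    rows: list of (successes, failures, p_hat); row totals play the role of column J.
    """
    weighted_sum = 0.0
    grand_total = 0
    for s, f, p in rows:
        total = s + f
        row_acc = (s if p >= cutoff else f) / total  # per-row accuracy as a fraction
        weighted_sum += row_acc * total  # number of correct cases in this row
        grand_total += total
    return weighted_sum / grand_total


def accuracy_from_counts(rows, cutoff=0.5):
    """(TP + TN) / Tot computed directly from the classification counts."""
    tp = sum(s for s, f, p in rows if p >= cutoff)
    tn = sum(f for s, f, p in rows if p < cutoff)
    tot = sum(s + f for s, f, p in rows)
    return (tp + tn) / tot


rows = [(15, 1, 0.91), (10, 6, 0.62), (4, 9, 0.35), (2, 12, 0.18)]
print(overall_accuracy(rows), accuracy_from_counts(rows))
```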

15 thoughts on “Classification Table”

Chloue A. Ebin

July 9, 2021 at 4:00 pm

Using the overall percentage in the classification table, how high should it be for the model to be considered efficient or good?
- Charles
  
  July 11, 2021 at 5:28 pm
  
  Hello Chloue,
  I don’t know of any specific guidelines for what level of accuracy makes for a good model. It all depends on the scenario that you are modeling. Clearly, an accuracy of 100% is perfection and an accuracy of 50% just means the model is no better than flipping a coin. Obviously, somewhere in between is what you are aiming for, but there is no specific number.
  Charles
Daniel Debbarma

June 30, 2020 at 6:39 am

Sir,
I have both training data and test data.
How do I calculate the classification accuracy on the test data for a model derived from the training data?
- Charles
  
  June 30, 2020 at 2:52 pm
  
  Hello Daniel,
  Suppose that you have data for logistic regression. Build your model using the training data. Then test your model using the test data. Now for the test data you have both the predicted y values (from the model) and the observed values from the test data, and so you can build the classification table.
  Charles
Lauren

April 6, 2019 at 11:02 pm

This was the best thing I’ve ever read.
If I pass my dissertation, it is solely down to this page.
- Charles
  
  April 7, 2019 at 8:30 am
  
  Hi Lauren,
  Thank you very much. Best of luck on your dissertation.
  Charles
abdo

February 15, 2019 at 8:30 am

Thank you, I appreciate you!
Heather

November 15, 2018 at 6:41 pm

Suppose the average success rate in the sample is .8, meaning 80% of the observations show the value of 1 for the dependent variable. How does this affect the choice of the cutoff value, say using .8 rather than .5?
- Charles
  
  November 15, 2018 at 9:53 pm
  
  Heather,
  You can choose any cut-off you like.
  Charles
Dinesh Motkar

October 30, 2017 at 1:02 pm

I have two variables, Standard and Test, and I need an ROC curve for them. For that I want the Sensitivity and (1 – Specificity) at various cutoffs. Can you please tell me how to calculate the cutoff value?

Standard and Test are both binary variables.
I need cutoff points; please tell me how to calculate cutoff values!
- Charles
  
  October 30, 2017 at 5:54 pm
  
  Dinesh,
  The cutoff values are user-defined. You can select whatever value serves your purposes.
  Charles
Sandipan Karmakar

August 3, 2017 at 6:24 pm

How can I perform ROC analysis for significant predictors in linear regression? Can you give me any ideas?
- Charles
  
  August 3, 2017 at 6:31 pm
  
  Can you specify more precisely what ROC analysis you want to do? For binary logistic regression this is clear; it is less clear for linear regression.
  Charles
Gilly

November 7, 2016 at 4:23 pm

Thank you so much for your help here! I’ve performed the logistic regression, but unfortunately the accuracy of my classification table/regression is very low for my false positives and false negatives. What can I do to improve this?

Classification Table

               Suc-Obs     Fail-Obs
Suc-Pred           590          267          857
Fail-Pred           82          157          239
                   672          424         1096

Accuracy    0.87797619  0.370283019  0.681569343

Cutoff 0.5
- Charles
  
  November 7, 2016 at 5:01 pm
  
  Gilly,
  The accuracy is not that bad, but in any case there really isn’t anything you can do, except use some other statistical model.
  Charles