Definition 1: For any coefficient b the Wald statistic is given by the formula Wald = b / s.e.(b), where s.e.(b) is the standard error of b.
Observation: For ordinary regression we can calculate a statistic t ~ T(dfRes) which can be used to test the hypothesis that a coefficient b = 0. The Wald statistic is approximately normal, and so it plays the same role in logistic regression: it can be used to test whether the coefficient b = 0.
Since the Wald statistic is approximately normal, by Theorem 1 of Chi-Square Distribution, Wald² is approximately chi-square, and, in fact, Wald² ~ χ²(df) where df = k – k₀, k = the number of parameters (i.e. the number of coefficients) in the model (the full model) and k₀ = the number of parameters in a reduced model (esp. the baseline model, which doesn’t use any of the variables, only the intercept).
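The test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it tests a single coefficient (so df = 1), the function name wald_test is hypothetical, and the values of b and its standard error are made up rather than taken from the worked example.

```python
import math

def wald_test(b, se):
    """Return (wald, wald_squared, p_value) for the hypothesis b = 0."""
    wald = b / se            # approximately standard normal when b = 0
    wald_sq = wald ** 2      # approximately chi-square with 1 degree of freedom
    # Two-sided normal p-value; this equals the upper tail of chi-square(1) at wald_sq
    p_value = math.erfc(abs(wald) / math.sqrt(2))
    return wald, wald_sq, p_value

# Hypothetical coefficient and standard error, for illustration only
wald, wald_sq, p = wald_test(b=-0.5, se=0.2)
print(wald, wald_sq, p)
```

A small p-value (say, below α = .05) is evidence that the coefficient differs from 0.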
Property 1: The covariance matrix S for the coefficient matrix B is given by the matrix formula S = (XᵀVX)⁻¹
where X is the r × (k+1) design matrix (as described in Definition 3 of Least Squares Method for Multiple Regression)
and V = [vᵢⱼ] is the r × r diagonal matrix whose diagonal elements are vᵢᵢ = nᵢ pᵢ (1–pᵢ), where nᵢ = the number of observations in group i and pᵢ = the probability of success predicted by the model for elements in group i. Groups correspond to the rows of matrix X and consist of the various combinations of values of the independent variables.
Note that S = (XᵀW)⁻¹ where W = VX, i.e. W is X with each element in the ith row of X multiplied by vᵢᵢ.
Observation: The standard errors of the logistic regression coefficients are the square roots of the diagonal entries of the covariance matrix in Property 1.
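As a rough sketch of how the covariance matrix and the standard errors fit together, the following Python code forms W by scaling each row of X by nᵢ pᵢ (1–pᵢ), inverts XᵀW, and takes square roots of the diagonal. The helper names, the design matrix, the group sizes and the predicted probabilities are all hypothetical; the small Gauss-Jordan inverse is included only to keep the example self-contained and assumes the matrix is invertible.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def coefficient_covariance(X, n, p):
    """S = (X^T W)^-1, where row i of W is row i of X times n_i p_i (1 - p_i)."""
    v = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
    W = [[vi * x for x in row] for vi, row in zip(v, X)]
    return inverse(matmul(transpose(X), W))

# Hypothetical grouped data: intercept column plus one independent variable
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
n = [20, 25, 30, 25]          # observations per group
p = [0.2, 0.4, 0.6, 0.8]      # predicted probability of success per group

S = coefficient_covariance(X, n, p)
se = [math.sqrt(S[i][i]) for i in range(len(S))]   # standard errors
print(S, se)
```

Scaling the rows of X avoids building the full r × r diagonal matrix V, which is wasteful when the number of groups r is large.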
Example 1 (Coefficients): We now turn our attention to the coefficient table given in range E18:L20 of Figure 6 of Finding Logistic Regression Coefficients using Solver (repeated in Figure 1 below).
Figure 1 – Output from Logistic Regression tool
Using Property 1, we calculate the covariance matrix S (range V6:W7) for the coefficient matrix B via the formula
Actually, for computational reasons it is better to use the following equivalent array formula:
The formulas used to calculate the values for the Rem coefficient (row 20) are given in Figure 2.
Figure 2 – Formulas for Logistic Regression coefficients
Note that Wald represents the Wald² statistic and that lower and upper represent the limits of the 100(1–α)% confidence interval of exp(b). Since 1 = exp(0) is not in the confidence interval (.991743, .993871), the Rem coefficient b is significantly different from 0 and should therefore be retained in the model.
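The confidence interval check can be sketched in Python as follows. The sketch assumes the usual construction exp(b ± z·s.e.), where z is the two-tailed normal critical value; the source does not show the Figure 2 formulas, so this construction, the function name odds_ratio_interval, and the values of b and its standard error are assumptions for illustration.

```python
import math
from statistics import NormalDist

def odds_ratio_interval(b, se, alpha=0.05):
    """Return (lower, upper) limits of the 100(1 - alpha)% interval for exp(b)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    return lower, upper

# Hypothetical coefficient and standard error, for illustration only
b, se = -0.5, 0.2
lower, upper = odds_ratio_interval(b, se)

# The coefficient is significantly different from 0 when 1 = exp(0) lies outside the interval
significant = not (lower <= 1.0 <= upper)
print(lower, upper, significant)
```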
Observation: The % Correct statistic (cell N16 of Figure 1) is another way to gauge the fit of the model to the observed data. The statistic says that 76.8% of the observed cases are predicted accurately by the model. This statistic is calculated as follows:
For any observed values of the independent variables, when the predicted value of p is greater than or equal to .5 (viewed as predicting success), the % correct is equal to the observed number of successes divided by the total number of observations (for those values of the independent variables). When p < .5 (viewed as predicting failure), the % correct is equal to the observed number of failures divided by the total number of observations. These values are weighted by the number of observations of that type and then summed to provide the % correct statistic for all the data.
For example, for the case where Rem = 450, p-Pred = .774 (cell J10), which predicts success (i.e. survived). Thus the % Correct for Rem = 450 is 85/108 = 78.7% (cell N10). The weighted sum (found in cell N16) of all these cells is then calculated by the formula =SUMPRODUCT(N6:N15,H6:H15)/H16.
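The % Correct calculation can also be sketched in Python. Only the first row below (85 successes out of 108 with p-Pred = .774) comes from the text; the other two rows and the function name percent_correct are hypothetical, so the overall figure printed here is not the 76.8% from Figure 1.

```python
def percent_correct(groups, cutoff=0.5):
    """groups: list of (successes, total, p_pred), one entry per combination of values.

    Returns the per-group % correct and the weighted overall % correct.
    """
    per_group = []
    weighted_sum = 0.0
    n_total = 0
    for successes, total, p_pred in groups:
        if p_pred >= cutoff:
            correct = successes / total             # model predicts success
        else:
            correct = (total - successes) / total   # model predicts failure
        per_group.append(correct)
        weighted_sum += correct * total             # weight by group size
        n_total += total
    return per_group, weighted_sum / n_total

groups = [
    (85, 108, 0.774),   # Rem = 450 row from the text
    (10, 40, 0.300),    # hypothetical row
    (30, 50, 0.550),    # hypothetical row
]
per_group, overall = percent_correct(groups)
print(per_group, overall)
```

The final division mirrors the spreadsheet formula =SUMPRODUCT(N6:N15,H6:H15)/H16: each group's % correct is multiplied by its number of observations, and the sum is divided by the total number of observations.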