# Significance Testing of the Logistic Regression Coefficients

Definition 1: For any coefficient b the Wald statistic is given by the formula

$$\text{Wald} = \frac{b}{\text{s.e.}(b)}$$

where s.e.(b) is the standard error of b.

Observation: For ordinary regression we can calculate a statistic t ~ T(dfRes) which can be used to test the hypothesis that a coefficient b = 0. The Wald statistic is approximately normal and so it can be used to test whether the coefficient b = 0 in logistic regression.

Since the Wald statistic is approximately normal, by Theorem 1 of Chi-Square Distribution, $\text{Wald}^2$ is approximately chi-square, and, in fact, $\text{Wald}^2 \sim \chi^2(df)$ where $df = k - k_0$, $k$ = the number of parameters (i.e. the number of coefficients) in the full model and $k_0$ = the number of parameters in a reduced model (especially the baseline model, which uses only the intercept and none of the variables).
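As a rough illustration of how the Wald statistic turns into a p-value, here is a minimal Python sketch. It assumes that a single coefficient is being tested, so that the chi-square distribution has 1 degree of freedom, and the values of b and its standard error are hypothetical rather than taken from the example on this page. The function name `wald_test` is also just an illustrative choice.

```python
import math

def wald_test(b, se):
    """Return (Wald, Wald squared, p-value) for the null hypothesis b = 0.

    Assumes one coefficient is tested, i.e. chi-square with df = 1.
    """
    wald = b / se
    wald_sq = wald * wald
    # For df = 1, P(chi-square > x) = erfc(sqrt(x / 2)), which is the same
    # as the two-sided p-value of a standard normal at |Wald|.
    p_value = math.erfc(math.sqrt(wald_sq / 2.0))
    return wald, wald_sq, p_value

# Hypothetical coefficient and standard error, for illustration only
wald, wald_sq, p_value = wald_test(b=-0.5, se=0.2)
print(f"Wald = {wald:.4f}, Wald^2 = {wald_sq:.4f}, p-value = {p_value:.4f}")
```

If the p-value is below the chosen alpha, the coefficient would be judged significantly different from zero.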

Property 1: The covariance matrix S for the coefficient matrix B is given by the matrix formula

$$S = (X^T V X)^{-1}$$

where X is the r × (k+1) design matrix (as described in Definition 3 of Least Squares Method for Multiple Regression)

and $V = [v_{ij}]$ is the r × r diagonal matrix whose diagonal elements are $v_{ii} = n_i p_i (1 - p_i)$, where $n_i$ = the number of observations in group i and $p_i$ = the probability of success predicted by the model for elements in group i. Groups correspond to the rows of matrix X and consist of the various combinations of values of the independent variables.

Note that $S = (X^T W)^{-1}$ where W is X with each element in the ith row of X multiplied by $v_{ii}$.

Observation: The standard errors of the logistic regression coefficients are the square roots of the entries on the main diagonal of the covariance matrix in Property 1.
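For readers who want to see the matrix work spelled out without add-in functions, the following is a minimal pure-Python sketch of Property 1. It uses the form S = (XᵀW)⁻¹ with W equal to X with row i scaled by n_i p_i (1 − p_i), so the r × r diagonal matrix is never built. The helper names (`design`, `invert`, `coefficient_covariance`, `standard_errors`) and the three-group data set are hypothetical and chosen only for illustration.

```python
import math

def design(rows):
    """Prepend a column of ones (the constant term) to each row of predictors."""
    return [[1.0] + [float(x) for x in row] for row in rows]

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a with the identity matrix
    m = [[float(v) for v in a[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-300:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def coefficient_covariance(x_rows, n_obs, p_pred):
    """Return S = (X^T W)^-1 where row i of W is row i of X times n_i p_i (1 - p_i)."""
    X = design(x_rows)
    v = [n * p * (1.0 - p) for n, p in zip(n_obs, p_pred)]
    k1 = len(X[0])
    xtw = [[sum(X[i][a] * v[i] * X[i][b] for i in range(len(X)))
            for b in range(k1)] for a in range(k1)]
    return invert(xtw)

def standard_errors(S):
    """Square roots of the main diagonal of the covariance matrix."""
    return [math.sqrt(S[j][j]) for j in range(len(S))]

# Hypothetical summary data: one predictor, three groups
S = coefficient_covariance(x_rows=[[0], [1], [2]],
                           n_obs=[10, 10, 10],
                           p_pred=[0.2, 0.5, 0.8])
print(S)
print(standard_errors(S))
```

The first standard error belongs to the intercept and the remaining ones to the independent variables, in the order of the columns of X.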

Example 1 (Coefficients): We now turn our attention to the coefficient table given in range E18:L20 of Figure 6 of Finding Logistic Regression Coefficients using Solver (repeated in Figure 1 below).

Figure 1 – Output from Logistic Regression tool

Using Property 1 we calculate the covariance matrix S (range V6:W7) for the coefficient matrix B via the formula

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)), MMULT(DIAGONAL(J6:J15*(1-J6:J15)*H6:H15),DESIGN(E6:E15))))

Actually, for computational reasons it is better to use the following equivalent array formula:

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)),J6:J15*(1-J6:J15)*H6:H15*DESIGN(E6:E15)))

The formulas used to calculate the values for the Rem coefficient (row 20) are given in Figure 2.

Figure 2 – Formulas for Logistic Regression coefficients

Note that Wald represents the $\text{Wald}^2$ statistic and that lower and upper represent the limits of the 100(1−α)% confidence interval of exp(b). Since 1 = exp(0) is not in the confidence interval (.991743, .993871), the Rem coefficient b is significantly different from 0 and should therefore be retained in the model.
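The confidence interval check can be sketched in Python as follows. This assumes the usual normal-approximation interval exp(b ± z·s.e.), which may or may not be exactly what the formulas in Figure 2 do. The function names are illustrative, and the values of b and s.e. in the first call are hypothetical; only the interval (.991743, .993871) comes from the example above.

```python
import math
from statistics import NormalDist

def exp_b_confidence_interval(b, se, alpha=0.05):
    """Return (lower, upper) limits of the 100(1 - alpha)% interval for exp(b).

    Assumes a normal approximation for b: exp(b -/+ z_crit * se).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.exp(b - z_crit * se), math.exp(b + z_crit * se)

def excludes_one(lower, upper):
    """True when 1 = exp(0) lies outside the interval, i.e. b differs significantly from 0."""
    return not (lower <= 1.0 <= upper)

# Hypothetical coefficient and standard error
lower, upper = exp_b_confidence_interval(b=-0.5, se=0.2)
print(lower, upper, excludes_one(lower, upper))

# The interval reported for the Rem coefficient in the example
print(excludes_one(0.991743, 0.993871))
```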

Observation: The % Correct statistic (cell N16 of Figure 1) is another way to gauge the fit of the model to the observed data. The statistic says that 76.8% of the observed cases are predicted accurately by the model. This statistic is calculated as follows:

For any observed values of the independent variables, when the predicted value of p is greater than or equal to .5 (viewed as predicting success) then the % correct is equal to the observed number of successes divided by the total number of observations (for those values of the independent variables). When p < .5 (viewed as predicting failure) then the % correct is equal to the observed number of failures divided by the total number of observations. These values are weighted by the number of observations of that type and then summed to provide the % correct statistic for all the data.

For example, for the case where Rem = 450, p-Pred = .774 (cell J10), which predicts success (i.e. survived). Thus the % Correct for Rem = 450 is 85/108 = 78.7% (cell N10). The weighted sum (found in cell N16) of all these cells is then calculated by the formula =SUMPRODUCT(N6:N15,H6:H15)/H16.
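The calculation just described can be sketched in Python as follows. The first group uses the Rem = 450 figures quoted above (85 successes out of 108, p-Pred = .774); the second group is hypothetical and is included only to show the p < .5 case and the weighting. The function name `percent_correct` is an illustrative choice.

```python
def percent_correct(successes, totals, p_pred, cutoff=0.5):
    """Return (per-group fraction correct, overall weighted fraction correct)."""
    per_group = []
    for s, n, p in zip(successes, totals, p_pred):
        if p >= cutoff:
            # Model predicts success: the observed successes are the correct cases
            per_group.append(s / n)
        else:
            # Model predicts failure: the observed failures are the correct cases
            per_group.append((n - s) / n)
    # Weight each group by its number of observations,
    # as in =SUMPRODUCT(N6:N15,H6:H15)/H16
    overall = sum(c * n for c, n in zip(per_group, totals)) / sum(totals)
    return per_group, overall

per_group, overall = percent_correct(
    successes=[85, 30],   # second value is hypothetical
    totals=[108, 100],    # second value is hypothetical
    p_pred=[0.774, 0.3],  # second value is hypothetical
)
print(per_group, overall)
```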

### 51 Responses to Significance Testing of the Logistic Regression Coefficients

1. Mark Harmon says:

Hi Charles,
Excellent work! I was wondering if you wouldn’t mind providing a bit more clarity to the calculation of the standard errors of the logistic regression coefficients. Any chance you could show the actual matrix work that had to be done? You’ve listed the basic formulas but it’s not clear (to me anyway). I’ve looked everywhere on the Internet and there is no specific documentation on how to construct the covariance matrix of the logistic regression coefficients. You did it with a supplemental function you created. I am hoping to get the s.e. of those coefficients so I can manually calculate the Wald statistic for each coefficient. Unbelievably, there is zero documentation on the Internet on how to do that. A huge thanks for any help in advance.

• Charles says:

Mark,
Thanks for your comment. The standard errors are the square roots of the values on the main diagonal of the covariance matrix. In the next day or two I will update the website with a better description of how to calculate the covariance matrix.
Charles

Update 20 Aug 2013: The site has now been updated with the Excel formula I used to calculate the covariance matrix of B. Charles

2. Mark Harmon says:

Hi Charles,
Thanks for the quick reply and I really enjoy your fantastic site. I did have a question regarding your answer to my original question. Your answer above mentioned the following function DESIGN(E6:E15). I wasn’t able to find any documentation about that function. I was wondering if you might have any more information on it or a workaround (it doesn’t appear to be a function in Excel 2010)? Thanks in advance.

3. Mark Harmon says:

Hi Charles,

Thanks for the info. I was able to work it out (I haven’t messed around with matrices since I was an undergrad engineering major in the 80’s). I had another quick question regarding the creation of the covariance matrix:

The Design matrix (X) and the Diagonal Variance matrix (V) are created in your example with all of the data records sorted according to P(X) (p-pred) being in descending order.

It appears to be necessary to sort the data records according to P(X) (p – pred) in descending order before creating the X and V matrices.

If data records are not sorted according to P(X) in descending order at the beginning of the calculations, the resulting X and V matrices will produce a very different (and apparently incorrect) covariance matrix.

I was just wondering if you would agree that data records must be sorted according to P(X) in descending order at the start of these calculations in order to obtain the correct covariance matrix?

4. Kril Pickrell says:

Charles,

Thanks so much for your website. It is the only place that remotely comes close to explaining how exactly to calculate the standard error of regression coefficients. I do have a couple of simple questions:

“…and V = [vi] is the r × r matrix where vi = ni pi (1-pi).”

Should this read “is the r x 1 matrix”? If not, how is each i,j computed?

Also, could you explain what the ni term is? Thanks in advance.

Best Regards,
Kris Pickrell

• Charles says:

Hi Kril,

Thanks for catching some sloppy notation on my part. The correct expression is that “$V = [v_{ij}]$ is the r × r diagonal matrix whose diagonal elements are $v_{ii} = n_i p_i (1 - p_i)$.” I have updated the webpage to reflect this.

What I wrote would be correct with $V = [v_i]$ as an r × 1 matrix with $v_i = n_i p_i (1 - p_i)$, but with VX in the expression $S = (X^T V X)^{-1}$ being scalar multiplication rather than matrix multiplication.

$n_i$ = the number of observations for group i, where group i corresponds to the ith row of matrix X and consists of one of the various combinations of values of the independent variables (actually the ith such combination).

Charles

• Kris Pickrell says:

Thanks!

5. bgkt sih says:

Dear sir,

What is the significance of using value 1 at the 1st column of matrix X?

• Charles says:

The ones in the first column of the design matrix X are the way of handling the constant term.
Charles

• bgkt sih says:

If I want to use it for any data. I must put the value 1 is it ?

• Charles says:

Yes you need to include the 1’s.
Charles

Dear sir,

I have done logistic regression for 20 independent variables, all of which are categorical (0 and 1), and 1 binomial response variable. However, the significance tests using the p-values do not seem right for the variables.
Does using all categorical variables as independent variables affect the result?

• Charles says:

The usual logistic regression model doesn’t seem like the correct approach. From what I can tell you need something like a 20-dimension contingency table using log-linear model (see http://www.real-statistics.com/log-linear-regression/). I am not sure how to handle such a problem. Perhaps someone else can make a suggestion.
Charles

7. margaluz arias says:

Hello Charles
Could you define what is group i in “property 1”?
I would love to know which parameters did you choose to build the covariance matrix.
In my logistic regression model I only have 2 variables so I will do the covariance matrix by using covar functions.

Thanks

• Charles says:

In this context each group consists of any combinations of values of the independent variables. So if you have independent variables Gender and Age and the raw data is

M 30 1
F 31 0
M 30 0
M 32 1
F 32 0
F 31 1
F 30 1
M 32 1

There are 8 sample elements (rows), but some of them can be grouped together, namely the ones where the gender and age are the same. This yields the following summary data (a sort of frequency table). The summary contains 5 groups. The 3rd column is a count of all the cases that have a 0 as the dependent variable and the 4th column is a count of all the cases that have a 1 as the dependent variable.

M 30 1 1
F 31 1 1
M 32 0 2
F 32 1 0
F 30 0 1

This is a silly example, but I hope it helps answer your question.
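The grouping step above can be sketched in a few lines of Python. The raw rows are the ones listed above; the function name `summarize` is just an illustrative choice, and the sketch assumes the dependent variable is coded 0 or 1 and sits in the last position of each row.

```python
raw = [
    ("M", 30, 1), ("F", 31, 0), ("M", 30, 0), ("M", 32, 1),
    ("F", 32, 0), ("F", 31, 1), ("F", 30, 1), ("M", 32, 1),
]

def summarize(rows):
    """Group raw rows by their independent-variable values.

    Returns one tuple per group: (predictors..., count of 0s, count of 1s),
    in order of first appearance in the raw data.
    """
    groups = {}
    for *predictors, outcome in rows:
        counts = groups.setdefault(tuple(predictors), [0, 0])
        counts[outcome] += 1  # outcome is assumed to be 0 or 1
    return [(*key, c0, c1) for key, (c0, c1) in groups.items()]

for row in summarize(raw):
    print(*row)
```

Each output row is one group i, and the sum of its two counts is the n_i used in Property 1.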

Charles

8. margaluz arias says:

Hello Charles
Thank you very much for the answer.
I think I can understand a bit better how you did the covariance matrix.

However, all my independent variables are continuous (no repetitions), so that, could be possible to form the groups in function of the probability (as for Hosmer lemeshow statistics)?.
That means to group the cases that are predicted with probability <0.1, <0.2…..<1

• Charles says:

No problem. Each summary data row will be equivalent to one raw data row. The reported Hosmer Lemeshow value won’t be quite since it is typically based on 10 summarized observations, but that is probably not so important.

Charles

9. Marty says:

Hi Charles,

Fantastic website. Thank you, it’s been very helpful.

I have a binomial logistic regression with 10 independent variables. HL test and R2 indicate the model is a poor fit. I’d like to try to improve the fit by removing variables that have low Wald scores and add in variable interactions. I saw the commentary on creating interactions for multi variable linear regression, but I am not sure if I can copy exactly, or if I need to make further adjustments due to logistic function.

1.) Is there an easy way to add in interactions?
2.) Is there a single best test to use to decide if the model is a good fit? — high AUC, use one of the R2’s, or the HL test?

Thanks again!
Marty

• Charles says:

Marty,

1. You can add in interaction of independent variables in exactly the same way as you do for multiple linear regression. I have simply implemented this via x1 * x2, which is easy to do in Excel.

2. There isn’t a simple answer to this question, although I wouldn’t rely too heavily on the HL value. High AUC and R2 are likely to be better indicators.

Charles

• Marty says:

Thanks Charles!

One more question, is the de facto R2 “floor” of a binomial logistic regression .50?

If I didn’t use a model and just “guessed”, it seems like I’d have a 50/50 chance of predicting the actual outcome.

Therefore, if my model yields an R2 of .56, does that mean that the model only offers an .06 improvement of what I would have been able to achieve using guesswork alone?

Thanks again,
Marty

• Charles says:

Marty,

Your remarks are true for the % Correct statistic, but not for the R2 statistic, which is calculated in a completely different way.

Charles

10. Kone says:

Dear Charles
Thank you for your help. I have to run the variables temperature treatments on three groups of 10 plants. i would like to use Anova one-way for variance analysis.
I would like to know if it is the right analysis when i use Anova repeated measures

• Charles says:

Sorry, but you need to provide more information before I am able to answer your question.
Charles

Dear Charles,

Alas I am a babe in the logit world and I hope you will be kind enough to point me in the right direction. I have access to a large dataset on student scores that have been previously standardised along the lines of mean 25, s.d. 5. I wish to perform some significance testing between certain groups of students and have struck on the idea that I could/should convert these scores to logit’s using the probability of achieving each ‘raw’ score – essentially treating them as z-scores – and then use these logits in place of the raw standardised scores and conduct z-tests. I am (if it isn’t already painfully obvious) too statistically underskilled to know whether I am committing an egregious blunder with such a plan, but the reference to Wald in your article makes me think that I probably am… Any suggestions on how I might better understand the issue would be very welcome.

• Charles says:

It is not clear to me what advantage (if any) you get by converting the scores to logit’s. Why can’t you simply use the raw scores?
Charles

12. Sankit Maroo says:

How can I reduce the p value of my intercept coefficient if the p values of all my other variables are satisfactorily low in logistic regression?

• Charles says:

Sankit,
Besides changing your data (e.g. via a transformation), I don’t know how to do this. I also don’t know why you would want to do this.
Charles

13. Renato says:

Hello

A question about the Wald test. Can the Wald test be used for linear regression? The information I find is for logistic regression. I want to find out whether I can use it for a linear regression. Thanks

• Charles says:

Is there any particular reason why you want to use the Wald test for linear regression?
Charles

14. Matt says:

Hi Charles,
I’m still having trouble understanding the meaning of the p value and statistical significance in logistic regression. I want to know how significant are the coefficients. What does it exactly mean that it is statistically significant? From my basic understanding if the p value is below the cutoff point, i.e<0.05 then the variable is statistically significant right? and also, Since there isn't a normal distribution in logistic regression how reliable is the p value?

cheers,
Matt

• Charles says:

Matt,
Yes, if the p value is below the cutoff point alpha (e.g. alpha = 0.05) then the variable is statistically significant. This means the (population) coefficient for that variable can be considered to be non-zero (i.e. that variable has a significant impact on the model). Although “there isn’t a normal distribution in logistic regression”, the distribution of the coefficient estimates is approximately normal.
Charles

15. Shashank jain says:

Hi,

Thanks for creating this great tool and website.
I ran a logistic regression using the tool on a Mac using the data set as described in the video (https://www.youtube.com/watch?v=EKRjDurXau0). But I am not getting the p-value table (as can be seen in the screenshot in the webpage above) for all the coefficients to determine the significance of each independent variable. What can be done?

• Charles says:

Shashank,
If you send me an Excel file with your data and results, I will try to figure out what is going on. You can find my email address at Contact Us.
Charles

• Shashank jain says:

Hi Charles,

Appreciate the quick response. My problem got solved.

I had large no. of rows so I was not able to locate the table.

thanks

Regards
Shashank

16. Anson says:

Hi Charles,

I have used the Multinomial Logistic Regression data analysis tool. The exp(b) of the coeff in the report is > lower and < upper, but the p-value of the coeff is 0.8455, so should this coeff be retained in the model or not?

Thanks
Anson

• Charles says:

If the p-value is .8455, then the coeff is not significantly different from zero.
Charles

• Anson says:

Charles,

I have 50 columns of Independent Variables and each column have about 30000 data, only one column for the dependent variables.

So how to calculate each the coeff is significantly different from zero and should therefore be retained in the model?

Anson

• Charles says:

Anson,
If p-value < alpha then the coefficient is significantly different from zero. Equivalently, if 1 is not in the confidence interval for exp(b) then the coefficient is significantly different from zero.
Charles

• Anson says:

Charles,

I have a question for using
MLogitPred(R0, R1, r, iter) function.

I have 20 col x 1001 row raw data with heading. Column A: ref no.
Columns B to E: independent variables
Columns F to T : dependent variables

I have change the value “r” to predict and the result as below:
E2:S2=MLogitPred(A2:D2,ChangeData!\$B\$2:\$T\$1001,13), Result=0.04
E2:S2=MLogitPred(A2:D2,ChangeData!\$B\$2:\$T\$1001,0), Result=#VALUE
E2:S2=MLogitPred(A2:D2,ChangeData!\$B\$2:\$T\$1001,4), Result=#VALUE

So is that mean I use 13 in the r is right?

Anson

• Charles says:

Anson,
You say your data is in raw format, but you also say that columns F to T contain the dependent variables. Both of these can’t be true. If the data is in raw format then there would only be one column for the dependent variables. I will therefore assume that your data is in summary format. Since columns F to T contain the dependent variables and there are 15 columns from F to T, this would mean that your dependent variables are numbered 0, 1, 2, …, 14. Thus r should be 14.
Charles

• Leo says:

Dear Sir,technically speaking,does it mean that,the predictors with p>0.05 are useless and we don have something to report,what if in cases when all predictors are insignificant?
Very kindly

• Charles says:

Leo,
Assuming that alpha = .05 is the correct significance level, then variables that have p > .05 are not making a significant contribution. But things are never this simple since for example when you have three such variables removing two of them may result in the third being significant.
Charles

17. Meryem says:

Hi,
Please, how can we determine the magnitude of variation from Wald test of G*E analysis using Genstat software.
Thank you

• Charles says:

Meryem,
Sorry, but I am not familiar with Genstat.
Charles

19. liana says:

Dear Sir,
I am still confused, if I use real statistics, it will instantly found the overall output value. then what if without using real stats but using manual steps. I am confused in finding the standard error. I follow the formula based on real statistics, but can not be used in manual steps. such as DIAG, DESIGN. the word is not found in the excel formula. thank you

20. Emily Stewart says:

Hi Charles,

Thanks so much for your work! It’s very helpful.

I was just wondering about the design function (DESIGN(E6:E15)) for the correlation matrix (which you addressed in a previous comment). I went to http://www.real-statistics.com/multiple-regression/least-squares-method-multiple-regression/, but was unable to find anything addressing it. The function doesn’t exist in Excel 2013. Is it an Add-In, because I couldn’t find it there either.

Is there a work around?

Thanks!

Emily

• Charles says:

Emily,
DESIGN is not a standard Excel function, but it is a function supplied in the Real Statistics addin. If you have installed the Real Statistics addin, you should be able to use the function just like any standard Excel function.
If X is a numeric array, then DESIGN(X) is simply the array X with a column of ones prepended to it as the first column.
Charles