**Definition 1**: For any coefficient *b* the **Wald** statistic is given by the formula

*Wald* = *b*/*s.e.*(*b*)

where *s.e.*(*b*) is the standard error of *b*.

**Observation**: For ordinary regression we can calculate a statistic *t* ~ *T*(*df _{Res}*) which can be used to test the hypothesis that a coefficient *b* = 0. The Wald statistic is approximately normal and so it can be used to test whether the coefficient *b* = 0 in logistic regression.

Since the Wald statistic is approximately normal, by Theorem 1 of Chi-Square Distribution, *Wald*^{2} is approximately chi-square, and, in fact, *Wald*^{2} ~ χ^{2}(*df*) where *df = k – k*_{0}, *k* = the number of parameters (i.e. the number of coefficients) in the full model, and *k*_{0} = the number of parameters in a reduced model (especially the baseline model, which doesn’t use any of the variables, only the intercept).
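The equivalence between the normal form of the test (on *Wald*) and the chi-square form (on *Wald*^{2}) can be checked with a short sketch. This is a minimal Python illustration with a hypothetical coefficient and standard error, not values from the example; it relies on the identity that the chi-square(1) survival function at *x* is erfc(√(*x*/2)).

```python
import math

# Hypothetical coefficient and standard error (not from the example data)
b, se = 0.35, 0.15

wald = b / se  # Wald statistic, approximately standard normal

# Two-sided p-value from the standard normal: P(|Z| > |wald|)
p_normal = math.erfc(abs(wald) / math.sqrt(2))

# Same p-value via chi-square with df = 1 applied to wald^2,
# using the chi-square(1) survival function sf(x) = erfc(sqrt(x/2))
p_chisq = math.erfc(math.sqrt(wald**2 / 2))

print(round(wald, 4), round(p_normal, 6), round(p_chisq, 6))
```

The two p-values agree exactly, which is why testing *Wald* against the normal distribution and *Wald*^{2} against χ^{2}(1) are interchangeable.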

**Property 1**: The covariance matrix *S* for the coefficient matrix *B* is given by the matrix formula

*S* = (*X*^{T}*VX*)^{-1}

where *X* is the *r* × (*k*+1) design matrix (as described in Definition 3 of Least Squares Method for Multiple Regression) and *V* = [*v _{ij}*] is the *r* × *r* diagonal matrix whose diagonal elements are *v _{ii}* = *n _{i}p _{i}*(1 – *p _{i}*), where *n _{i}* = the number of observations in group *i* and *p _{i}* = the probability of success predicted by the model for elements in group *i*. Groups correspond to the rows of matrix *X* and consist of the various combinations of values of the independent variables.

Note that *S* = (*X*^{T}*W*)^{-1} where *W* is *X* with each element in the *i*th row of *X* multiplied by *v _{ii}*.

**Observation**: The standard errors of the logistic regression coefficients consist of the square root of the entries on the diagonal of the covariance matrix in Property 1.
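Property 1 and this observation can be sketched together as follows. This is a minimal numpy illustration using made-up summary data (one independent variable, four groups), not the values from the examples.

```python
import numpy as np

# Hypothetical summary data (not the figures from the example): one row
# per group, i.e. per combination of independent-variable values
x = np.array([1.0, 2.0, 3.0, 4.0])   # single independent variable
n = np.array([10, 12, 9, 11])        # n_i: number of observations in group i
p = np.array([0.2, 0.4, 0.6, 0.8])   # p_i: predicted probability of success

X = np.column_stack([np.ones_like(x), x])  # r x (k+1) design matrix (1's first)
v = n * p * (1 - p)                        # diagonal entries of V

# Property 1: S = (X^T V X)^{-1}
S = np.linalg.inv(X.T @ np.diag(v) @ X)

# Standard errors of the coefficients = sqrt of the diagonal of S
se = np.sqrt(np.diag(S))
print(se)
```

This mirrors the Excel calculation later on the page, with `np.diag(v)` playing the role of the DIAGONAL array function and `np.linalg.inv` playing the role of MINVERSE.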

**Example 1** (Coefficients): We now turn our attention to the coefficient table given in range E18:L20 of Figure 6 of Finding Logistic Regression Coefficients using Solver (repeated in Figure 1 below).

**Figure 1 – Output from Logistic Regression tool**

Using Property 1 we calculate the covariance matrix *S* (range V6:W7) for the coefficient matrix *B* via the formula

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)), MMULT(DIAGONAL(J6:J15*(1-J6:J15)*H6:H15),DESIGN(E6:E15))))

Actually, for computational reasons it is better to use the following equivalent array formula:

=MINVERSE(MMULT(TRANSPOSE(DESIGN(E6:E15)),J6:J15*(1-J6:J15)*H6:H15*DESIGN(E6:E15)))

The formulas used to calculate the values for the Rem coefficient (row 20) are given in Figure 2.

**Figure 2 – Formulas for Logistic Regression coefficients**

Note that Wald represents the *Wald*^{2} statistic and that lower and upper represent the 100(1 – *α*)% confidence interval of exp(*b*). Since 1 = exp(0) is not in the confidence interval (.991743, .993871), the Rem coefficient *b* is significantly different from 0 and should therefore be retained in the model.
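The lower and upper limits are typically computed as exp(*b* ± *z*_crit · *s.e.*), where *z*_crit is the two-tailed standard normal critical value. A minimal sketch, using hypothetical values of *b* and its standard error (not the actual Rem figures from Figure 2):

```python
import math

# Hypothetical coefficient and standard error (not the actual Rem values)
b, se = -0.00723, 0.00055
alpha = 0.05
z_crit = 1.959964  # standard normal two-tailed critical value for alpha = .05

lower = math.exp(b - z_crit * se)  # lower limit of the CI for exp(b)
upper = math.exp(b + z_crit * se)  # upper limit of the CI for exp(b)

# b differs significantly from 0 iff 1 = exp(0) lies outside (lower, upper)
significant = not (lower < 1.0 < upper)
print(round(lower, 6), round(upper, 6), significant)
```

With these made-up inputs the whole interval lies below 1, so the coefficient would be judged significantly different from 0, as in the example.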

**Observation**: The **% Correct** statistic (cell N16 of Figure 1) is another way to gauge the fit of the model to the observed data. The statistic says that 76.8% of the observed cases are predicted accurately by the model. This statistic is calculated as follows:

For any observed values of the independent variables, when the predicted value of *p* is greater than or equal to .5 (viewed as predicting success), the % correct is equal to the observed number of successes divided by the total number of observations (for those values of the independent variables). When *p* < .5 (viewed as predicting failure), the % correct is equal to the observed number of failures divided by the total number of observations. These values are weighted by the number of observations of that type and then summed to produce the % correct statistic for all the data.

For example, for the case where Rem = 450, p-Pred = .774 (cell J10), which predicts success (i.e. survived). Thus the % Correct for Rem = 450 is 85/108 = 78.7% (cell N10). The weighted sum (found in cell N16) of all these cells is then calculated by the formula =SUMPRODUCT(N6:N15,H6:H15)/H16.
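The weighted-sum calculation just described can be sketched as follows. This is a minimal Python illustration with made-up summary rows (not the data in Figure 1); only the first row loosely echoes the Rem = 450 case.

```python
import numpy as np

# Hypothetical summary rows (not the data in Figure 1): for each group,
# n = total observations, s = observed successes, p = predicted probability
n = np.array([108, 50, 40, 30])
s = np.array([85, 10, 30, 5])
p = np.array([0.774, 0.30, 0.60, 0.20])

# p >= .5 predicts success, so % correct = successes / total;
# p < .5 predicts failure, so % correct = failures / total
pct = np.where(p >= 0.5, s / n, (n - s) / n)

# Weight each group's % correct by its number of observations, as in
# the SUMPRODUCT formula, and divide by the grand total
pct_correct = np.sum(pct * n) / np.sum(n)
print(round(float(pct_correct), 4))
```

Note that `pct * n` simply recovers the number of correctly predicted cases per group, so the statistic is just (total correct)/(total observations).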

Hi Charles,

I have used the Multinomial Logistic Regression data analysis tool. The exp(b) of the coefficient in the report is > lower and < upper, but the p-value of the coefficient is 0.8455, so should this coefficient be retained in the model or not?

Thanks

Anson

If the p-value is .8455, then the coeff is not significantly different from zero.

Charles

Charles,

Thanks for your quick reply.

I have 50 columns of independent variables and each column has about 30000 data points; there is only one column for the dependent variable.

So how do I determine whether each coefficient is significantly different from zero and should therefore be retained in the model?

Anson

Anson,

If p-value < alpha then the coefficient is significantly different from zero. Equivalently, if 1 is not in the confidence interval then the coefficient is significantly different from zero. Charles

Charles,

Thanks for reply.

I have a question for using

MLogitPred(R0, R1, r, iter) function.

I have 20 col x 1001 row raw data with heading. Column A: ref no.

Columns B to E: independent variables

Columns F to T : dependent variables

I have changed the value of “r” to predict and the results are as below:

E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,13), Result=0.04

E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,0), Result=#VALUE

E2:S2=MLogitPred(A2:D2,ChangeData!$B$2:$T$1001,4), Result=#VALUE

So does that mean that using 13 for r is right?

Anson

Anson,

You say your data is in raw format, but you also say that columns F to T contain the dependent variables. Both of these can’t be true. If the data is in raw format then there would only be one column for the dependent variables. I will therefore assume that your data is in summary format. Since columns F to T contain the dependent variables and there are 15 columns from F to T, this would mean that your dependent variables are numbered 0, 1, 2, …, 14. Thus r should be 14.

Charles

Hi,

Thanks for creating this great tool and website.

I ran a logistic regression using the tool on a Mac using the data set as described in the video (https://www.youtube.com/watch?v=EKRjDurXau0). But I am not getting the p-value table (as can be seen in the screenshot in the webpage above) for all the coefficients to determine the significance of each independent variable. What can be done?

Shashank,

If you send me an Excel file with your data and results, I will try to figure out what is going on. You can find my email address at Contact Us.

Charles

Hi Charles,

Appreciate the quick response. My problem got solved.

I had large no. of rows so I was not able to locate the table.

thanks

Regards

Shashank

Hi Charles,

I’m still having trouble understanding the meaning of the p value and statistical significance in logistic regression. I want to know how significant the coefficients are. What exactly does it mean that a coefficient is statistically significant? From my basic understanding, if the p value is below the cutoff point, i.e. < 0.05, then the variable is statistically significant, right? Also, since there isn’t a normal distribution in logistic regression, how reliable is the p value?

cheers,

Matt

Matt,

Yes, if the p value is below the cutoff point alpha (e.g. alpha = 0.05) then the variable is statistically significant. This means the (population) coefficient for that variable can be considered to be non-zero (i.e. that variable has a significant impact on the model). Although “there isn’t a normal distribution in logistic regression”, the distribution of the coefficient estimates is approximately normal.

Charles

Hello

A question about the Wald test. Can the Wald test be used for linear regression? The information I find applies to logistic regression. I want to rule out whether I can use it for a linear regression. Gracias

Is there any particular reason why you want to use the Wald test for linear regression?

Charles

How can I reduce the p value of my intercept coefficient if the p values of all my other variables are satisfactorily low in logistic regression?

Sankit,

Besides changing your data (e.g. via a transformation), I don’t know how to do this. I also don’t know why you would want to do this.

Charles

Dear Charles,

Alas I am a babe in the logit world and I hope you will be kind enough to point me in the right direction. I have access to a large dataset on student scores that have been previously standardised along the lines of mean 25, s.d. 5. I wish to perform some significance testing between certain groups of students and have struck on the idea that I could/should convert these scores to logits using the probability of achieving each ‘raw’ score – essentially treating them as z-scores – and then use these logits in place of the raw standardised scores and conduct z-tests. I am (if it isn’t already painfully obvious) too statistically underskilled to know whether I am committing an egregious blunder with such a plan, but the reference to Wald in your article makes me think that I probably am… Any suggestions on how I might better understand the issue would be very welcome.

Ead,

It is not clear to me what advantage (if any) you get by converting the scores to logits. Why can’t you simply use the raw scores?

Charles

Dear Charles

Thank you for your help. I have to run temperature treatments on three groups of 10 plants. I would like to use one-way ANOVA for the variance analysis.

I would like to know if it is still the right analysis when I use repeated measures ANOVA.

Sorry, but you need to provide more information before I am able to answer your question.

Charles

Hi Charles,

Fantastic website. Thank you, it’s been very helpful.

I have a binomial logistic regression with 10 independent variables. The HL test and R2 indicate the model is a poor fit. I’d like to try to improve the fit by removing variables that have low Wald scores and adding in variable interactions. I saw the commentary on creating interactions for multivariable linear regression, but I am not sure if I can copy it exactly, or if I need to make further adjustments due to the logistic function.

1.) Is there an easy way to add in interactions?

2.) Is there a single best test to use to decide if the model is a good fit? — high AUC, use one of the R2’s, or the HL test?

Thanks again!

Marty

Marty,

1. You can add in interaction of independent variables in exactly the same way as you do for multiple linear regression. I have simply implemented this via x1 * x2, which is easy to do in Excel.

2. There isn’t a simple answer to this question, although I wouldn’t rely too heavily on the HL value. High AUC and R2 are likely to be better indicators.

Charles

Thanks Charles!

One more question, is the de facto R2 “floor” of a binomial logistic regression .50?

If I didn’t use a model and just “guessed”, it seems like I’d have a 50/50 chance of predicting the actual outcome.

Therefore, if my model yields an R2 of .56, does that mean that the model only offers an .06 improvement of what I would have been able to achieve using guesswork alone?

Thanks again,

Marty

Marty,

Your remarks are true for the % Correct statistic, but not for the R2 statistic, which is calculated in a completely different way.

Charles

Hello Charles

Thank you very much for the answer.

I think I can understand a bit better how you did the covariance matrix.

However, all my independent variables are continuous (no repetitions), so would it be possible to form the groups as a function of the probability (as for the Hosmer-Lemeshow statistic)?

That means grouping the cases that are predicted with probability <0.1, <0.2, …, <1.

Thanks in advance

No problem. Each summary data row will be equivalent to one raw data row. The reported Hosmer-Lemeshow value won’t be quite the same since it is typically based on 10 summarized observations, but that is probably not so important.

Charles

Hello Charles

Could you define what group i is in Property 1?

I would love to know which parameters did you choose to build the covariance matrix.

In my logistic regression model I only have 2 variables so I will do the covariance matrix by using covar functions.

Thanks

In this context each group consists of the cases with one particular combination of values of the independent variables. So if you have independent variables Gender and Age and the raw data is

M 30 1

F 31 0

M 30 0

M 32 1

F 32 0

F 31 1

F 30 1

M 32 1

There are 8 sample elements (rows), but some of them can be grouped together, namely the ones where the gender and age are the same. This yields the following summary data (a sort of frequency table) containing 5 groups. The 3rd column is a count of all the cases that have a 0 as the dependent variable and the 4th column is a count of all the cases that have a 1 as the dependent variable.

M 30 1 1

F 31 1 1

M 32 0 2

F 32 1 0

F 30 0 1

This is a silly example, but I hope it helps answer your question.
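The grouping just described can be sketched in Python, using the raw data from the example above:

```python
# Raw data from the example above: (gender, age, outcome)
raw = [("M", 30, 1), ("F", 31, 0), ("M", 30, 0), ("M", 32, 1),
       ("F", 32, 0), ("F", 31, 1), ("F", 30, 1), ("M", 32, 1)]

# For each combination of independent variables, count the cases with
# outcome 0 (3rd summary column) and outcome 1 (4th summary column)
groups = {}
for gender, age, y in raw:
    counts = groups.setdefault((gender, age), [0, 0])
    counts[y] += 1

for (gender, age), (zeros, ones) in sorted(groups.items()):
    print(gender, age, zeros, ones)
```

This reproduces the 5 summary rows shown above (in a different order), e.g. the two M/30 rows collapse into one group with one 0 and one 1.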

Charles

Dear sir,

I have done logistic regression with 20 independent variables, all of which are categorical (0 and 1), and 1 binomial response variable. However, the significance test using the p-value does not seem right for the variables.

Does using all categorical variables as independent variables affect the result?

Dear Pradash,

The usual logistic regression model doesn’t seem like the correct approach. From what I can tell, you need something like a 20-dimensional contingency table using a log-linear model (see http://www.real-statistics.com/log-linear-regression/). I am not sure how to handle such a problem. Perhaps someone else can make a suggestion.

Charles

Dear sir,

What is the significance of using value 1 at the 1st column of matrix X?

The 1’s in the first column of the design matrix X are the way of handling the constant (intercept) term.

Charles

If I want to use it for any data, I must put in the value 1, is that right?

Yes you need to include the 1’s.

Charles

Charles,

Thanks so much for your website. It is the only place that remotely comes close to explaining how exactly to calculate the standard error of regression coefficients. I do have a couple of simple questions:

“…and V = [vi] is the r × r matrix where vi = ni pi (1-pi).”

Should this read “is the r x 1 matrix”? If not, how is each i,j computed?

Also, could you explain what the ni term is? Thanks in advance.

Best Regards,

Kris Pickrell

Hi Kris,

Thanks for catching some sloppy notation on my part. The correct expression is that “V = [vij] is the r × r diagonal matrix whose diagonal elements are vii = ni pi (1–pi).” I have updated the webpage to reflect this.

What I wrote would be correct with V = [vi] as an r × 1 matrix with vi = ni pi (1–pi), but with VX in the expression S = (X^T V X)^-1 being scalar multiplication rather than matrix multiplication.
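The equivalence between multiplying by the diagonal matrix V and multiplying each row of X by the corresponding scalar can be checked with a short sketch (arbitrary made-up values):

```python
import numpy as np

# Multiplying X by the diagonal matrix diag(v) gives the same result as
# multiplying each row i of X by the scalar v_i (broadcasting)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))     # arbitrary 4 x 3 design-like matrix
v = np.array([0.5, 1.2, 0.8, 2.0])  # arbitrary diagonal entries of V

VX_matrix = np.diag(v) @ X          # matrix product V X
VX_rowwise = v[:, None] * X         # row-wise scalar multiplication

print(np.allclose(VX_matrix, VX_rowwise))
```

Either form can therefore be used when building S = (X^T V X)^-1; the row-wise form avoids constructing the r × r diagonal matrix.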

ni = the number of observations for group i, where group i corresponds to the ith row of matrix X and consists of one of the various combinations of values of the independent variables (actually the ith such combination).

Charles

Thanks!

Hi Charles,

Thanks for the info. I was able to work it out (I haven’t messed around with matrices since I was an undergrad engineering major in the 80’s). I had another quick question regarding the creation of the covariance matrix:

The Design matrix (X) and the Diagonal Variance matrix (V) are created in your example with all of the data records sorted in descending order according to P(X) (p-pred).

It appears to be necessary to sort the data records according to P(X) (p – pred) in descending order before creating the X and V matrices.

If data records are not sorted according to P(X) in descending order at the beginning of the calculations, the resulting X and V matrices will produce a very different (and apparently incorrect) covariance matrix.

I was just wondering if you would agree that data records must be sorted according to P(X) in descending order at the start of these calculations in order to obtain the correct covariance matrix?

Hi Mark,

I don’t believe that the order matters. I scrambled the order of the summary table values for both Example 1 and 2 on webpage http://www.real-statistics.com/logistic-regression/finding-logistic-regression-coefficients-using-newtons-method/ so that p-pred was not in sorted order and got the same coefficients, max LL and covariance matrix values as those displayed on the webpage.

Charles

Hi Charles,

Thanks for the quick reply and I really enjoy your fantastic site. I did have a question regarding your answer to my original question. Your answer above mentioned the following function DESIGN(E6:E15). I wasn’t able to find any documentation about that function. I was wondering if you might have any more information on it or a workaround (it doesn’t appear to be a function in Excel 2010)? Thanks in advance.

Mark,

The design matrix is a standard statistical concept and is defined on the webpage http://www.real-statistics.com/multiple-regression/least-squares-method-multiple-regression/. The supplemental DESIGN function is also described on that page.

Charles

Hi Charles,

Excellent work! I was wondering if you wouldn’t mind providing a bit more clarity on the calculation of the standard errors of the logistic regression coefficients. Any chance you could show the actual matrix work that had to be done? You’ve listed the basic formulas but it’s not clear (to me anyway). I’ve looked everywhere on the Internet and there is no specific documentation on how to construct the covariance matrix of the logistic regression coefficients. You did it with a supplemental function you created. I am hoping to get the s.e. of those coefficients so I can manually calculate the Wald statistic for each coefficient. Unbelievably, there is zero documentation on the Internet on how to do that. A huge thanks for any help in advance.

Mark,

Thanks for your comment. The standard errors are the square roots of the values on the main diagonal of the covariance matrix. In the next day or two I will update the website with a better description of how to calculate the covariance matrix.

Charles

Update 20 Aug 2013: The site has now been updated with the Excel formula I used to calculate the covariance matrix of B. Charles