In this section, we extend the concepts from Logistic Regression, where we describe how to build and use **binary logistic regression** models, to cases where the dependent variable can have more than two outcomes. Such models predict the value of the categorical dependent variable from the values of the independent variables.

We first address the categorical case where there is no order to these outcomes (**multinomial logistic regression**). We then turn our attention to the situation where the outcomes are ordered (**ordinal logistic regression**).

Topics:

- Basic concepts of multinomial logistic regression
- Finding multinomial logistic regression coefficients
- Real Statistics capabilities
- Ordinal logistic regression

Dear sir,

Can you tell me how to do the regression analysis when both the dependent and independent variables are categorical?

Iresha,

When all the variables are categorical you can use log-linear regression. See the webpage http://www.real-statistics.com/log-linear-regression/ for more details.

Charles

Hi Prof. Zaiontz

I would appreciate it if you could help me with multinomial logistic regression between my categorical phenotypic data (as dependent variables) and genotypic data (both binary and allelic states as independent variables).

FYI, I am analysing my data in a panel of 143 barley genotypes for association mapping in barley. I have used GLM and MLM models for my quantitative and ordinal phenotypic data in TASSEL software (http://www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119).

regards,

Hossein

Hossein,

What sort of help are you looking for?

Charles

Hi

Please let me know how I can send you a sample of my data. Once you have looked at it, I can give more details on what I mean and what I am looking for.

Hossein

Hi Hossein,

See Contact Us.

Charles

Hi Charles

I sent an example to you. Have you had time to look at it?

Hossein

Hossein,

I have received your email, but have not had time to look at it yet.

Charles

Dear Dr. Zaiontz,

I am planning on using Conjoint Analysis to measure preference for new products. As you know, it uses a multinomial logit model. I have found specialized software for conducting such an analysis, but it is very expensive. Do you know if Conjoint Analysis could be performed using Excel, or are there other ways of doing it? (I have been told that I could find free code for it in R, but I got lost when I saw it.) Any help is greatly appreciated.

Sincerely,

Hamad

Dear Hamad,

The Real Statistics Resource Pack doesn’t yet support conjoint analysis, but I have found the following website which seems to have an example Excel worksheet which may be useful to you.

http://www.dobney.com/Conjoint/conjoint_simple.htm

Charles

My independent variables are gender and academic achievement in terms of CGPA, while my DV is emotional intelligence (EI). What type of test should I use to show that gender is related to EI and that academic achievement predicts EI?

One option is to use multiple linear regression.

Charles

Hi Dear Dr. Zaiontz,

I am completing a research study to see if there is an association between rates of hypotension (yes/no) during surgery (primary outcome) and use of a certain blood pressure medication (given/held prior to surgery). I have multiple regressors/confounding variables that I am trying to account for. Some are binary in nature (0,1) and some are continuous (e.g. blood pressure readings). Someone suggested that I split my regression analysis: 1) do a multinomial analysis for comparing my independent variable and the nominal data, 2) do a multivariate linear regression for comparing the independent variable with the continuous regressors. What is your opinion on the above advice? What type of test do you feel would be most appropriate?

Thanks,

Sarah

Sarah,

These approaches could be useful, but I would need to have a more complete picture of the situation before I could definitively answer your question.

Charles

Sir

Please help me with this notification. I am very new to the Real Statistics package. While I am trying to perform multinomial logistic regression, it says “last column of input range must contain all the values 0,1,2,…,r and only these values where r=max value in the last column of input range (r must be <25)”. How can I solve this problem?

Ashik,

If you send me an Excel file with your data, I will explain what you need to do. You can find my email address at Contact Us.
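For readers who hit the same message: as quoted above, it says that the outcome column must be the last column and must be coded as the integers 0, 1, ..., r with no gaps and r < 25. The following is only a minimal Python sketch of that check and of one possible recoding. The function names `is_valid_outcome_column` and `recode_outcomes` are hypothetical and are not part of Real Statistics.

```python
def is_valid_outcome_column(values, max_r=24):
    """Return True if the values are exactly the integers 0..r, each present,
    with r <= max_r (the quoted message requires r < 25)."""
    distinct = set(values)
    if not distinct:
        return False
    # Reject non-integers (and booleans, which Python treats as ints).
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in distinct):
        return False
    r = max(distinct)
    return r <= max_r and distinct == set(range(r + 1))


def recode_outcomes(values):
    """Map arbitrary category labels to codes 0..r (labels taken in sorted order).
    Returns the recoded list and the label-to-code mapping."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Illustrative (made-up) outcome columns
print(is_valid_outcome_column([0, 1, 2, 1]))  # every value 0..2 is present
print(is_valid_outcome_column([1, 2, 3]))     # does not start at 0
print(recode_outcomes(["low", "high", "mid", "low"]))
```

The recoding assigns codes alphabetically, which is fine for unordered outcomes; for ordinal outcomes the mapping would need to follow the intended order instead.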

Charles

Hi Charles,

I am facing a similar problem. I am trying to fit a logistic regression model to predict the attrition probability of an employee. I have other independent variables like tenure, performance, etc.

I am a bit confused on how to use the tool.

First, I was facing the same problem as Ashik. However, I moved the attrition column (0 – not attrited, 1 – attrited) to the end which removed the error.

Now the output is not making sense to me. I think it would be helpful if you could include some steps or instructions on how to use the tool within the workbook itself.

Thanks.

BTW your website is a great resource.

Amar,

I’ll look into adding some additional information. In the meantime, if you send me an Excel file with your data, I will explain what you need to do. You can find my email address at Contact Us.

Charles

Is it possible to use your resource pack for conditional logistic regression? Think of analyzing which horse will win a given horse race relative to the other horses. Thanks!

Dennis,

I’ll need to look into this and possibly add it to a future release.

Charles

I would love that feature too!

Dear Sir,

Please help me, I’m a newbie about this problem.

I’m now completing a research study about the relationship between narcissism (IV) and cyberbullying (DV) among Instagram users. My independent variable has the levels low-mid-high (interval data) and my dependent variable is categorical, consisting of cyberbullying perpetrator, cyberbullying victim, and unidentified.

Yesterday, I tried a multinomial logistic regression analysis in SPSS, and it gave me a warning:

“There are 1 (11,1%) cells (i.e., dependent variable levels by subpopulations) with zero frequencies.

Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged.

The NOMREG procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.”

What does the warning mean? I don’t understand it.

And is multinomial logistic regression the right analysis to use in my research?

Thank you, Sir

Sam

Sam,

From your description, multinomial logistic regression analysis seems to be a good choice, except for the warning. You should pay attention to the warning “There are 1 (11,1%) cells (i.e., dependent variable levels by subpopulations) with zero frequencies.” You can ignore the rest of the warning.

I don’t use SPSS and so I can’t comment further about the warning message, but I suspect that your sample is very small with not enough data to find a fit for the logistic regression model.
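One way to see which cell is empty is to cross-tabulate the predictor levels against the outcome categories. The Python sketch below is only an illustration: the data are made up to mimic a 3 × 3 layout with a single empty cell (1 of 9 cells, about 11.1%), and `empty_cells` is a hypothetical helper, not an SPSS or Real Statistics function.

```python
from collections import Counter
from itertools import product


def empty_cells(predictor, outcome):
    """Return the (predictor level, outcome category) combinations
    that never occur together in the data."""
    counts = Counter(zip(predictor, outcome))
    all_cells = product(sorted(set(predictor)), sorted(set(outcome)))
    return [cell for cell in all_cells if counts[cell] == 0]


# Made-up data: narcissism level and cyberbullying role for 8 respondents
narcissism = ["low", "low", "low", "mid", "mid", "mid", "high", "high"]
role = ["victim", "unidentified", "perpetrator", "perpetrator",
        "victim", "unidentified", "perpetrator", "victim"]

print(empty_cells(narcissism, role))  # lists the combinations with zero frequency
```

If a combination is empty, collecting more data or merging sparse categories are the usual options, in line with the suggestion in the warning itself.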

Charles

Dear Charles,

If the model fitting is not significant, should I proceed?

If yes, what does it mean for the model fitting to be not significant while the parameter estimates are significant?

Model Fitting Information

| Model | -2 Log Likelihood | Chi-Square | df | Sig. |
|---|---|---|---|---|
| Intercept Only | 95.673 | | | |
| Final | 90.756 | 4.917 | 2 | .086 |

If the model is not significant, then there is no point in proceeding. You might as well use the L0 baseline model.

Charles

What is the L0 baseline model?

Fatimah,

It is the model without any independent variables.
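As a rough illustration of how the final model is compared against this baseline, the sketch below reproduces the likelihood ratio test from the -2 log-likelihood values quoted earlier in this thread (95.673 for the intercept-only model and 90.756 for the final model, with df = 2). The function name is hypothetical, and the closed-form p-value used here holds only for a chi-square distribution with 2 degrees of freedom; other df values need a general chi-square routine.

```python
import math


def likelihood_ratio_test_df2(neg2ll_baseline, neg2ll_final):
    """Likelihood ratio test of the final model against the intercept-only model.
    Assumes exactly 2 degrees of freedom, where the chi-square upper-tail
    probability has the closed form exp(-x / 2)."""
    chi_square = neg2ll_baseline - neg2ll_final
    p_value = math.exp(-chi_square / 2)
    return chi_square, p_value


chi_square, p_value = likelihood_ratio_test_df2(95.673, 90.756)
print(round(chi_square, 3), round(p_value, 3))  # 4.917 0.086
```

Since the p-value exceeds .05, the final model does not fit significantly better than the model without any independent variables.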

Charles

I am trying to use the binary logistic regression function. I added Solver and the Real Statistics addin. When I select the logistic regression function, I get a runtime error 424. I repaired my Microsoft Office 2010 software and rebooted, but I get the same error.

Every once in a while I get an error with solver. In any case I’m stuck.

Any ideas?

Kathleen,

When you press Alt-TI do you see both RealStats and Solver on the list of addins with check marks next to them? If not you need to either add these addins or make sure that there are check marks next to them.

When you first use Real Statistics, what do you see when you enter the formula =VER() in a cell?

Charles

Charles: yes both addins are there and checked. The first time I placed the =ver() in the cell, 2007 showed up. Interesting because I am using 2010 Office. This time #NAME shows up.

Kathleen

Kathleen,

The fact that 2007 was displayed might mean that you have installed the wrong version of the software. I suggest that you reinstall the Real Statistics addin. I plan to issue a new release in a couple of days.

Charles

Dear Professor Zaiontz,

I encountered a major problem with the Binary Logistic Regression. Context: I have a file with 312 rows and 35 columns (all binary data) that represent certain business conditions and an outcome – employee engagement (engaged vs. not engaged). BUT it seems that I can’t run a proper BLR on that data now. (I was able to do so 2 weeks ago, using Newton’s method.)

There are 2 specific issues: while using Newton’s method I get a diagonal line for the ROC curve and p-Pred at 0.5 for all observations, also Coeff = 0 for all observations. Second issue: after switching to Solver I get various p-Pred and Coeff’s but the Covariance Matrix returns a “#NUM!” error for which there seems to be no explanation. As you can imagine this stops the whole analysis half-way through. I checked the data formats and tried numbers, general, others – no change. I also changed the representation for decimal places – both commas and dots yield no improvement.

I am writing this inquiry since I think the problem could be an issue with Excel / Real Stats versions. I am using Excel at Version 1702 (Build 7870.2020) and Real Stats at 4.13 Excel 2010/2013/2016.

Also, a short additional question: what is your opinion on interpreting BLR ratios in such a case? Two weeks ago I re-ran the analysis with 12 and then 8 variables to get a significant model for estimating engagement from a limited number of variables, which would let me derive a company-wide improvement strategy in 8 key areas instead of 35. I am focusing only on statistically significant ratios, but I wonder what impact having so many variables has on the validity of the results. On the other hand, all those variables are there (along with dozens of others); the only thing that I can change is the number of variables I collect and do the math with.

Thanks for all the great knowledge here and have a nice day,

M.

Mike,

I have used binary logistic regression in the past few days on Excel 2013 and had no problems.

If you send me an Excel file with your data and analysis I can check to see whether something I changed in the latest logistic regression release is causing the problem that you are seeing. You can find my email address at Contact us.

Re BLR ratios, which ratios are you referring to?

Charles

Re BLR ratios, I was referring to the odds ratios (exp(b)). I would like to offer a deeper understanding to the audience of my presentation.

(But to give one you got to have one. 🙂 )

So, I am wondering how I can express this in more understandable terms. One way to go is to “translate” odds ratios to probability. BUT this helps only slightly. What I am actually after is a way to show the cumulative impact of manipulating several variables as a sum. I am operating with binary variables all the way, so something is either done or not. How can I show what the outcome will be if we change some specific 8 variables? Is showing the difference (increase) in the p-Pred a way?

My earlier question still stands – with so many variables, and only 312 observations – how seriously should I take the odds ratios? Is p-value enough to actually infer a relationship?

Best,

M.

Mike,

In looking at your data, I see that var20 and var26 have identical values, and so the algorithm won’t converge due to collinearity. If you remove var26, everything works fine.
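A quick check for this situation can be scripted before fitting the model. The Python sketch below only finds columns with exactly identical values; it does not detect more general linear dependence among columns. The helper name `find_identical_columns` and the sample data are made up for illustration; only the variable names var20 and var26 come from the case above.

```python
def find_identical_columns(columns):
    """Given a dict mapping column name -> list of values, return a list of
    (first_name, duplicate_name) pairs whose values are identical."""
    first_seen = {}
    duplicates = []
    for name, values in columns.items():
        key = tuple(values)
        if key in first_seen:
            duplicates.append((first_seen[key], name))
        else:
            first_seen[key] = name
    return duplicates


# Made-up binary data in which var26 repeats var20
data = {
    "var19": [0, 1, 1, 0, 1],
    "var20": [1, 0, 1, 1, 0],
    "var26": [1, 0, 1, 1, 0],
}
print(find_identical_columns(data))  # [('var20', 'var26')]
```

Dropping one column from each reported pair removes this particular source of collinearity.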

Charles