Multinomial Logistic Regression

We now extend the concepts from Logistic Regression, where we describe how to build and use binary logistic regression models, to cases where the dependent variable can have more than two outcomes. Using such models, the value of the categorical dependent variable can be predicted from the values of the independent variables.

In this part of the website, we focus on the categorical case where there is no order to these outcomes (multinomial logistic regression). E.g. the categories might be Christian, Muslim, and Jewish.

In Ordinal Regression, we turn our attention to the case where there is order (ordinal logistic regression). E.g. the categories might be Child, Young Adult, Middle Aged, and Elderly.
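As a rough illustration of what a multinomial logistic model computes (a minimal sketch, not the Real Statistics implementation): one outcome serves as the reference category, every other outcome gets its own linear predictor, and the predicted probabilities are obtained by normalizing the exponentials. The function name `multinomial_probs` and the coefficients below are invented for illustration only.

```python
import math

def multinomial_probs(x, coefs):
    """Predicted probability of each outcome for one observation.

    x     -- list of independent-variable values
    coefs -- one [intercept, b1, b2, ...] list per non-reference outcome
    The reference outcome has linear predictor 0 by convention.
    """
    scores = [0.0]  # reference category
    for b in coefs:
        scores.append(b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)))
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

# Hypothetical coefficients: 3 outcomes, 2 independent variables
coefs = [[0.5, 1.0, -0.3], [-1.0, 0.2, 0.8]]
probs = multinomial_probs([1.2, 0.7], coefs)
print(probs)  # three probabilities that sum to 1
```

The predicted category is then simply the outcome with the largest probability.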


59 thoughts on “Multinomial Logistic Regression”

  1. Good morning Dr Charles, I hope you and your family are all very well.
    Please, how can the database be organized or reformatted so that I can perform a multinomial regression analysis? Normally there is a spreadsheet with the individuals listed down the rows and columns for variables such as death (dies or not), drug doses, etc., but the data is organized differently here, and I don’t know how to put my data in that order.
    Thanks

    • Gerardo,
      I am sorry, but I don’t understand your comment.
      Are you trying to perform multinomial logistic regression? If so, what sort of problems are you having?
      If you are trying to perform ordered logistic regression (aka ordinal regression), this support will be available later this week.
      Charles

  2. Charles,

    I am attempting to model the probability of each of the 3 outcomes in a given soccer match (home win %, away win %, and draw %) by using an ensemble approach with multiple models. To be specific, I have 4 separate models, and each of those models estimates the home win %, away win %, and draw %.

    My question is: in this particular problem, do I have 4 independent variables (the 4 models), or do I actually have 12 (3 different percentages from each of the 4 models)? If I wanted to do a regression to calculate coefficients for each of the models, would I need to use a multinomial logistic regression, since there are 3 possible outcomes? Is there a way to do this using the Real Statistics multinomial logistic regression tool?

    Appreciate your help.

    • Mason,
      I presume that you want to use the regression model to predict something. What is it that you are trying to predict? Is it the % of home wins, % of away wins, and % of draws? If so, your dependent variable takes these sorts of values. Since these can take a wide range of values, multinomial logistic regression doesn’t seem like a good fit. I assume that you have 12 independent variables, although these can be viewed as four 3-tuples.
      Charles

  3. Sir Charles,

    I need help in figuring out the best-fitting statistical treatment for our research paper, which evaluates the family planning program in one of the cities in the Philippines. We have 1 dependent variable, which is the contraceptive prevalence rate, and 14 dependent variables that are divided into groups, as they are indicators of specific parts of the program (management, training, new acceptors, etc.), and that are ordinal (we used a Likert scale). We would like to test for the relationship of the independent variables to the dependent variable. Thank you!

    • I assume that you mean that there are 14 independent variables.
      Are you saying that these 14 variables use a Likert scale?
      What about the one dependent variable? Does it use a Likert scale too or does it take numeric values?
      I don’t understand the relevance of the groups of independent variables. Please explain.
      Charles

  4. Sorry, but I badly need statistical help. I have one independent variable (type: numeric) and 21 dependent variables, also of numeric type. I want to determine whether the independent variable has an effect on the dependent variables. How can I do that? What is the correct test for that?
    Thanks
    I would appreciate any help.

    • Adel,
      In regression of the form y = a + bx, y is a dependent variable and x is an independent variable. Are you sure that you have one independent variable and 21 dependent variables? Perhaps you have one dependent variable (like y) and 21 independent variables (like x). If this is the case, then you can use multiple linear regression.
      Charles

  5. Dr Charles, good morning, I hope you and your family are keeping well in these difficult times. I have a database with 33 independent variables and a multinomial dependent variable, and I need to run a multinomial logistic regression. My data is not in the format that you very kindly present on the page; I have tried to organize it that way but have not been able to. How can I apply multinomial logistic regression without the format that you present on the page, using the 33 independent variables and the multinomial dependent variable? Or how can I arrange my data so that it matches the format on the Real Statistics page? Thank you very much.
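For readers with the same formatting question, here is a minimal Python sketch of collapsing raw data (one row per individual) into summary counts. It assumes, as a guess about the layout rather than a description of the Real Statistics worksheet, that the summary form has one row per distinct combination of independent-variable values followed by one count column per outcome. The function `raw_to_summary` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def raw_to_summary(rows, n_outcomes):
    """Collapse raw rows (x1, ..., xk, outcome) into summary rows.

    Each summary row is [x1, ..., xk, count_0, ..., count_{n_outcomes-1}],
    where outcome codes are assumed to be the integers 0..n_outcomes-1.
    """
    counts = defaultdict(Counter)
    for *xs, outcome in rows:
        counts[tuple(xs)][outcome] += 1
    return [list(xs) + [c[j] for j in range(n_outcomes)]
            for xs, c in sorted(counts.items())]

# Hypothetical raw data: (age, treated, outcome)
raw = [(20, 1, 0), (20, 1, 2), (20, 1, 0),
       (35, 0, 1), (35, 0, 1), (35, 1, 2)]
summary = raw_to_summary(raw, 3)
for row in summary:
    print(row)
```

With many independent variables (33 in the question above), most combinations will occur only once, so the summary form offers little compression and working from the raw rows is usually simpler.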

  6. Thanks for the effort. A short demonstration video would solve most of our problems; if any exist, kindly point us to them.

  7. Dr., good afternoon. I do not know if there is an error in the program, but when I repeat the multinomial logistic regression example, I get the R-square and significance level results, while everything else comes out as #VALOR (the #VALUE! error in Spanish Excel).

        • Doc, sorry to bother you. What happens is that in the example on the website (sheet: Mlogit 4b), editing the formula shows:
          =MLogitParam(A4:E16;2;1;TRUE;TRUE;M3;20)

          But when I repeat the example, the same formula becomes:
          =MLogitParam(A4:E16;1;1;TRUE;‘True’;M19;20)

          (In my Spanish version of Excel, TRUE appears as VERDADERO and ‘True’ as ’Verdadero’.)

          I don’t know if it’s an error in the macros.

          Thank you

          • Hello Gerardo,
            I don’t know what would cause this to happen. I looked at the code and didn’t see any problem.
            Is it possible that you specified summary data rather than raw data (or vice versa)?
            Charles

          • Doc, good evening. I took exactly the example and applied the Real Statistics tools just as they are in the example, but it doesn’t generate the same results; it generates that error.

          • Hello Gerardo,
            I can understand that the second TRUE could be ‘TRUE’ (or the equivalent in Spanish) instead of TRUE. This sort of issue doesn’t seem to occur in my Italian version of Excel, but it seems to arise in the Spanish version.
            I have no idea why the cell address for the alpha value would change. I looked at the software and can’t see any reason for this occurring.
            Charles

  8. Dear sir,

    If I have data consisting of only brand (B1, B2), fuel type (F1, F2, F3), and performance, how should I run the interaction model in order to find the p-value?

  9. Dear Professor Zaiontz,
    I encountered a major problem with Binary Logistic Regression. Context: I have a file with 312 rows and 35 columns (all binary data) that represent certain business conditions and an outcome: employee engagement (engaged vs. not engaged). But it seems that I can’t run a proper BLR on that data now. (I was able to do so 2 weeks ago, using Newton’s method.)

    There are 2 specific issues: while using Newton’s method I get a diagonal line for the ROC curve and p-Pred at 0.5 for all observations, also Coeff = 0 for all observations. Second issue: after switching to Solver I get various p-Pred and Coeff’s but the Covariance Matrix returns a “#NUM!” error for which there seems to be no explanation. As you can imagine this stops the whole analysis half-way through. I checked the data formats and tried numbers, general, others – no change. I also changed the representation for decimal places – both commas and dots yield no improvement.

    I am writing this inquiry since I think the problem could be an issue with Excel / Real Stats versions. I am using Excel at Version 1702 (Build 7870.2020) and Real Stats at 4.13 Excel 2010/2013/2016.

    Also, a short additional question: what is your opinion on interpreting BLR ratios in such a case? Two weeks ago I re-ran the analysis with 12 and then 8 variables to get a significant model for estimating engagement based on a limited number of variables, which also makes it possible to derive a company-wide improvement strategy in 8 key areas instead of 35. I am focusing only on statistically significant ratios, but I wonder what impact having so many variables has on the validity of the results. On the other hand, all those variables are there (with dozens of others); the only thing that I can change is the number of variables I collect and do the math with.

    Thanks for all the great knowledge here and have a nice day,
    M.

    • Mike,
      I have used binary logistic regression in the past few days on Excel 2013 and had no problems.
      If you send me an Excel file with your data and analysis I can check to see whether something I changed in the latest logistic regression release is causing the problem that you are seeing. You can find my email address at Contact us.
      Re BLR ratios, which ratios are you referring to?
      Charles

      • Re BLR ratios, I was referring to the odds ratios (exp(b)). I would like to offer my presentation’s audience some deeper understanding.
        (But to give one you’ve got to have one. 🙂 )
        So, I am wondering how I can express this in more understandable terms. One way to go is to “translate” odds ratios into probabilities, but this helps only slightly. What I am actually after is a way to show the cumulative impact of changing several variables at once. All my variables are binary, so something is either done or not. How can I show what the outcome will be if we change some specific 8 variables? Is showing the difference (increase) in p-Pred a way?
        My earlier question still stands: with so many variables and only 312 observations, how seriously should I take the odds ratios? Is a p-value enough to actually infer a relationship?

        Best,
        M.
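One way to picture the cumulative effect asked about here is a small Python sketch. In a binary logistic model, switching several binary variables from 0 to 1 multiplies the odds by the product of their odds ratios, and the change in predicted probability follows from that. The intercept and coefficients below are hypothetical, not taken from Mike's data, and `predicted_prob` is an illustrative helper rather than a Real Statistics function.

```python
import math

def predicted_prob(intercept, coefs, x):
    """Predicted probability from a binary logistic model."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted model with three binary variables
intercept = -1.0
coefs = [0.4, 0.7, 0.3]

before = predicted_prob(intercept, coefs, [0, 0, 0])  # nothing done
after = predicted_prob(intercept, coefs, [1, 1, 1])   # all three done

odds_before = before / (1 - before)
odds_after = after / (1 - after)
combined_or = math.exp(sum(coefs))  # product of the individual odds ratios

print(f"p-Pred before: {before:.3f}, after: {after:.3f}")
print(f"odds multiplied by {odds_after / odds_before:.3f} "
      f"(combined odds ratio {combined_or:.3f})")
```

Note that the increase in probability depends on the starting point, whereas the combined odds ratio does not; reporting both "before" and "after" probabilities for a typical case is often easier for an audience to follow.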

        • Mike,
          In looking at your data, I see that var20 and var26 have identical values, and so the algorithm won’t converge due to collinearity. If you remove var26, everything works fine.
          Charles
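A quick pre-check for this particular situation can be sketched in Python: scan the data for columns with identical values before fitting. This only catches exact duplicates, not collinearity in general, and the function name and sample data are hypothetical (the variable names merely echo those mentioned above).

```python
def duplicate_columns(rows, names):
    """Return (first_name, duplicate_name) pairs for identical columns."""
    seen = {}
    dupes = []
    for j, name in enumerate(names):
        col = tuple(r[j] for r in rows)
        if col in seen:
            dupes.append((seen[col], name))
        else:
            seen[col] = name
    return dupes

# Hypothetical binary data in which the first and third columns coincide
names = ["var20", "var21", "var26"]
rows = [(1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
print(duplicate_columns(rows, names))
```

Any pair reported this way is a candidate for dropping one of the two variables before running the regression.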

  10. I am trying to use the binary logistic regression function. I added Solver and the Real Statistics add-in. When I select the logistic regression function, I get a runtime error 424. I repaired my Microsoft Office 2010 software and rebooted. Same errors.

    Every once in a while I get an error with solver. In any case I’m stuck.

    Any ideas?

    • Kathleen,
      When you press Alt-TI do you see both RealStats and Solver on the list of addins with check marks next to them? If not you need to either add these addins or make sure that there are check marks next to them.
      When you first use Real Statistics, what do you see when you enter the =VER() formula?
      Charles

      • Charles: yes both addins are there and checked. The first time I placed the =ver() in the cell, 2007 showed up. Interesting because I am using 2010 Office. This time #NAME shows up.

        Kathleen

        • Kathleen,
          The fact that 2007 showed up might mean that you have installed the wrong version of the software. I suggest that you reinstall the Real Statistics addin. I plan to issue a new release in a couple of days.
          Charles

  11. Dear Charles,

    If the model fitting is not significant, should I proceed?
    If yes, what does it mean for the model fitting to be not significant while the parameter estimates are significant?

    Model Fitting Information

    Model            -2 Log Likelihood   Chi-Square   df   Sig.
    Intercept Only   95.673
    Final            90.756              4.917        2    .086
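The figures in this SPSS output can be checked by hand. The likelihood ratio chi-square is the difference between the two -2 log likelihood values, and for 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2). The sketch below uses only that special case, so it is not a general-purpose test; the function name is made up for illustration.

```python
import math

def lr_test_df2(neg2ll_intercept_only, neg2ll_final):
    """Likelihood ratio test for the special case of 2 degrees of freedom."""
    chi_sq = neg2ll_intercept_only - neg2ll_final
    p_value = math.exp(-chi_sq / 2)  # chi-square upper tail when df = 2
    return chi_sq, p_value

chi_sq, p_value = lr_test_df2(95.673, 90.756)
print(f"chi-square = {chi_sq:.3f}, p = {p_value:.3f}")
```

Since the p-value is above 0.05, the final model as a whole is not a significant improvement over the intercept-only model at that level.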

  12. Dear Sir,
    Please help me, I’m a newbie with this problem.
    I’m now completing a research study about the relationship between narcissism (IV) and cyberbullying (DV) among Instagram users. My independent variable has the levels low, mid, and high (interval data), and my dependent variable is categorical, consisting of cyberbullying perpetrator, cyberbullying victim, and unidentified.
    Yesterday, I tried a multinomial logistic regression analysis in SPSS, and it gave me a warning:

    “There are 1 (11,1%) cells (i.e., dependent variable levels by subpopulations) with zero frequencies.
    Unexpected singularities in the Hessian matrix are encountered. This indicates that either some predictor variables should be excluded or some categories should be merged.
    The NOMREG procedure continues despite the above warning(s). Subsequent results shown are based on the last iteration. Validity of the model fit is uncertain.”

    What does the warning mean? I don’t understand it.
    And is multinomial logistic regression the right analysis for my research?

    Sam
    Thank you, Sir

  13. Is it possible to use your resource pack for conditional logistic regression? Think of analyzing which horse will win a given horse race relative to the other horses. Thanks!

  14. Sir
    Please help me with this message. I am very new to the Real Statistics package, and while I am trying to perform multinomial logistic regression it says “last column of input range must contain all the values 0,1,2,…, and only these values where r=max value in the last column of input range (r must be <25)”. How can I solve this problem?
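Going by the wording of that message, the outcome column must come last and hold consecutive integer codes starting at 0. A minimal Python sketch of recoding arbitrary category labels into such codes before pasting the data into the worksheet follows; `recode_outcomes` and the sample labels are hypothetical, and the alphabetical ordering of codes is just one possible choice.

```python
def recode_outcomes(labels):
    """Map arbitrary category labels to the integer codes 0, 1, 2, ..."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes

# Hypothetical outcome column with text labels
labels = ["low", "high", "medium", "low", "high"]
coded, mapping = recode_outcomes(labels)
print(mapping)  # which label received which code
print(coded)    # the column to place last in the input range
```

Keeping the printed mapping alongside the results makes it easier to read the output, since coefficients are reported per coded outcome.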

      • Hi Charles,

        I am facing a similar problem. I am trying to fit a logistic regression model to predict the attrition probability of an employee. I have other independent variables like tenure, performance, etc.

        I am a bit confused about how to use the tool.

        First, I was facing the same problem as Ashik. However, I moved the attrition column (0 – not attrited, 1 – attrited) to the end, which removed the error.

        Now the output is not making sense to me. I think it would be helpful if you could include some steps or instructions on how to use the tool within the workbook itself.

        Thanks.

        BTW your website is a great resource.

        • Amar,
          I’ll look into adding some additional information. In the meantime, if you send me an Excel file with your data, I will explain what you need to do. You can find my email address at Contact Us.
          Charles

  15. Hi Dear Dr. Zaiontz,
    I am completing a research study looking to see if there is an association between rates of hypotension (yes/no) during surgery (primary outcome) and use of a certain blood pressure medication (given/held prior to surgery). I have multiple regressors/confounding variables that I am trying to account for. Some are binary in nature (0, 1) and some are continuous (e.g. blood pressure readings). Someone suggested that I split my regression analysis: 1) do a multinomial analysis comparing my independent variable with the nominal data, 2) do a multivariate linear regression comparing the independent variable with the continuous regressors. What is your opinion on this advice? What type of test do you feel would be most appropriate?
    Thanks,
    Sarah

    • Sarah,
      These approaches could be useful, but I would need to have a more complete picture of the situation before I could definitively answer your question.
      Charles

  16. My independent variables are gender and academic achievement in terms of CGPA, while my DV is emotional intelligence (EI). What type of tests should I use to show that gender is related to EI and that academic achievement predicts EI?

  17. Dear Dr. Zaiontz,

    I am planning on using Conjoint Analysis to measure preference for new products. As you know, it uses a multinomial logit model. I have found specialized software for conducting such an analysis, but it is very expensive. Do you know if Conjoint Analysis could be performed using Excel, or are there other ways of doing it? (I have been told that I could find free code for it in R, but I got lost when I saw it.) Any help is greatly appreciated.
    Sincerely,
    Hamad

  18. Hi Prof. Zaiontz
    I would appreciate it if you could kindly help me with multinomial logistic regression between my categorical phenotypic data (as dependent variables) and genotypic data (both binary and allelic states as independent variables).
    FYI, I am analysing my data in a panel of 143 barley genotypes for association mapping in barley. I have used GLM and MLM models for my quantitative and ordinal phenotypic data in the TASSEL software (http://www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119).
    regards,
    Hossein

  19. Dear sir,
    Can you tell me how to do the regression analysis when both the dependent and independent variables are categorical?
