**Multinomial Logistic Regression Functions**

**Real Statistics Functions**: The following are array functions where R1 is a range which contains data in either raw or summary form (without headings).

**MLogitCoeff**(R1, *r, lab, head, iter*) – calculates the multinomial logistic regression coefficients for data in range R1. If *head* = TRUE then R1 contains column headings.

**MLogitParam**(R1, *r, h, lab, head, alpha, iter*) – calculates the multinomial logistic regression coefficients based on the data in R1 for one value *h* of the dependent variable (default: *h* = 1). If *head* = TRUE then R1 contains column headings. Includes the standard errors, Wald statistic, p-value and 1 – *α* confidence interval.

**MLogitTest**(R1, *r, lab, iter*) – calculates *LL* of the full and reduced models, the chi-square statistic and the p-value for the data in range R1 (without headings)

**MLogitRSquare**(R1, *r, lab, iter*) – calculates *LL* of the full and reduced models for the data in range R1 (without headings) and three versions of *R*^{2} (McFadden, Cox and Snell, Nagelkerke) as well as *AIC* and *BIC*

**MLogit_Accuracy**(R1, *r, lab, head, iter*): returns a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. Thus, if R1 contains *k* independent variables, then the output is a *k*+1 × 1 column array (or a *k*+1 × 2 array if *lab* = TRUE).

Here the parameters *lab*, *head*, *r,* *alpha* and *iter* are optional.

When *r* = 0 (default) then the data is in raw form, whereas if *r* ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, *r*.

When *lab* = TRUE then the output includes row and/or column headings and when *lab* = FALSE (the default) only the data is outputted.

The parameter *alpha* is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter *iter* determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of *head* is FALSE.

The Real Statistics Resource Pack also provides the following array functions:

**MLogitPred**(R0, R1, *r*, *iter*) – outputs a 1 × *rr* row vector which lists the probabilities of outcomes 0, 1, …, *rr* (in that order) for the values of the dependent variables contained in the range R0 (in the form of either a row or column vector) based on the logistic regression model calculated from the data in R1 (without headings). If *r* = 0 (raw data) then *rr* = the maximum value in the last column of R1. If *r* ≠ 0 then *rr = r*.

**MLogitPredC**(R0, R2) – outputs a 1 × *r* row vector which lists the probabilities of outcomes 0, 1, …, *r* (in that order), where *r* = 1 + the number of columns in R2, for the values of the dependent variables contained in the range R0 (in the form of either a row or column vector) based on the logistic regression coefficients contained in R2. Note that if R0 is a 1 × *k* row vector or *k* × 1 column vector, then R2 is a (*k* +1) × (*r* – 1) range.

**MLogitSummary**(R1, *head*) – takes the raw data in range R1 and outputs an equivalent array in summary form. If *head* = TRUE then R1 contains column headings as well as the output.

**MLogitSelect**(R1, *s, head*) – array function which takes the summary data in range R1 and outputs an array in summary form based on *s*. If *head* = TRUE then R1 includes column headings as well as the output. The string *s* is a comma delimited list of independent variables in R1 and/or interactions between such variables. E.g. if *s* = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output.

In addition, there is the **MLogitExtract** function which is described in Finding Multinomial Logistic Regression Coefficients.

**Observation**: Figure 1 shows the use of some of the supplemental functions described above for Example 1 of Finding Multinomial Logistic Regression Coefficients (where the model data is in summary form). The output should agree with the output obtained from the Newton’s Method model shown in Figure 3, 4 and 5 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method.

**Figure 1 – Multinomial Logistic Regression functions **

Some key formulas in Figure 1 are shown in Figure 2.

**Figure 2 – Key formulas from Figure 1**

**Observation** The *AIC* (Akaike’s Information Criterion) and *BIC* (Baysian Information Criterion) statistics which are displayed as part of the MLogitRSquare function are calculated by the following formulas.

*AIC* = -2*LL* + 2(*k*+1)*r* * BIC* = -2*LL* + (*k+*1)*r* ln(*N*)

where *N* = the total number of observations. The use of these statistics is as described for binary logistic regression models in Real Statistics Functions for Logistic Regression.

**Observation**: Figure 3 shows the use of some of the supplemental functions described above for a multinomial extension to Example 2 of Finding Logistic Regression Coefficients using Newton’s Method (where the model data is in raw form). Here, the outcome 0 = female, 1 = male and 2 = hermaphrodite.

**Figure 3 – Multinomial Logistic Regression functions with raw input data**

Here range E5:I10 is calculated by =MLogitSummary(A5:C53), the range E14:E18 is calculated by =MLogitTest(A5:C53,0,TRUE) and the range H14:I18 is calculated by =MLogitTest(E5:I10,2,TRUE).

**Model Accuracy**

**Example 2**: Calculate the accuracy of the multinomial logistic regression model for Example 1 of Finding Multinomial Logistic Regression Coefficients (the data is duplicated in range A5:E17 of Figure 4).

We first show how to do the calculations manually in Figure 4.

**Figure 4 – Multinomial regression model accuracy**

Range F6:H17 shows the probabilities predicted by the model for each data outcome. This is the output from the array formula =MLogitPred(A6:B17,$A$6:$E$17,2). We see, for example, that the highest probability for Dosage 20 and Gender 0 is Dead (.739403 in cell F6) and so 13 of the samples are predicted correctly and the other 0+8 = 8 are predicted incorrectly. The number of samples predicted correctly when the model predicts Dead is shown in column I, with columns J and K showing the number of samples predicted correctly when the model predicts Cured or Sick, respectively.

For example, cell I6 (for Dead) contains the formula =IF(F6>=MAX($F6:$H6),C6,””). Similarly, cell J6 (for Cured) contains the formula =IF(G6>=MAX($F6:$H6),D6,””) and cell K6 (Sick) contains the formula =IF(G6>=MAX($F6:$H6),D6,””). Cell L6 contains the total samples for row 6 predicted correctly by the model, namely 13, using the formula =SUM(I6:K6).

If we highlight the range I6:L17 and press **Ctrl-D**, we get all the correctly predicted sample values. Summing up each column, we get the values in I18:L18. Dividing these values by the values in range C18:E18, we get the percentage correct shown in range I19;L19.

In particular, we see that the model only predicts 55% of sample elements correctly.

We can obtain the same result using the array formula

=MLogit_Accuracy(A5:E17,2,TRUE,TRUE), as shown in Figure 5.

**Figure 5 – Model accuracy**

**Data Analysis Tool**

**Real Statistics Data Analysis Tool**: The Real Statistics Resource Pack supplies a **Multinomial Logistic Regression** data analysis tool that automates many of the capabilities described above.

For example, to perform the analysis for Example 1 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method, press **Ctrl-m** and double click on the **Regression** option in the dialog box that appears. Next click on the **Multinomial Logistic Regression** option in the dialog box that appears and click on the **OK** button. This will bring up the dialog box shown in Figure 6.

**Figure 6 – Multinomial Logistic Regression dialog box**

Fill in the fields as shown in Figure 6. Note that columns A and B contain the data for the independent variables, and so you enter the number 2 in the **# of Independent Variables** field. When you press the **OK** button, the output displayed in Figure 7 will appear.

**Figure 7 – Multinomial Logistic Regression output for summary input data**

To perform the analysis for Example 1, follow the steps described above. When the dialog box shown in Figure 6 appears, insert the range A4:C53 (from Figure 5) in the **Input Range** field.

Since the input range has 3 columns and the **# of Independent Variables** is 2, this leaves only one column for the dependent variables. The software knows that this means that the input data was formatted in raw data format.

The output will appear as shown in Figure 8.

**Figure 8 – Multinomial Logistic Regression output for raw input data**

Note that the output contains the summary data shown in range E6:I4, as well as output based on this summary data that is formatted as in Figure 7.

Hi Dr. Zaiontz

Is there a row/column limit for the multinomial logistic regression function? I can’t seem to make it work with my 370 x 7 dataset.=(

Regards,

Astrid

Astrid,

The limit is much bigger than 370 rows and 7 columns. The limit is a little more than 65,000 cells, and even then I show ways of exceeding this limit.

When you say that it doesn’t work for a 370 x 7 data set, do you mean that you get error cells. The likely reason for this is that the logistic regression model doesn’t converge to a solution, which is an indication that the model is not a good fit for the data.

Charles

Hello Dr. Zaiontz,

Yes I get #VALUE! error. I was trying to create a classification model using multinomial logistic regression, and since I am not able to make it work, I ended up running multiple logistic regressions per class. It worked and it gave me an ok score at kaggle. I’m new to this predictive modeling thingy and may I ask if a multinomial logistic regression would yield different results as doing multiple logistic regressions per class?

regards,

astrid

Astrid,

The results may be different, although maybe only marginally so. This is shown on the website, when I show how to use multiple binary logistic regressions to generate a multinomial logistic regression model.

Charles

Dear Charles,

In the output for a multinomial regression analyses using 10 independent (categorical, and numberical) and 1 dependent variable (categorical, 0,1) the table with the coefficients contains “#VALUE!” only.

What in my input table can cause this output?

Thank you,

Silke

Silke,

If your dependent variable has only values of 0 and 1, then you should use binary logistic regression and no multinomial logistic regression. If you get #VALUE! cells, the likely cause is that the logistic regression model doesn’t fit the data (perhaps because your sample is too small).

Charles

Hi Charles,

How can I know the goodness of model with the value of Chi square and P value?

Anson

Anson,

See Goodness of Fit

Charles

Hi Charles,

How to set the column “#of interaction” in Multinomial Logistic Regression dialog box?

Thanks,

Anthony

Anthony,

There is no # of Interaction option. Are you trying to factor interactions into the model?

Charles

Hi Charles,

When I use =MLogitParam(F4:I53,1,1,TRUE,TRUE,0.05,20)

coeff,se,Wald will shown “#VALUE!”.

But when I change the formula to =MLogitParam(F4:I53,1,1,TRUE,TRUE,0.05,20), then everything is alright.So what is the difference of 20 and 17 in the above formula?

Thanks

Anthony,

You wrote 20 in both formulas. Which one gives the correct value: 20 or 17?

Charles

Charles,

Interation >17 will shown “#VALUE!”, otherwise <18 will shown value, but I don't know which value is correct. So how can I know which value I can use?

Thanks

Anthony,

If you send me an Excel file with your data and analysis, I will try to figure out why this strange situation is occurring. You can find my email address at Contact Us.

Charles

Charles,

Have sent the email to you

Thanks

Charles,

Have any news of my attachment?

Also my attachment columns A are alphanumeric data converted to numeric form.

So , where can I find Coeff of 4 and Coeff of 1

Thanks

hey, I am trying to run multinomial logistic regression but the MLogitParam function gives a value error along with other functions as well. Could you shed some light on the fix for it.

The usual reason is that either (1) there is some illegal data value or (2) logistic regression is not a good fit for the data.

If you send me an Excel file with your data and analysis I will try to figure out what is going on. You can find my email address at Contact Us.

Charles

Hi,

thank you very much for this extremely valuable resource.

i am trying to run a multinational logistic regression regression on some survey data but the MlogitParam gives me a value error.

could you help me with that.

Giovanni,

If you send me an Excel with your data and analysis I will try to figure out what is happening.

You can get my email address at Contact Us.

Charles

Hi Dr. Zaiontz

Thank you for your very useful website. I faced with a problem some days and I tried to fine a suitable answer but I couldn’t. please help me:

I Perform an Interview. I have a protocol to do that. It consist of 44 questions. I understand someone say them observed variables. For Analyse, latent variables are built from nominal and ordinal variables. I want to find correlation coefficient between latent variables but I don’t know how to define latent with nominal and ordinal variables? how can I mix/ combine/ merge nominal with ordinal to build a unique latent variable, after that I can test correlation between those latent( new) variables.

Thanks for your guidance.

Pooya Ghazinejad

Hello Pooya, but I don’t understand the situation that you are describing well enough to be able to give you a meaningful answer.

Charles

Yes Dr Zaiontz, You are right. so If you let me, I describe it more:

I Perform an Interview with 44 questions Protocol. The Structure of questions are based on 18 Variables. Major variables are coming from theory. Every major variable consist of 3,4 or more questions. In fact I want to compare every major variable with each other and find correlation coefficient. Now for analyze when I want to build major variable with available data, I have problem. for example:

major variable(1): organizational Justice consist of tow nominal data( tow questions of interview protocol with nominal categories) and one ordinal( one question of interview protocol with ordinal categories) .

major variable(2): organizational trust consist of three ordinal data and tow nominal.

Now my available data are the combination of ordinal and nominal. I don’t know how merge nominal with ordinal data to build a unique variable( The major variable base on theory, in above example V(1) and V(2)).

Thanks

Hello Pooya,

You say that you want to “compare every major variable with each other and find correlation coefficient”. If the data is ordinal this makes sense, but what is your goal if the data is categorical. E.g. if x = Income and y = party affiliation (Democrat, Republican, Other), then the correlation coefficient really doesn’t make much sense. In this case, it may be better to compare the values of x for each of the three party affiliations. This is essentially what Anova does.

If z = gender (Male, Female), the once again the correlation coefficient for y vs. z doesn’t (party vs. gender) really make much sense. In this case, it makes more sense to compare the number of people in the sample in each of the 2 x 3 combinations (MD, FD, MR, FR, MO, FO); this is essentially what chi-square test for independence does. You can achieve the same thing using the correlation coefficient on dummy variables as explained on the webpages

Relationship between Correlation and t test

Relationship between Correlation and Chi-square

Charles

i have sum question wat do Ln, Z, alpha n beta x1,x2 For ml logit formula?

Keisha,

I don’t see what you are referring to on the referenced webpage (Real Statistics Functions Multinomial Logistic Regression).

Charles