# Real Statistics Multinomial Logistic Regression Capabilities

Multinomial Logistic Regression Functions

Real Statistics Functions:  The following are array functions where R1 is a range which contains data in either raw or summary form (without headings).

MLogitCoeff(R1, r, lab, head, iter) – calculates the multinomial logistic regression coefficients for data in range R1. If head = TRUE then R1 contains column headings.

MLogitParam(R1, r, h, lab, head, alpha, iter) – calculates the multinomial logistic regression coefficients based on the data in R1 for one value h of the dependent variable (default: h = 1). If head = TRUE then R1 contains column headings. Includes the standard errors, Wald statistic, p-value and 1 – α confidence interval.

MLogitTest(R1, r, lab, iter) – calculates the log-likelihood (LL) of the full and reduced models, the chi-square statistic and the p-value for the data in range R1 (without headings).

MLogitRSquare(R1, r, lab, iter) – calculates LL of the full and reduced models for the data in range R1 (without headings) and three versions of R² (McFadden, Cox and Snell, Nagelkerke), as well as AIC and BIC.

MLogit_Accuracy(R1, r, lab, head, iter) – returns a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each independent variable and the total accuracy of the model. Thus, if R1 contains k independent variables, then the output is a (k+1) × 1 column array (or a (k+1) × 2 array if lab = TRUE).

Here the parameters lab, head, alpha and iter are optional.

When r = 0 (default) then the data is in raw form, whereas if r ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, r.

When lab = TRUE the output includes row and/or column headings, and when lab = FALSE (the default) only the data is output.

The parameter alpha is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter iter determines the number of iterations used in the Newton method for calculating the logistic regression coefficients; the default value is 20. The default value of head is FALSE.
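The statistics reported by MLogitTest and MLogitRSquare can be related to the two log-likelihood values with a few lines of arithmetic. The following Python sketch uses the standard textbook definitions of the likelihood-ratio chi-square statistic and the three pseudo-R² measures; that Real Statistics uses exactly these forms is an assumption, and the function name `fit_statistics` is purely illustrative.

```python
import math


def fit_statistics(ll_full, ll_reduced, n_obs):
    """Likelihood-ratio chi-square and three pseudo-R^2 values.

    ll_full    -- log-likelihood of the full model
    ll_reduced -- log-likelihood of the reduced (intercept-only) model
    n_obs      -- total number of observations
    These are textbook definitions, assumed (not confirmed) to match
    the values shown by MLogitTest and MLogitRSquare.
    """
    chi_square = 2.0 * (ll_full - ll_reduced)
    mcfadden = 1.0 - ll_full / ll_reduced
    cox_snell = 1.0 - math.exp(2.0 * (ll_reduced - ll_full) / n_obs)
    # Largest value Cox and Snell's measure can reach for this data set
    max_cox_snell = 1.0 - math.exp(2.0 * ll_reduced / n_obs)
    nagelkerke = cox_snell / max_cox_snell
    return {
        "chi_square": chi_square,
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }
```

The p-value would then come from a chi-square distribution applied to `chi_square`; it is left out here to keep the sketch free of third-party libraries.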

The Real Statistics Resource Pack also provides the following array functions:

MLogitPred(R0, R1, r, iter) – outputs a 1 × rr row vector which lists the probabilities of outcomes 0, 1, …, rr (in that order) for the values of the independent variables contained in the range R0 (in the form of either a row or column vector) based on the logistic regression model calculated from the data in R1 (without headings). If r = 0 (raw data) then rr = the maximum value in the last column of R1. If r ≠ 0 then rr = r.

MLogitPredC(R0, R2) – outputs a 1 × r row vector which lists the probabilities of outcomes 0, 1, …, r (in that order), where r = 1 + the number of columns in R2, for the values of the independent variables contained in the range R0 (in the form of either a row or column vector) based on the logistic regression coefficients contained in R2. Note that if R0 is a 1 × k row vector or k × 1 column vector, then R2 is a (k+1) × (r – 1) range.

MLogitSummary(R1, head) – takes the raw data in range R1 and outputs an equivalent array in summary form. If head = TRUE then R1 contains column headings, and so does the output.

MLogitSelect(R1, s, head) – array function which takes the summary data in range R1 and outputs an array in summary form based on s. If head = TRUE then R1 includes column headings, and so does the output. The string s is a comma-delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = “2,3,2*3” then the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables are output.

In addition, there is the MLogitExtract function which is described in Finding Multinomial Logistic Regression Coefficients.
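To make the idea behind MLogitPredC concrete, here is a minimal Python sketch of how a matrix of multinomial logistic regression coefficients can be turned into outcome probabilities. It assumes that outcome 0 is the reference category, that the first row of the coefficient matrix holds the intercepts, and that there is one column per non-reference outcome; none of these layout details is confirmed by the text above, and `predict_probs` is a hypothetical name.

```python
import math


def predict_probs(x, coeffs):
    """Outcome probabilities for one set of independent-variable values.

    x      -- list of k independent-variable values
    coeffs -- (k+1) rows; row 0 = intercepts, rows 1..k = slopes;
              one column per non-reference outcome (assumed layout)
    Returns the probabilities in the order: reference outcome first,
    then the outcomes in column order.
    """
    n_cols = len(coeffs[0])
    odds = []
    for j in range(n_cols):
        # Linear predictor for this outcome versus the reference outcome
        z = coeffs[0][j] + sum(coeffs[i + 1][j] * x[i] for i in range(len(x)))
        odds.append(math.exp(z))
    denom = 1.0 + sum(odds)
    return [1.0 / denom] + [o / denom for o in odds]
```

With all coefficients equal to zero, every outcome gets the same probability, which is a quick sanity check on any coefficient layout.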

Observation: Figure 1 shows the use of some of the supplemental functions described above for Example 1 of Finding Multinomial Logistic Regression Coefficients (where the model data is in summary form). The output should agree with the output obtained from the Newton’s Method model shown in Figures 3, 4 and 5 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method.

Figure 1 – Multinomial Logistic Regression functions

Some key formulas in Figure 1 are shown in Figure 2.

Figure 2 – Key formulas from Figure 1

Observation: The AIC (Akaike’s Information Criterion) and BIC (Bayesian Information Criterion) statistics, which are displayed as part of the MLogitRSquare function, are calculated by the following formulas.

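AIC = -2LL + 2(k+1)r

BIC = -2LL + (k+1)r ln(N)

where k = the number of independent variables, the dependent variable takes the values 0, 1, …, r (so that (k+1)r is the number of coefficients in the model) and N = the total number of observations. The use of these statistics is as described for binary logistic regression models in Real Statistics Functions for Logistic Regression.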
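As a quick check of the two formulas, the following Python sketch evaluates them directly. The function name `aic_bic` and its argument names are illustrative only; `ll` is the log-likelihood of the fitted model.

```python
import math


def aic_bic(ll, k, r, n_obs):
    """AIC and BIC from the formulas above.

    ll    -- log-likelihood (LL) of the fitted model
    k     -- number of independent variables
    r     -- the r of the formulas above
    n_obs -- total number of observations (N)
    """
    n_params = (k + 1) * r
    aic = -2.0 * ll + 2.0 * n_params
    bic = -2.0 * ll + n_params * math.log(n_obs)
    return aic, bic
```

For instance, with LL = -100, k = 2, r = 2 and N = 100, the penalty term is 2 · 6 = 12 for AIC and 6 · ln(100) for BIC.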

Observation:  Figure 3 shows the use of some of the supplemental functions described above for a multinomial extension to Example 2 of Finding Logistic Regression Coefficients using Newton’s Method (where the model data is in raw form). Here, the outcome 0 = female, 1 = male and 2 = hermaphrodite.

Figure 3 – Multinomial Logistic Regression functions with raw input data

Here range E5:I10 is calculated by =MLogitSummary(A5:C53), the range E14:E18 is calculated by =MLogitTest(A5:C53,0,TRUE) and the range H14:I18 is calculated by =MLogitTest(E5:I10,2,TRUE).
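The conversion from raw to summary form performed by MLogitSummary can be pictured as a simple tally. The Python sketch below assumes that each raw row holds the independent variables followed by an outcome code 0, 1, …, r in the last column, and that each summary row holds one distinct combination of independent-variable values followed by one count per outcome. The sorted row order and the name `raw_to_summary` are assumptions made for illustration.

```python
def raw_to_summary(rows):
    """Tally raw rows into summary form.

    rows -- list of lists; the last entry of each row is the outcome
            code (0, 1, ..., r), the rest are independent variables
    Returns one row per distinct combination of independent-variable
    values, followed by the count of each outcome 0..r.
    """
    r = max(row[-1] for row in rows)
    counts = {}
    for row in rows:
        key = tuple(row[:-1])
        tally = counts.setdefault(key, [0] * (r + 1))
        tally[row[-1]] += 1
    # Sorting is a presentation choice here, not a documented behaviour
    return [list(key) + tally for key, tally in sorted(counts.items())]
```

Either form carries the same information for fitting the model, which is why the two MLogitTest calls above can be compared directly.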

Model Accuracy

Example 2: Calculate the accuracy of the multinomial logistic regression model for Example 1 of Finding Multinomial Logistic Regression Coefficients (the data is duplicated in range A5:E17 of Figure 4).

We first show how to do the calculations manually in Figure 4.

Figure 4 – Multinomial regression model accuracy

Range F6:H17 shows the probabilities predicted by the model for each data outcome. This is the output from the array formula =MLogitPred(A6:B17,\$A\$6:\$E\$17,2). We see, for example, that the highest probability for Dosage 20 and Gender 0 is Dead (.739403 in cell F6) and so 13 of the samples are predicted correctly and the other 0+8 = 8 are predicted incorrectly. The number of samples predicted correctly when the model predicts Dead is shown in column I, with columns J and K showing the number of samples predicted correctly when the model predicts Cured or Sick, respectively.

For example, cell I6 (for Dead) contains the formula =IF(F6>=MAX(\$F6:\$H6),C6,""). Similarly, cell J6 (for Cured) contains the formula =IF(G6>=MAX(\$F6:\$H6),D6,"") and cell K6 (for Sick) contains the formula =IF(H6>=MAX(\$F6:\$H6),E6,""). Cell L6 contains the total samples for row 6 predicted correctly by the model, namely 13, using the formula =SUM(I6:K6).

If we highlight the range I6:L17 and press Ctrl-D, we get all the correctly predicted sample values. Summing up each column, we get the values in I18:L18. Dividing these values by the values in range C18:E18, we get the percentage correct shown in range I19:L19.

In particular, we see that the model only predicts 55% of sample elements correctly.
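The manual calculation in Figure 4 amounts to the following procedure, sketched here in Python. For each row of the summary data, the outcome with the highest predicted probability is taken as the prediction, and only the samples that actually had that outcome count as correct. The function name `model_accuracy` is hypothetical and the numbers in the usage example below are made up, not taken from Figure 4.

```python
def model_accuracy(counts, probs):
    """Per-outcome and overall accuracy from summary data.

    counts -- one row per combination of independent variables,
              holding the observed count of each outcome
    probs  -- predicted probability of each outcome, same shape
    Ties are resolved in favour of the first outcome, whereas the
    spreadsheet formulas (which use >=) would count every tied outcome.
    """
    n_outcomes = len(counts[0])
    correct = [0] * n_outcomes
    for row_counts, row_probs in zip(counts, probs):
        predicted = row_probs.index(max(row_probs))
        correct[predicted] += row_counts[predicted]
    totals = [sum(row[j] for row in counts) for j in range(n_outcomes)]
    per_outcome = [c / t if t else float("nan") for c, t in zip(correct, totals)]
    overall = sum(correct) / sum(totals)
    return per_outcome, overall
```

For example, with `counts = [[13, 0, 8], [2, 5, 3]]` and `probs = [[0.7, 0.1, 0.2], [0.2, 0.5, 0.3]]`, the first row contributes 13 correct predictions for the first outcome and the second row contributes 5 for the second outcome, so 18 of the 31 samples are classified correctly.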

We can obtain the same result using the array formula

=MLogit_Accuracy(A5:E17,2,TRUE,TRUE), as shown in Figure 5.

Figure 5 – Model accuracy

Data Analysis Tool

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack supplies a Multinomial Logistic Regression data analysis tool that automates many of the capabilities described above.

For example, to perform the analysis for Example 1 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method, press Ctrl-m and double click on the Regression option in the dialog box that appears. Next click on the Multinomial Logistic Regression option in the dialog box that appears and click on the OK button. This will bring up the dialog box shown in Figure 6.

Figure 6 – Multinomial Logistic Regression dialog box

Fill in the fields as shown in Figure 6. Note that columns A and B contain the data for the independent variables, and so you enter the number 2 in the # of Independent Variables field. When you press the OK button, the output displayed in Figure 7 will appear.

Figure 7 – Multinomial Logistic Regression output for summary input data

To perform the analysis on the raw data of Figure 3, follow the steps described above. When the dialog box shown in Figure 6 appears, insert the range A4:C53 (from Figure 3) in the Input Range field.

Since the input range has 3 columns and the # of Independent Variables is 2, this leaves only one column for the dependent variable. The software infers from this that the input data is in raw format.

The output will appear as shown in Figure 8.

Figure 8 – Multinomial Logistic Regression output for raw input data

Note that the output contains the summary data shown in range E6:I4, as well as output based on this summary data that is formatted as in Figure 7.

### 28 Responses to Real Statistics Multinomial Logistic Regression Capabilities

1. keisha says:

i have sum question wat do Ln, Z, alpha n beta x1,x2 For ml logit formula?

• Charles says:

Keisha,
I don’t see what you are referring to on the referenced webpage (Real Statistics Functions Multinomial Logistic Regression).
Charles

2. Pooya says:

Hi Dr. Zaiontz
Thank you for your very useful website. I faced with a problem some days and I tried to fine a suitable answer but I couldn’t. please help me:
I Perform an Interview. I have a protocol to do that. It consist of 44 questions. I understand someone say them observed variables. For Analyse, latent variables are built from nominal and ordinal variables. I want to find correlation coefficient between latent variables but I don’t know how to define latent with nominal and ordinal variables? how can I mix/ combine/ merge nominal with ordinal to build a unique latent variable, after that I can test correlation between those latent( new) variables.

• Charles says:

Hello Pooya,
I don’t understand the situation that you are describing well enough to be able to give you a meaningful answer.
Charles

Yes Dr Zaiontz, You are right. so If you let me, I describe it more:
I Perform an Interview with 44 questions Protocol. The Structure of questions are based on 18 Variables. Major variables are coming from theory. Every major variable consist of 3,4 or more questions. In fact I want to compare every major variable with each other and find correlation coefficient. Now for analyze when I want to build major variable with available data, I have problem. for example:
major variable(1): organizational Justice consist of tow nominal data( tow questions of interview protocol with nominal categories) and one ordinal( one question of interview protocol with ordinal categories) .
major variable(2): organizational trust consist of three ordinal data and tow nominal.
Now my available data are the combination of ordinal and nominal. I don’t know how merge nominal with ordinal data to build a unique variable( The major variable base on theory, in above example V(1) and V(2)).
Thanks

• Charles says:

Hello Pooya,

You say that you want to “compare every major variable with each other and find correlation coefficient”. If the data is ordinal this makes sense, but what is your goal if the data is categorical? E.g. if x = Income and y = party affiliation (Democrat, Republican, Other), then the correlation coefficient really doesn’t make much sense. In this case, it may be better to compare the values of x for each of the three party affiliations. This is essentially what Anova does.

If z = gender (Male, Female), then once again the correlation coefficient for y vs. z (party vs. gender) doesn’t really make much sense. In this case, it makes more sense to compare the number of people in the sample in each of the 2 x 3 combinations (MD, FD, MR, FR, MO, FO); this is essentially what the chi-square test for independence does. You can achieve the same thing using the correlation coefficient on dummy variables as explained on the webpages

Charles

3. Giovanni Esposito says:

Hi,

thank you very much for this extremely valuable resource.

i am trying to run a multinational logistic regression regression on some survey data but the MlogitParam gives me a value error.
could you help me with that.

• Charles says:

Giovanni,
If you send me an Excel with your data and analysis I will try to figure out what is happening.
Charles

hey, I am trying to run multinomial logistic regression but the MLogitParam function gives a value error along with other functions as well. Could you shed some light on the fix for it.

• Charles says:

The usual reason is that either (1) there is some illegal data value or (2) logistic regression is not a good fit for the data.
If you send me an Excel file with your data and analysis I will try to figure out what is going on. You can find my email address at Contact Us.
Charles

5. Anthony says:

Hi Charles,

How to set the column “#of interaction” in Multinomial Logistic Regression dialog box?

Thanks,
Anthony

• Charles says:

Anthony,
There is no # of Interaction option. Are you trying to factor interactions into the model?
Charles

• Anthony says:

Hi Charles,

When I use =MLogitParam(F4:I53,1,1,TRUE,TRUE,0.05,20)
coeff,se,Wald will shown “#VALUE!”.
But when I change the formula to =MLogitParam(F4:I53,1,1,TRUE,TRUE,0.05,20), then everything is alright.So what is the difference of 20 and 17 in the above formula?

Thanks

• Charles says:

Anthony,
You wrote 20 in both formulas. Which one gives the correct value: 20 or 17?
Charles

• Anthony says:

Charles,

Interation >17 will shown “#VALUE!”, otherwise <18 will shown value, but I don't know which value is correct. So how can I know which value I can use?

Thanks

• Charles says:

Anthony,
If you send me an Excel file with your data and analysis, I will try to figure out why this strange situation is occurring. You can find my email address at Contact Us.
Charles

• Anthony says:

Charles,

Have sent the email to you

Thanks

• Anthony says:

Charles,

Have any news of my attachment?
Also my attachment columns A are alphanumeric data converted to numeric form.
So , where can I find Coeff of 4 and Coeff of 1

Thanks

6. Anson says:

Hi Charles,

How can I know the goodness of model with the value of Chi square and P value?

Anson

7. Silke Roedder says:

Dear Charles,
In the output for a multinomial regression analyses using 10 independent (categorical, and numberical) and 1 dependent variable (categorical, 0,1) the table with the coefficients contains “#VALUE!” only.
What in my input table can cause this output?

Thank you,

Silke

• Charles says:

Silke,
If your dependent variable has only values of 0 and 1, then you should use binary logistic regression and not multinomial logistic regression. If you get #VALUE! cells, the likely cause is that the logistic regression model doesn’t fit the data (perhaps because your sample is too small).
Charles

8. astrid says:

Hi Dr. Zaiontz

Is there a row/column limit for the multinomial logistic regression function? I can’t seem to make it work with my 370 x 7 dataset.=(

Regards,
Astrid

• Charles says:

Astrid,
The limit is much bigger than 370 rows and 7 columns. The limit is a little more than 65,000 cells, and even then I show ways of exceeding this limit.
When you say that it doesn’t work for a 370 x 7 data set, do you mean that you get error cells? The likely reason for this is that the logistic regression model doesn’t converge to a solution, which is an indication that the model is not a good fit for the data.
Charles

• astrid says:

Hello Dr. Zaiontz,

Yes I get #VALUE! error. I was trying to create a classification model using multinomial logistic regression, and since I am not able to make it work, I ended up running multiple logistic regressions per class. It worked and it gave me an ok score at kaggle. I’m new to this predictive modeling thingy and may I ask if a multinomial logistic regression would yield different results as doing multiple logistic regressions per class?

regards,
astrid

• Charles says:

Astrid,
The results may be different, although maybe only marginally so. This is shown on the website, when I show how to use multiple binary logistic regressions to generate a multinomial logistic regression model.
Charles

9. Martin says:

Hi Dr Zaiontz,

I have a problem with #VALUE! thing. At first i had thought that it could be caused by unproper data but i tried to make it with data from Example 1 and it didn’t work. Could you tell me what are the possible reasons of my problem?

• Charles says:

Martin,
I have tested Example 1 and it should work.
If you send me an Excel file with your data and the analysis you performed for Example 1, I will try to figure out what went wrong.