Multinomial Logistic Regression Functions
Real Statistics Functions: The following are array functions where R1 is a range which contains data in either raw or summary form (without headings).
MLogitCoeff(R1, r, lab, head, iter) – calculates the multinomial logistic regression coefficients for data in range R1. If head = TRUE then R1 contains column headings.
MLogitParam(R1, r, h, lab, head, alpha, iter) – calculates the multinomial logistic regression coefficients based on the data in R1 for one value h of the dependent variable (default: h = 1). If head = TRUE then R1 contains column headings. Includes the standard errors, Wald statistic, p-value and 1 – α confidence interval.
MLogitTest(R1, r, lab, iter) – calculates LL (the log-likelihood) of the full and reduced models, the chi-square statistic and the p-value for the data in range R1 (without headings).
MLogitRSquare(R1, r, lab, iter) – calculates LL of the full and reduced models for the data in range R1 (without headings) and three versions of R-square (McFadden, Cox and Snell, Nagelkerke) as well as AIC and BIC.
MLogit_Accuracy(R1, r, lab, head, iter) – returns a column array with the accuracy of the multinomial logistic regression model defined from the data in R1 for each outcome of the dependent variable and the total accuracy of the model. Thus, if the dependent variable takes the values 0, 1, …, r, then the output is an (r+2) × 1 column array (or an (r+2) × 2 array if lab = TRUE).
Here the parameters lab, head, r, alpha and iter are optional.
When r = 0 (default) then the data is in raw form, whereas if r ≠ 0 the data is in summary form where the dependent variable takes values 0, 1, …, r.
When lab = TRUE then the output includes row and/or column headings and when lab = FALSE (the default) only the data is outputted.
The parameter alpha is used to calculate a confidence interval and takes a value between 0 and 1 with a default value of .05. The parameter iter determines the number of iterations used in Newton's method for calculating the logistic regression coefficients; the default value is 20. The default value of head is FALSE.
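The statistics reported by MLogitTest and MLogitRSquare can be sketched from the two log-likelihood values. The following is a minimal illustration that assumes the usual textbook definitions of the chi-square statistic and the three R-square measures; this page does not list the formulas, so treat the definitions and the function names as assumptions rather than as the add-in's implementation.

```python
import math

def likelihood_ratio_chi_square(ll_full, ll_reduced):
    """Chi-square statistic comparing the full and reduced models."""
    return 2.0 * (ll_full - ll_reduced)

def pseudo_r_squares(ll_full, ll_reduced, n):
    """Return (McFadden, Cox and Snell, Nagelkerke) R-square values.

    ll_full, ll_reduced: log-likelihoods of the full and reduced models
    n: total number of observations
    """
    mcfadden = 1.0 - ll_full / ll_reduced
    cox_snell = 1.0 - math.exp(2.0 * (ll_reduced - ll_full) / n)
    # Largest value Cox and Snell can reach; Nagelkerke rescales by it
    max_cox_snell = 1.0 - math.exp(2.0 * ll_reduced / n)
    nagelkerke = cox_snell / max_cox_snell
    return mcfadden, cox_snell, nagelkerke
```

The p-value would then come from a chi-square distribution applied to the statistic returned by the first function.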
The Real Statistics Resource Pack also provides the following array functions:
MLogitPred(R0, R1, r, iter) – outputs a 1 × (rr + 1) row vector which lists the probabilities of outcomes 0, 1, …, rr (in that order) for the values of the independent variables contained in the range R0 (in the form of either a row or column vector) based on the multinomial logistic regression model calculated from the data in R1 (without headings). If r = 0 (raw data) then rr = the maximum value in the last column of R1. If r ≠ 0 then rr = r.
MLogitPredC(R0, R2) – outputs a 1 × r row vector which lists the probabilities of outcomes 0, 1, …, r – 1 (in that order), where r = 1 + the number of columns in R2, for the values of the independent variables contained in the range R0 (in the form of either a row or column vector) based on the logistic regression coefficients contained in R2. Note that if R0 is a 1 × k row vector or k × 1 column vector, then R2 is a (k + 1) × (r – 1) range.
MLogitSummary(R1, head) – takes the raw data in range R1 and outputs an equivalent array in summary form. If head = TRUE then R1 contains column headings, and so does the output.
MLogitSelect(R1, s, head) – takes the summary data in range R1 and outputs an array in summary form based on s. If head = TRUE then R1 contains column headings, and so does the output. The string s is a comma-delimited list of independent variables in R1 and/or interactions between such variables. E.g. if s = "2,3,2*3" then the output contains the data for the independent variables in columns 2 and 3 of R1 plus the interaction between these variables.
In addition, there is the MLogitExtract function which is described in Finding Multinomial Logistic Regression Coefficients.
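The calculation behind a function like MLogitPredC can be sketched as follows. This is an illustration only: it assumes that outcome 0 is the reference category, that the first row of the coefficient range holds the intercepts, and that each column corresponds to one non-reference outcome. The function name and the layout are hypothetical.

```python
import math

def predict_probabilities(x, coeffs):
    """Outcome probabilities for one set of independent-variable values.

    x:      list of k independent-variable values
    coeffs: (k + 1) rows, one column per non-reference outcome;
            row 0 is assumed to hold the intercepts
    Returns the probabilities of outcome 0 followed by the other outcomes.
    """
    if len(coeffs) != len(x) + 1:
        raise ValueError("coeffs must have one more row than x has values")
    n_cols = len(coeffs[0])
    # Linear predictor for each non-reference outcome
    scores = [
        coeffs[0][j] + sum(coeffs[i + 1][j] * x[i] for i in range(len(x)))
        for j in range(n_cols)
    ]
    denom = 1.0 + sum(math.exp(z) for z in scores)
    # The reference outcome gets 1/denom; the others get exp(score)/denom
    return [1.0 / denom] + [math.exp(z) / denom for z in scores]
```

With all coefficients equal to zero, every outcome receives the same probability, which is a quick sanity check on the layout.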
Observation: Figure 1 shows the use of some of the supplemental functions described above for Example 1 of Finding Multinomial Logistic Regression Coefficients (where the model data is in summary form). The output should agree with the output obtained from the Newton’s Method model shown in Figures 3, 4 and 5 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method.
Figure 1 – Multinomial Logistic Regression functions
Some key formulas in Figure 1 are shown in Figure 2.
Figure 2 – Key formulas from Figure 1
Observation: The AIC (Akaike’s Information Criterion) and BIC (Bayesian Information Criterion) statistics, which are displayed as part of the output of the MLogitRSquare function, are calculated by the following formulas.
AIC = -2LL + 2(k+1)r
BIC = -2LL + (k+1)r ln(N)
where N = the total number of observations. The use of these statistics is as described for binary logistic regression models in Real Statistics Functions for Logistic Regression.
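As a small worked illustration of the two formulas, the sketch below reads k as the number of independent variables and r as in the function arguments (outcomes 0, 1, …, r), so that (k+1)r is the number of estimated coefficients. That reading and the function name are assumptions, and the numbers in the usage note are made up.

```python
import math

def aic_bic(ll, k, r, n):
    """Return (AIC, BIC) for a multinomial logistic regression model.

    ll: log-likelihood of the full model
    k:  number of independent variables (assumed meaning)
    r:  number of non-reference outcomes (assumed meaning)
    n:  total number of observations
    """
    n_params = (k + 1) * r  # intercept plus k slopes per non-reference outcome
    aic = -2.0 * ll + 2.0 * n_params
    bic = -2.0 * ll + n_params * math.log(n)
    return aic, bic
```

For instance, with the hypothetical values LL = -100, k = 2, r = 2 and N = 100, the penalty term is 2 × 6 = 12 for AIC and 6 ln(100) for BIC.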
Observation: Figure 3 shows the use of some of the supplemental functions described above for a multinomial extension to Example 2 of Finding Logistic Regression Coefficients using Newton’s Method (where the model data is in raw form). Here, the outcome 0 = female, 1 = male and 2 = hermaphrodite.
Figure 3 – Multinomial Logistic Regression functions with raw input data
Here range E5:I10 is calculated by =MLogitSummary(A5:C53), the range E14:F18 is calculated by =MLogitTest(A5:C53,0,TRUE) and the range H14:I18 is calculated by =MLogitTest(E5:I10,2,TRUE).
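The conversion from raw form to summary form can be pictured as a grouped count. The sketch below is an illustration under assumptions: each raw row is taken to hold the independent-variable values followed by an integer outcome, and the summary rows are returned in sorted order, which may differ from the order MLogitSummary uses. The function name and the sample rows are hypothetical.

```python
def to_summary(rows, n_outcomes=None):
    """Convert raw rows (x1, ..., xk, outcome) into summary rows.

    Each summary row holds x1, ..., xk followed by the number of raw
    rows observed for outcomes 0, 1, ..., n_outcomes - 1.
    """
    if n_outcomes is None:
        # Mirror the raw-data convention: largest outcome value plus one
        n_outcomes = max(row[-1] for row in rows) + 1
    counts = {}
    for row in rows:
        key = tuple(row[:-1])
        counts.setdefault(key, [0] * n_outcomes)[row[-1]] += 1
    # Sorted order is a choice made for this sketch
    return [list(key) + counts[key] for key in sorted(counts)]

# Hypothetical raw data: (variable 1, variable 2, outcome)
raw = [(20, 0, 0), (20, 0, 0), (20, 0, 2), (30, 1, 1)]
summary = to_summary(raw)
```

Fitting the model to either form should give the same results, which is what the two MLogitTest calls in Figure 3 illustrate.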
Example 2: Calculate the accuracy of the multinomial logistic regression model for Example 1 of Finding Multinomial Logistic Regression Coefficients (the data is duplicated in range A5:E17 of Figure 4).
We first show how to do the calculations manually in Figure 4.
Figure 4 – Multinomial regression model accuracy
Range F6:H17 shows the probabilities predicted by the model for each data outcome. This is the output from the array formula =MLogitPred(A6:B17,$A$6:$E$17,2). We see, for example, that the highest probability for Dosage 20 and Gender 0 is Dead (.739403 in cell F6) and so 13 of the samples are predicted correctly and the other 0+8 = 8 are predicted incorrectly. The number of samples predicted correctly when the model predicts Dead is shown in column I, with columns J and K showing the number of samples predicted correctly when the model predicts Cured or Sick, respectively.
For example, cell I6 (for Dead) contains the formula =IF(F6>=MAX($F6:$H6),C6,""). Similarly, cell J6 (for Cured) contains the formula =IF(G6>=MAX($F6:$H6),D6,"") and cell K6 (for Sick) contains the formula =IF(H6>=MAX($F6:$H6),E6,""). Cell L6 contains the total number of samples for row 6 predicted correctly by the model, namely 13, using the formula =SUM(I6:K6).
If we highlight the range I6:L17 and press Ctrl-D, we get all the correctly predicted sample values. Summing up each column, we get the values in I18:L18. Dividing these values by the values in range C18:E18, we get the percentage correct shown in range I19:L19.
In particular, we see that the model predicts only 55% of sample elements correctly.
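The manual procedure above can be sketched in code. This is an illustration, not the MLogit_Accuracy implementation: the function name is hypothetical, and on ties it credits only the first outcome with the highest probability, whereas the spreadsheet formulas (which use >=) would credit every tied outcome. In the usage example only the first row of counts (13, 0, 8) comes from the example; the second row and all the probabilities are made up.

```python
def model_accuracy(counts, probs):
    """Per-outcome and overall accuracy from summary counts.

    counts[i][j]: observed number of samples in row i with outcome j
    probs[i][j]:  model probability of outcome j for row i
    """
    n_outcomes = len(counts[0])
    correct = [0] * n_outcomes
    totals = [0] * n_outcomes
    for row_counts, row_probs in zip(counts, probs):
        # Predicted outcome = the one with the highest probability
        predicted = row_probs.index(max(row_probs))
        correct[predicted] += row_counts[predicted]
        for j in range(n_outcomes):
            totals[j] += row_counts[j]
    per_outcome = [c / t if t else float("nan") for c, t in zip(correct, totals)]
    overall = sum(correct) / sum(totals)
    return per_outcome, overall

# Illustrative data only (see the note above)
counts = [[13, 0, 8], [2, 9, 1]]
probs = [[0.7, 0.1, 0.2], [0.2, 0.6, 0.2]]
per_outcome, overall = model_accuracy(counts, probs)
```

Here 13 + 9 = 22 of the 33 illustrative samples fall in the cell that the model predicts for their row.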
We can obtain the same result using the array formula
=MLogit_Accuracy(A5:E17,2,TRUE,TRUE), as shown in Figure 5.
Figure 5 – Model accuracy
Data Analysis Tool
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack supplies a Multinomial Logistic Regression data analysis tool that automates many of the capabilities described above.
For example, to perform the analysis for Example 1 of Finding Multinomial Logistic Regression Coefficients using Newton’s Method, press Ctrl-m and double-click on the Regression option in the dialog box that appears. Next, select the Multinomial Logistic Regression option in the dialog box that appears and click on the OK button. This will bring up the dialog box shown in Figure 6.
Figure 6 – Multinomial Logistic Regression dialog box
Fill in the fields as shown in Figure 6. Note that columns A and B contain the data for the independent variables, and so you enter the number 2 in the # of Independent Variables field. When you press the OK button, the output displayed in Figure 7 will appear.
Figure 7 – Multinomial Logistic Regression output for summary input data
To perform the analysis for the raw data shown in Figure 3, follow the steps described above. When the dialog box shown in Figure 6 appears, insert the range A4:C53 (from Figure 3) in the Input Range field.
Since the input range has 3 columns and the # of Independent Variables is 2, this leaves only one column for the dependent variable. The software infers from this that the input data is in raw form.
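The rule just described can be written as a one-line check. This is a sketch of the stated rule only, with a hypothetical function name; it is not the tool's actual code.

```python
def input_format(n_columns, n_independent):
    """Classify an input range as 'raw' or 'summary' from its width.

    One column beyond the independent variables means a single outcome
    column (raw form); more than one means a column of counts per
    outcome (summary form).
    """
    extra = n_columns - n_independent
    if extra < 1:
        raise ValueError("the range leaves no column for the dependent variable")
    return "raw" if extra == 1 else "summary"
```

For the range in this example, 3 columns with 2 independent variables leaves one column, so the data is treated as raw.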
The output will appear as shown in Figure 8.
Figure 8 – Multinomial Logistic Regression output for raw input data
Note that the output contains the summary data shown in range E6:I4, as well as output based on this summary data that is formatted as in Figure 7.