Suppose there are *r* + 1 possible outcomes for the dependent variable, 0, 1, …, *r*, with *r* > 1. Pick one of the outcomes as the reference outcome and conduct *r* pairwise logistic regressions between this outcome and each of the other outcomes. For our purposes we will assume that 0 is the reference outcome. The binary logistic regression model for outcome *h*, with *h* ≠ 0, is defined by

$$\ln \frac{p_{ih}}{p_{i0}} = \sum_{j=0}^{k} b_{hj}\, x_{ij}$$

Here *p*_{ih} is the probability that the *i*th sample has outcome *h*. Taking the exponential of both sides of the above equation yields the equivalent expression

$$\frac{p_{ih}}{p_{i0}} = e^{\sum_{j=0}^{k} b_{hj} x_{ij}}$$

where we define *x*_{i0} = 1 (in order to keep our notation simple). Now, since the probabilities over all *r* + 1 outcomes sum to 1, it follows that

$$p_{i0} = \frac{1}{1 + \sum_{h=1}^{r} e^{\sum_{j=0}^{k} b_{hj} x_{ij}}} \qquad\qquad p_{ih} = \frac{e^{\sum_{j=0}^{k} b_{hj} x_{ij}}}{1 + \sum_{l=1}^{r} e^{\sum_{j=0}^{k} b_{lj} x_{ij}}}$$
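To make these formulas concrete, here is a minimal Python/numpy sketch (Python, the function name `predicted_probs`, and the convention of storing the coefficients *b*_{hj} as the columns of a (*k*+1) × *r* matrix are my own choices for illustration; the site itself works in Excel):

```python
import numpy as np

def predicted_probs(B, X):
    """Return the n x (r+1) matrix of probabilities p_ih for outcomes 0..r.

    B : (k+1) x r matrix of coefficients, one column per non-reference outcome h
    X : n x (k+1) design matrix whose first column is all 1s (x_i0 = 1)
    """
    Z = np.exp(X @ B)                           # n x r matrix of exp(sum_j b_hj x_ij)
    denom = 1.0 + Z.sum(axis=1, keepdims=True)  # 1 + sum over the r non-reference outcomes
    P = Z / denom                               # p_ih for h = 1..r
    p0 = 1.0 / denom                            # p_i0 for the reference outcome
    return np.hstack([p0, P])
```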

Whereas the model used in the binary case with only two outcomes is based on the binomial distribution, when there are more than two outcomes the model is based on the multinomial distribution. Thus, the probability that the sample data occurs as it does is given by

$$L = \prod_{i=1}^{n} \prod_{h=0}^{r} p_{ih}^{\,y_{ih}}$$

where the *y*_{ih} are the observed values (for raw data, *y*_{ih} = 1 if the *i*th sample has outcome *h* and 0 otherwise) while the *p*_{ih} are the corresponding theoretical values.

Taking the natural log of both sides and simplifying we get the following definition.

**Definition 1**: The **log-likelihood** statistic for multinomial logistic regression is defined as follows:

$$LL = \sum_{i=1}^{n} \sum_{h=0}^{r} y_{ih} \ln p_{ih}$$
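A sketch of Definition 1 under the same conventions as the snippet above (`log_likelihood` is a hypothetical helper name; for raw data each row of Y is one-hot):

```python
import numpy as np

def log_likelihood(Y, P):
    """LL = sum_i sum_h y_ih * ln(p_ih), taken over all r+1 outcomes.

    Y : n x (r+1) matrix of observed outcomes (for raw data, one-hot rows)
    P : n x (r+1) matrix of predicted probabilities p_ih
    """
    # Skip terms with y_ih = 0 so that 0 * ln(0) is treated as 0.
    mask = Y > 0
    return float(np.sum(Y[mask] * np.log(P[mask])))
```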

**Observation**: The multinomial counterparts to Properties 1 and 2 of Finding Logistic Regression Coefficients using Newton’s Method are as follows.

**Property 1**: For each *h* > 0, let *B*_{h} = [*b*_{hj}] be the (*k*+1) × 1 column vector of binary logistic regression coefficients of outcome *h* compared to the reference outcome 0, and let *B* be the *r*(*k*+1) × 1 column vector consisting of the elements in *B*_{1}, …, *B*_{r} arranged in a column.

Also let *X* be the *n* × (*k*+1) design matrix (as described in Definition 3 of Least Squares for Multiple Regression). For outcomes *h* and *l*, let *V*_{hl} be the *n* × *n* diagonal matrix whose main diagonal contains elements of the form

$$v_{ii} = \begin{cases} p_{ih}(1-p_{ih}) & \text{if } h = l \\ -\,p_{ih}\, p_{il} & \text{if } h \neq l \end{cases}$$

and let *C*_{hl} = *X*^{T}*V*_{hl}*X*. Now define the *r*(*k*+1) × *r*(*k*+1) block matrix

$$C = [C_{hl}]$$

and *S* = *C*^{-1}. Then *S* is the covariance matrix for *B*.
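A sketch of Property 1 under the same conventions as above (the helper name `covariance_matrix` is mine, not Real Statistics terminology). It builds C block by block and inverts it, which assumes C is non-singular (see the comment exchange below about singular covariance matrices):

```python
import numpy as np

def covariance_matrix(X, P):
    """Build C = [C_hl] block by block and return S = C^{-1}.

    X : n x (k+1) design matrix
    P : n x (r+1) probability matrix; column 0 is the reference outcome
    Returns the r(k+1) x r(k+1) covariance matrix S for the stacked vector B.
    """
    n, k1 = X.shape
    r = P.shape[1] - 1
    C = np.zeros((r * k1, r * k1))
    for h in range(1, r + 1):
        for l in range(1, r + 1):
            if h == l:
                v = P[:, h] * (1 - P[:, h])   # diagonal of V_hh: p_ih (1 - p_ih)
            else:
                v = -P[:, h] * P[:, l]        # diagonal of V_hl: -p_ih p_il
            # X^T V_hl X without explicitly forming the n x n diagonal matrix
            C[(h-1)*k1:h*k1, (l-1)*k1:l*k1] = X.T @ (v[:, None] * X)
    return np.linalg.inv(C)
```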

**Property 2**: The maximum of the log-likelihood statistic occurs when for all *h* = 1, …, *r* and *j* = 0, …, *k* the following *r*(*k*+1) equations hold:

$$\sum_{i=1}^{n} (y_{ih} - p_{ih})\, x_{ij} = 0$$
**Observation**: Let *Y* = [*y*_{ih}] be the *n* × *r* matrix of observed outcomes of the dependent variable and let *P* = [*p*_{ih}] be the *n* × *r* matrix of the model’s predicted values for the outcomes (excluding the reference outcome). Let *X* be the *n* × (*k*+1) design matrix. Then the matrix equation

$$X^{T}(Y - P) = O$$
where the right side of the equation is the (*k*+1) × *r* zero matrix, is equivalent to the equations in Property 2.
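As a quick numerical check of this Observation, the following illustrative sketch (`score_matrix` is a hypothetical name) returns *X*^{T}(*Y* − *P*); at the maximum-likelihood coefficients it should be numerically the zero matrix:

```python
import numpy as np

def score_matrix(X, Y, P):
    """Return X^T (Y - P) for the non-reference outcomes.

    X : n x (k+1) design matrix
    Y : n x (r+1) matrix of observed outcomes (column 0 = reference outcome)
    P : n x (r+1) matrix of predicted probabilities

    At the maximum-likelihood solution this (k+1) x r matrix should be
    (numerically) the zero matrix, per the Observation above.
    """
    return X.T @ (Y[:, 1:] - P[:, 1:])   # drop column 0 (reference outcome)
```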

**Property 3**: Let *B*, *X*, *Y*, *P* and *S* be defined as in Properties 1 and 2, let *B*^{(0)} be an initial guess of *B*, and for each *m* define the following iteration

$$B^{(m+1)} = B^{(m)} + S^{(m)}\, \operatorname{vec}\!\left(X^{T}\left(Y - P^{(m)}\right)\right)$$

where *P*^{(m)} and *S*^{(m)} are the values of *P* and *S* computed from *B*^{(m)}, and vec(·) stacks the columns of its (*k*+1) × *r* argument into an *r*(*k*+1) × 1 column vector. Then for sufficiently large *m*, *B*^{(m+1)} ≈ *B*^{(m)}, and so *B*^{(m)} is a good approximation of the coefficient vector *B*.

**Observation**: Here we can take as the initial guess for *B* the *r*(*k*+1) × 1 zero matrix.
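Putting the pieces together, here is a sketch of the iteration in Property 3, reusing the hypothetical helpers defined above and starting from the zero matrix as this Observation suggests. It is an illustration of the method, not the Real Statistics implementation:

```python
import numpy as np

def fit_multinomial_logit(X, Y, tol=1e-8, max_iter=50):
    """Newton's method for the multinomial logit coefficients.

    X : n x (k+1) design matrix (first column all 1s)
    Y : n x (r+1) matrix of observed outcomes (column 0 = reference outcome)
    Returns the (k+1) x r coefficient matrix (column h-1 holds B_h).
    """
    n, k1 = X.shape
    r = Y.shape[1] - 1
    B = np.zeros((k1, r))                 # initial guess B^(0) = 0
    for _ in range(max_iter):
        P = predicted_probs(B, X)         # from the earlier sketch
        S = covariance_matrix(X, P)       # S = C^{-1}, also defined above
        G = score_matrix(X, Y, P)         # (k+1) x r gradient matrix
        # Stack the columns of G into an r(k+1) x 1 vector (the vec operator
        # of Property 3), take one Newton step, and unstack the result.
        step = S @ G.reshape(-1, order='F')
        B = B + step.reshape((k1, r), order='F')
        if np.max(np.abs(step)) < tol:    # B^(m+1) ≈ B^(m): converged
            break
    return B
```

Each pass recomputes *P* and *S* from the current *B*^{(m)}, takes one Newton step, and stops once the step is negligible, mirroring the statement of Property 3.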

**Observation**: If we group the data as we did in Example 1 of Basic Concepts of Logistic Regression (i.e. summary data), then Property 1 takes the same form, except that the main diagonal of *V*_{hl} now contains elements of the form

$$v_{ii} = \begin{cases} n_i\, p_{ih}(1-p_{ih}) & \text{if } h = l \\ -\,n_i\, p_{ih}\, p_{il} & \text{if } h \neq l \end{cases}$$

where *n* = the number of groups (instead of the sample size) and for each *i*, *n*_{i} = the number of observations in group *i*.

Property 2 also holds, where *Y* = [*y*_{ih}] is now the *n* × *r* matrix of summarized observed outcomes of the dependent variable (counts per group rather than 0/1 values), *X* is the corresponding *n* × (*k*+1) design matrix, *P* is the *n* × *r* matrix of predicted values (the expected counts *n*_{i}*p*_{ih}), and *V*_{hl} is the *n* × *n* diagonal matrix whose main diagonal contains elements of the form given above. Thus, the element in the *j*th row and *m*th column of *C*_{hl} = *X*^{T}*V*_{hl}*X* is

$$c_{jm} = \begin{cases} \displaystyle \sum_{i=1}^{n} n_i\, p_{ih}(1-p_{ih})\, x_{ij}\, x_{im} & \text{if } h = l \\[6pt] \displaystyle -\sum_{i=1}^{n} n_i\, p_{ih}\, p_{il}\, x_{ij}\, x_{im} & \text{if } h \neq l \end{cases}$$
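A grouped-data variant of the earlier `covariance_matrix` sketch, with the extra *n*_{i} factor on the diagonal elements (again an illustration under my own naming, not the Real Statistics implementation):

```python
import numpy as np

def covariance_matrix_grouped(X, P, n_counts):
    """Grouped-data version of covariance_matrix: each group i contributes
    with weight n_i, so the V_hl diagonal entries gain a factor of n_i.

    X        : n x (k+1) design matrix (n = number of groups)
    P        : n x (r+1) matrix of predicted probabilities
    n_counts : length-n vector of group sizes n_i
    """
    n, k1 = X.shape
    r = P.shape[1] - 1
    C = np.zeros((r * k1, r * k1))
    for h in range(1, r + 1):
        for l in range(1, r + 1):
            if h == l:
                v = n_counts * P[:, h] * (1 - P[:, h])   # n_i p_ih (1 - p_ih)
            else:
                v = -n_counts * P[:, h] * P[:, l]        # -n_i p_ih p_il
            C[(h-1)*k1:h*k1, (l-1)*k1:l*k1] = X.T @ (v[:, None] * X)
    return np.linalg.inv(C)
```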

In this case, the expressions for *L* and *LL* become

$$L = \prod_{i=1}^{n} \prod_{h=0}^{r} p_{ih}^{\,y_{ih}} \qquad\qquad LL = \sum_{i=1}^{n} \sum_{h=0}^{r} y_{ih} \ln p_{ih}$$

where the *y*_{ih} are now the summarized counts.
The values of *LL* and *R*^{2} as well as the chi-square test for significance are calculated exactly as for binary logistic regression (see Testing the Fit of the Logistic Regression Model).

As for *LL*, to the above formula we need to add the constant term

$$\sum_{i=1}^{n} \left[\ln(n_i!) - \sum_{h=0}^{r} \ln(y_{ih}!)\right]$$
Note, however, that in calculating the different versions of *R*^{2}, the constant term is not included in *LL* and *LL*_{0}.
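A sketch of the grouped log-likelihood, using `math.lgamma` to evaluate the ln(*n*_{i}!) and ln(*y*_{ih}!) terms (lgamma(v + 1) = ln v! for non-negative integers); the `with_constant` flag and the function name are my own devices. Pass `with_constant=False` to get the version used in the *R*^{2} calculations, per the note above:

```python
import numpy as np
from math import lgamma

def log_likelihood_grouped(Y, P, with_constant=True):
    """LL for summary data: sum_i sum_h y_ih ln(p_ih), optionally plus the
    multinomial constant sum_i [ln(n_i!) - sum_h ln(y_ih!)].

    Y : n x (r+1) matrix of counts per group and outcome
    P : n x (r+1) matrix of predicted probabilities
    """
    mask = Y > 0                                     # treat 0 * ln(0) as 0
    ll = float(np.sum(Y[mask] * np.log(P[mask])))
    if with_constant:
        n_i = Y.sum(axis=1)                          # group sizes n_i
        ll += sum(lgamma(v + 1) for v in n_i)        # sum_i ln(n_i!)
        ll -= sum(lgamma(v + 1) for v in Y.ravel())  # sum_{i,h} ln(y_ih!)
    return ll
```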

Dear Charles,

What should I do if the variance-covariance matrix is a singular matrix?

Is there any solution to this problem?

Dear Eki,

There are approaches for when the variance-covariance matrix is not invertible, but these go beyond the scope of this website. You can find some of these by googling.

Charles

Dear Charles,

From the literature, what would you suggest as a rule to define the minimum sample size (1) for binomial logistic regression and (2) for multinomial logistic regression? E.g. a rule based on the number of independent variables, or on the observed proportions related to each possible outcome of the dependent variable. Should such a threshold be defined by considering the possible outcomes separately (e.g. the minimum observed proportion across the outcomes), or by considering all rows (combinations of outcomes) of the summary table? Thanks.

Thomas,

The following webpage may be helpful to you

http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/power/llogN

Also G*Power provides a capability to calculate the sample size required for logistic regression.

Charles

Dear Charles,

Many thanks for this very useful material. I’d like to know if, even if it is probably similar to the binomial case, you could add a section on the comparison of regression models. In particular, I’d also be interested to know whether LL0 is supposed to remain identical from one model to the other (I think it depends, however, on the way the summary table is designed, due to the non-linearity in the LL0 formula), and whether the degrees of freedom can also be simply subtracted.

Many thanks in advance,

Kr,

Thomas

Thomas,

What sort of comparison are you looking for? When you use one model rather than another?

The LL0 values won’t be identical from model to model. Generally, they will be identical only when the summary data are identical.

Charles

Dear Charles,

Thanks for your prompt answer. I’m thinking of nested models, exactly as illustrated in the binomial case: a chi-square test based on log-likelihoods, with a substitution of LL0 by the LL1 related to the reference model. Is it valid for the multinomial case, provided we keep the summary table identical for all models? Once the final model is selected, I’ll try to define a classification matrix based on RS capabilities.

Thomas

Sorry Thomas, but I don’t understand the approach that you are suggesting. I am not saying it is wrong; I just don’t understand it.

Charles

No problem. I’m thinking of the comparison of a base model and an extended model, as done in section http://www.real-statistics.com/logistic-regression/comparing-logistic-regression-models/

but for the multinomial case.

Thanks and cheers,

Thomas

Thomas,

I believe the same approach used for binary logistic regression will also work in the multinomial case.

Charles

All I want to figure out is how to get the population and sample for a multinomial logistic regression. I have four generational cohorts and five soft skill categories that I will be testing.

Jackie

Jackie,

Please explain what you mean by “how to get the population and sample for a multinomial logistic regression”

Charles

Sir

When *h* ≠ *l*, the element of the V matrix is *v*_{ii} = (−1)·*n*_{i}·*p*_{ih}·*p*_{il}, but it seems that in the Excel workbook you forgot the −1 term. Why?

Colin

Sir

Sorry! My apologies. You are right, sir!

Colin