In problems where the possible outcomes are “Conservative, Labour or Liberal-Democrat” or “Red, Blue, Green” there is no apparent order to the possible outcomes. When the outcomes are “Small, Medium, Large” or “City, State, Country” or “Strongly Disagree, Disagree, Agree, Strongly Agree” there is an intrinsic order. We now address the case of multinomial logistic regression where the outcomes for the dependent variable can be ordered.

Suppose the possible outcomes for the dependent variable are 1, …, *r.* Let *p _{ih}* =

*P*(y

_{i}≤

*h*), i.e. the cumulative probabilities. Thus 0 =

*p*

_{i0}<

*p*

_{i}_{1}⋯ <

*p*= 1 (thereby capturing the order of the outcomes), where

_{ir}*p*

_{i0}= 0 for notational convenience. Then for

*h*= 1, …,

*r*

This model can be viewed as *r* binary models with events y ≤ *h* vs. *h* < y. The logit models for *h* = 1, …, *r–*1 are therefore

where for convenience we set *x*_{i0} = 1. Thus

The likelihood and log-likelihood statistics are as follows:

**Example 1**: A study was conducted based on a sample of 420 people to determine how satisfied people are with their mobile device based on a Likert scale (1 to 4 with 1 = not very satisfied and 4 = very satisfied). People in the sample were characterized by gender (female = 1 and male = 0) and age (0 = under 18, 1 = 18-24, 2 = 25-30, 3 = 31-40, 4 = over 40). Create an ordered logistic regression for this study based on the data in Figure 1.

**Figure 1 – Data for Example 1**

We now present different approaches for creating the ordinal logistic regression models, especially for finding the coefficients. We also compare the results obtained with those obtained using a multinomial logistics regression model.

**Using binary logistic regression models**

We begin by developing three cumulative binary regression models as shown in Figure 2.

**Figure 2 – Cumulative binary logistic regression models**** **

We now find the coefficients for each of these models using the Logistic Regression data analysis tool or the LogitCoeff function. E.g. the coefficients for the 1 vs. 2+3+4 model in range F16:F18 can be calculated by the array formula =LogitCoeff(A16:D23).

We now build the ordinal logistic regression model as shown in Figure 3 and 4

**Figure 3 – Ordinal logistic regression model (part 1)**

**Figure 4 – Ordinal logistic regression model (part 2)**

Representative formulas used in Figures 3 and 4 are shown in Figure 5.

**Figure 5 – Representative formulas from Figure 3 and 4**

**Note**: The formula for cell AL9 in Figure 5 should be =COUNT(AG6:AI7).

**Using Solver**

As we did for multinomial logistic regression models we can improve on the model we created above by using Solver. As before, our objective is to find the coefficients (i.e. range AG5:AI7 in Figure 4) that maximize *LL* (i.e. cell AD13 in Figure 3 or AL6 in Figure 4). The result is shown in Figure 6.

**Figure 6 – Revised ordinal logistic regression model**

We see that the new value of* LL* is -50.5323, a slight improvement over the previously calculated value of -51.0753.

**Observation**: We can’t initialize the coefficient values with zeros since this would result in taking the log of zero. We therefore choose to initialize the coefficients with the values from the three binary models.

**Real Statistics Function**: The Real Statistics Resource Pack provides the following supplemental array function

**OLogitPredC**(R0, R1) – outputs a 1 × *r* row vector which lists the probabilities of outcomes 1, …, *r* (in that order), where *r* = 1 + the number of columns in R1, for the values of the independent variables contained in the range R0 (in the form of either a row or column vector) based on the ordinal logistic regression coefficients contained in R1. Note that if R0 is a 1 × *k* row vector or *k* × 1 column vector, then R1 is a (*k*+1) × (*r* – 1) range.

Figure 7 shows the forecast for a female (gender = 1) 25-30 (age = 2).

**Figure 7 – Forecasting using the model**

Here the values in range C19:F19 are calculated using the formula =OLogitPredC(A19:B19, AG5:AI7) where the coefficients in range AG5:AI7 are shown in Figure 6.

The results show that the probability that a 25-30 year-old woman will be very unsatisfied is 15.1%, unsatisfied 25.7%, satisfied 31.1% and very satisfied 28.0%. These values agree with the data shown in range V11:Y11 of Figure 6.

We can also use the OLogitPredC function for forecasts corresponding to data not in our sample. E.g. for women in age group 2.5 (presumably halfway between 2 and 3, say aged approximately 21-28) we can use the formula =OLogitPredC(A20:B20, AG5:AI7) to obtain the results shown in range C20:F20 of Figure 7.

**Using proportional odds model**

A common approach used to create ordinal logistic regression models is to assume that the binary logistic regression models corresponding to the cumulative probabilities have the same slopes, i.e.* b*_{j1} = *b*_{j2} = ⋯ = *b*_{jr-1} for all* j* ≠ 0. This is the proportional odds assumption.

E.g. for Example 1 we can create a chart of the observed y values for each of the three binary logistic regression models (after sorting them) as shown in Figure 8.

**Figure 8 – Testing the proportional odds assumption**

As you can see these graphs are roughly parallel, indicating that the proportional odds assumption holds. While this assumption doesn’t always hold, this type of model is commonly used since it reduces the number of coefficients needed. In fact, defining *b*_{j }= *b*_{j1} = *b*_{j2} = ⋯ = *b*_{jr-1} for all *j* ≠ 0 and defining *a _{h}* =

*b*

_{0h}for all

*h*, we only require the full set of intercept coefficients but only one set of slope coefficients.

Figure 9 shows this model.

**Figure 9 – Proportional odds model**

Solver is used to maximize the value of *LL* (i.e. maximize cell AD13 while changing the values of AG9:AG13). We see that the value of *LL* is -50.9083, which is better than the value obtained from the first model (binary only) but not as good as the second model (binary + Solver). In any case, the result obtained from all three models are similar.

**Observation**: As in the previous model we can’t initialize the coefficient values with zeros since this would result in taking the log of zero. We therefore choose to initialize the coefficients with the intercepts from the three binary models and the slope coefficients from the first binary model.

**Using multinomial logistic regression**

We could of course ignore the order in Example 1 and simply use a multinomial logistic regression model. The results are shown in Figure 10.

**Figure 10 – Multinomial logistic regression model**

Here we are using the following functions

=MLogitCoeff(A25:F33,3,TRUE,TRUE)

=MLogitTest(A26:F33,3,TRUE)

to calculate the coefficients, *LL0, LL* and other values. We also use

=MLogitPredC(I33:J33,$J$26:$L$28)

to obtain the forecasted values. The values are similar to those we have seen earlier, but in general when the independent variables are ordered it is best to use an ordinal logistic regression model.

Dear Charles,

Can you suggest a book which has all these regression chapters?

Thanks !!!

Rajeeb Shakya

Rajeeb,

I plan to publish a book later this year with this sort of information.

Charles

Dear Charles,

Can you suggest me any other book for the time being? I am really in need of such a book.

Thank you !!

Rajeeb,

There are lots of books. Some looking at particular analyses (regression, anova, manova, etc.), some focused on particular fields of study (biology, psychology, etc.), etc., etc. Which topics are you most interested in?

Charles

Dear Charles,

thank you for all your great work and your generosity for sharing it with the greater public.

One question. Can my independent variables be more than two, and can they be ordinal too? More precisely, I want to see if 5 or 6 independent variables that I found to be statistically significant in the first part of my analysis (associations with chi-squares and t-test for the means) can be still significant when all put together, and, if they are, what is their impact on the probability of the outcome variable (which is ordinal). Now, my question, hence, is if it’s possible to include in this ordinal model those 5-6 variables that are ordinal too (“strongly agree”, “slightly agree”…). I am only asking so because I have only seen binary ordinal regression here.

Thanks dear Charles,

all the best

Michelle

Michelle,

You can use ordinal data for the independent variables provided you can assume that the intervals between the Likert values are equal. E.g. for Likert values 1, 2, 3, 4, 5, this assumption means that the “2”-“1” = “3”-“2” = “4”-“3” = “5”-“4”.

Charles

Hi Charles,

About the degree of freedom in figure 5, you enter here = COUNT(M5:O5).

But I do not understand why this is supposed to be the degree of freedom.

For example, on the page

Finding Multinomial Logistic Regression Coefficients,

df is calculated by =COUNT(I5:J5).

Are they both right?

If so, I don’t understand what’s going on.

MM,

There clearly is an error here. I need to research what the value should be.

Charles

The formula is not correct. I have now corrected the formula on the website.

Thanks for catching this error. I appreciate your help in making the website better.

Charles

Dear Dr. Charles,

Thank you for your kind efforts and sharing your knowledge with the greater public.

I found in the table showing the the formulas for the example in figure 3 & 4 the cell for the second formula from the top should be T5 = Q5/(1+Q5 ) to get the cum. prob of the first observation of the second dep. variable.

Thanks Chris for finding this error. I have now corrected this on the referenced webpage-

I appreciate your help in making the website better and easier to understand.

Charles

Hi Charles,

I stumbled across your blog and it has been a great deal of help! I didn’t think ordinal logistic regression was possible in Excel before discovering your site.

One question, though. In the binary logistic regression part of your blog, you say that the researcher should use the outcome with the larger sample size as the reference variable. I’m assuming that’s the reason you don’t calculate the Odds for outcome ‘4’.

My question is in your sample Excel sheet, the Predicted Probabilities for outcome ‘4’ is calculated as (1-Pred Cum Prob of outcome ‘3’). I’ve had to do some manipulation to your Excel worksheet labeled OLogit1 as I have 5 outcomes instead of 4, and the third outcome has the largest sample size. How would you go about readjusting the predicted probabilities for such an issue? I’m trying to figure out how to calculate the predicted probabilities for my outcome ‘3’.

Thanks!

M

Maurice,

There is nothing special about the number of outcomes. The approach for 5 outcomes is pretty much the same as for 4 outcomes.

Charles

I see. Thanks for the response!

I suppose my question is why is the formula for the predicted probabilities for outcome 4 (in cell Y5) “1-outcome 3”? Cells V5:X5 have different formulas, so that’s where I’m getting a bit hung up.

Thanks for sharing such great content by the way!

Maurice,

The value in cell U5 is the cumulative probability for categories 1, 2 and 3. Since there are only 4 categories, the cumulative probability for categories 1, 2, 3 and 4 is 1. This means that the probability for category 4 is 1 – U5.

Charles

Dr good evening, excuse me I don’t understand why when I am Using binary logistic regression models, I have results as

p-Pred Suc-Pred Fail-Pred LL % Correct HL Stat Coeff

All with #!valor¡

Thank you

Gerardo,

It is likely that the binary logistic regression model is not a fit for your data. If you send me an Excel file with your data, I will try to figure out what is happening.

Charles

Hi again Charles,

I had another questions about the significance of the independent variables from the results. I noticed there are no p-values for each independent variables. There is only one p-value. Does this value show the significance of all the variables? Is there a way I can see from the results the significance of each independent variable? i.e, a p-value for each independent variables?

Thank you

Matt

Matt,

The one p-value measures the difference between the given ordinal logistic regression model and the null model, which is not a particularly illuminating statistic.

I have not yet implemented the calculation of standard errors for each model coefficient. From that a p-value per coefficient can be calculated.

Charles

Thank you for your help Charles.

You might have already answered a similar question but I’m not familiar with logistic regression so I was hoping for your advice. I am looking to predict a clients loan repayment performance based on his history and characteristics. Considering I have one dependent variable which is the loan repayment performance of a client and around 20 independent variables including clients characteristics that are both categorical, binary and continuous. Since I would want to predict several outcomes of a clients such as 1=No problem, 2=fairly late repayment, 3=very late repayment, 4=default. Would this work? Do you think that the ordinal regression model would be suitable in this case?

Thank you,

Matt

Matt,

I assume that you view 1, 2, 3, 4 as ordered choices. In this case ordinal regression would be a reasonable choice.

Charles

Thank you Charles! I had another question. I was trying to follow your method above but I could not understand how you got to the ordinal logistic regression model in figure 3. Did you use the real statistics tool to get it? Or did you do it manually? I got confused when trying to understand the formulas in figure 5.

Never mind, I found your excel spreadsheet.

Thanks

Thanks Charles,

I was doing the ordered logistic regression model using the excel spreadsheet on your website and was following all of the formulas using exactly the same data. The thing is for the improved Ordered logistic model using solver I was not getting the same results for LL and the coefficients you got after using solver. Maybe I’m inputting the data wrongly on solver? I’m not sure what the problem is.

Please help

Matt,

It took me quite a long time to find the problem, but I found that there is a mistake in cell L12 of the example. For some unknown reason the contents of the cell changed from =D12 to 11. This made all the calculation wrong. Shortly I will correct this on the website and in examples workbook. There is no problem in the Real Statistics function itself.

Thanks very much for catching this error.

Charles

hi, charles

your resource is a great help

(they say that when you are truly knowledgeable you can put them in simple terms)

however i am curious as to how to find confidence intervals for each ordinal value (as in elections)

…also a question i would think will help all those reading the site and want a complete answer .What does it change when you have categories that do not signify proximity among values in either the dependent or the independent variables.

I am not sure I understand your first question. Are you looking for a confidence interval for logistic regression similar to that used for linear regression? I don’t understand your second question.

Charles

Thank you! That is very helpful.Using the =LogitCoeff(A16:D23), I obtained -2.48986 but for F 17 & 18 it is quite different. How could I do that?

Yohannes,

If you send me an Excel file with your calculations, I will try to figure out what is happening.

Charles

Thank you again, Charles. I am sorry to bother you personally. The data are those given in Example 1 in this page, your data. I tried to calculate them in Excel (as presented in Fiure 2 above) and I obtained -2.48986 for cell F16, but for cells F17 & 18 it is #NAME?. How can I solve it?

You are amazing!!!

Everything I have learned about logistic regression, I have you to thank. Throughout my dissertation I have used you website as a help guide and now working in an environment where this is regression is predominantly used I always come back here for reference and help. Thank you Charles Zaiontz for creating and mastering this website.

Thank you very much. I am glad that I was able to help you.

Charles

I do have one questions;.

Do you have an example of ordinal logistic regression for raw data as opposed to summarised? I have very large data that has 17 dependent variables and 2 independent variables of which one is categorical and the other is continuous. My aim is to avoid summarising this, as this may affect the results of the continuous variable.

Many thanks,

J.

If the values for the continuous variable are all different, then the summarized data will be exactly the same as the raw data except formatted slightly differently. The results won’t be affected since the summarizing that is done doesn’t combine any data except when all the values for the independent variables are identical. For the current release I have added the Multinomial Logistic Regression data analysis tool. I hope to add the Ordinal Logistic Regression tool some time soon.

Charles

Hi Charles,

This is an example of my data set in summary:

Class Volume 1 2 3 4 Total

0 109 1 1

0 131.9 1 1

0 12 1 1

0 126.9 1 1

1 124.2 1 1

1 125.36 1 1

1 134.2 1 1

1 133.6 1 1

Total 2 2 2 2 8

For some reason MLogitCoeff does not work for this example. I am guessing this is because the row totals all add up to 1 for each individual continuous variable in ‘volume’. Here 1-4 is the code for the dependent variable. Do you have any advise on this?

Sorry, but I don’t understand your data. Which are the independent variables and which is the dependent variables? Most importantly what are the range of values for the dependent variable?

Charles

No problem. Thanks Charles.

Hi Charles,

I am using SPSS to conduct a OLR. I have 1 categorical variable (with 4 items) as independent variable and 1 Likert scale (5 levels) ordinal variable as the dependent variable. My questions are:

1. What are the tests that I need to conduct to satisfy the OLR assumptions?

2. What is the format to write the null hypothesis related to OLR?

3. Is there any standard steps (or template) for summarizing the results of OLR?

Thank you.

Sree

Sorry Sree, but I don’t use SPSS.

Charles

Dear Sir

is it possible that I can take continuous values for independent variables (Xi) and discrete values for dependent variables (Y)? If yes, then how? Moreover in case of yes can we say it “regression” or is it “classification”?

If your independent variables are continuous and the dependent variable is dichotomous then you would typically think of using ordinary logistic regression. If the dependent variable takes a small number of discrete, but unordered values then you would explore using multinomial logistic regression. If the dependent variable takes a small number of discrete ordered values (and you care about the order) then you might use ordinal logistic regression. If the dependent variable takes a large number of ordered discrete values then you might even try using multiple linear regression (pretending that the discrete values are continuous) if the model provides a good fit. Typically these are called “regression”, but they are being used for “classification purposes”.

Charles

Good day Mr. Charles. I am in dire need of your help. I find your article useful, as I want to perform the ordinal logistic regression analysis on some data that I have. However, you are article does not go into full detail about how to get to the results you have displayed. I really need your assistance sir. Perhaps I’m not too technically apt to complete the process but I am writing a paper and I really need your assistance with this. Can you email me at the address I posted to reply to article so that we can dialogue further? Thank you

Kurt,

You can get the details for all the ordinal logistic examples on the referenced webpage by downloading the Excel spreadsheets that I created for these examples. Just go to the webpage .

Charles

OK, thank you sir. I will check it out

The LogitCoeff function doesn’t work for me sir

Thanks a lot Charles

Hi Charles,

It was nice reading your blog, can you explain bit more about how to get Coefficients, since i am not finding that LogitCoeff function in excel. Please help me in calculating this.

Thanks in advance.

The LogitCoeff function is provided by the Real Statistics Resource Pack, software which adds additional functions and data analysis tools to Excel. You can download this software for free at http://www.real-statistics.com/free-download/real-statistics-resource-pack/.

You can find a description of the LogitCoeff function at the webpage http://www.real-statistics.com/logistic-regression/real-statistics-functions-logistic-regression/.

Charles

This is really a great help to a project I’ve been working on. Do you have an example where you have two or more ordinal variables within the data set?

Erik,

Do you mean two or more ordinal

dependentvariables?Charles

Thank you Charles!

Thank you for sharing your knowledge and for your kindness. You are the first person that has shown that ordinal logistic regression can be done in excel and I am very excited about this!

Kind Regards

Declan