The **Hosmer-Lemeshow test** is used to determine the goodness of fit of a logistic regression model. Essentially, it is a chi-square goodness of fit test (as described in Goodness of Fit) for grouped data, where the data are usually divided into 10 equal-sized subgroups. The initial version of the test presented here uses the groupings that we have used elsewhere rather than 10 equal-sized subgroups.

Since this is a chi-square goodness of fit test, we need to calculate the HL statistic

$$HL = \sum_{i=1}^{g} \left[ \frac{(O_{1i}-E_{1i})^2}{E_{1i}} + \frac{(O_{0i}-E_{0i})^2}{E_{0i}} \right]$$

where *g* = the number of groups, *O*<sub>1i</sub> and *O*<sub>0i</sub> are the observed numbers of successes and failures in the *i*th group, and *E*<sub>1i</sub> and *E*<sub>0i</sub> are the corresponding expected numbers. The test used is chi-square with *g* – 2 degrees of freedom. A significant result indicates that the model is not a good fit; a non-significant result indicates a good fit.
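As a sketch, the calculation can be expressed in Python. The `hosmer_lemeshow` helper below is hypothetical (it is not part of the Real Statistics add-in); it just computes the observed and expected counts per group and sums the chi-square terms described above:

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(obs_succ, obs_fail, pred_prob, n_per_group):
    """Hosmer-Lemeshow statistic and p-value for grouped data.

    obs_succ, obs_fail : observed successes/failures in each group
    pred_prob          : model-predicted success probability per group
    n_per_group        : total number of observations per group
    """
    obs_succ = np.asarray(obs_succ, dtype=float)
    obs_fail = np.asarray(obs_fail, dtype=float)
    pred_prob = np.asarray(pred_prob, dtype=float)
    n = np.asarray(n_per_group, dtype=float)

    exp_succ = n * pred_prob     # expected successes (like column L)
    exp_fail = n - exp_succ      # expected failures  (like column M)

    hl = np.sum((obs_succ - exp_succ) ** 2 / exp_succ
                + (obs_fail - exp_fail) ** 2 / exp_fail)
    g = len(n)
    p_value = stats.chi2.sf(hl, g - 2)   # df = g - 2
    return hl, p_value
```

When every observed count equals its expected count the statistic is 0 and the p-value is 1, i.e. a perfect fit.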

**Example 1**: Use the Hosmer-Lemeshow test to determine whether the logistic regression model is a good fit for the data in Example 1 in Comparing Logistic Regression Models.

**Figure 1 – Hosmer-Lemeshow Test**

In our example the sum is taken over the 12 Male groups and the 12 Female groups. The observed values are given in columns H and I (duplicates of the input data columns C and D), while the expected values are given in columns L and M. E.g. cell L4 contains the formula =K4*J4 and cell M4 contains the formula =J4-L4 or equivalently =(1-K4)*J4.

The HL statistic is calculated in cell N16 via the formula =SUM(N4:N15). E.g. cell N4 contains the formula =(H4-L4)^2/L4+(I4-M4)^2/M4.

The Hosmer-Lemeshow test results are shown in range Q12:Q16. The HL stat is 24.40567 (as calculated in cell N16), *df* = *g* – 2 = 12 – 2 = 10 and p-value = CHIDIST(24.40567, 10) = .006593 < .05 = α, and so the test is significant, which indicates that the model is not a good fit.
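As a quick sketch, the same tail probability can be checked in Python, where `scipy.stats.chi2.sf` plays the role of Excel's CHIDIST:

```python
from scipy.stats import chi2

hl_stat = 24.40567   # HL statistic from cell N16
df = 12 - 2          # g - 2 with g = 12 groups

p_value = chi2.sf(hl_stat, df)   # upper-tail probability, like CHIDIST
print(p_value)                   # approximately 0.006593, significant at alpha = .05
```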

**Observation**: The Hosmer-Lemeshow test needs to be used with caution. It tends to be highly dependent on the groupings chosen, i.e. one choice of groups can yield a significant result while another yields a non-significant result. Also, when there are too few groups (5 or fewer), the test will usually indicate a good fit.

As a chi-square goodness of fit test, the expected values used should generally be at least 5. In Example 1 the cells L9, L15, M4 and M10 all have values less than 5, with cells M4 and M10 especially troubling with values less than 1. We now address the problems of cells M4 and M10.

We can eliminate the first of these problems by combining the first two rows, as shown in Figure 2. Here p-Pred for the first row (cell K23) is calculated as a weighted average of the first two values from Figure 1 using the formula =(J4*K4+J5*K5)/(J4+J5). In a similar manner we combine the 7th and 8th rows from Figure 1.
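The row-combining step can be sketched as a small helper (a hypothetical `merge_groups` function, not part of the add-in): the group counts add, and the combined p-Pred is the count-weighted average of the two probabilities, mirroring =(J4*K4+J5*K5)/(J4+J5):

```python
def merge_groups(n1, p1, n2, p2):
    """Combine two adjacent groups.

    n1, n2 : group sizes (column J in Figure 1)
    p1, p2 : predicted probabilities p-Pred (column K)
    Returns the merged size and the size-weighted average probability.
    """
    n = n1 + n2
    p = (n1 * p1 + n2 * p2) / n
    return n, p
```

For example, merging a group of 10 with p-Pred 0.1 and a group of 30 with p-Pred 0.3 gives a group of 40 with p-Pred 0.25.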

**Figure 2 – Revised Hosmer-Lemeshow Test**

The revised version shows a non-significant result, indicating that the model is a good fit.

**Observation**: The Real Statistics Logistic Regression data analysis tool automatically performs the Hosmer-Lemeshow test. For Example 1 of Finding Logistic Regression Coefficients using Solver, we can see from Figure 5 of Finding Logistic Regression Coefficients using Solver that the logistic regression model is a good fit. For Example 1, Figure 2 of Comparing Logistic Regression Models shows that the model is not a good fit, at least until we combine rows as we did above.

**Observation**: The following supplemental functions can be used to perform the Hosmer-Lemeshow test with exactly 10 equal-sized data ranges.

**Real Statistics Functions**: The Real Statistics Resource Pack provides the following two supplemental functions.

**HOSMER**(R1, *lab, raw, iter*) – returns a table with 10 equal-sized data ranges based on the data in range R1 (without headings)

**HLTEST**(R1, *lab, raw, iter*) – returns the Hosmer statistic (based on the table described above) and the p-value.

When *lab* = True the output includes column headings; when *lab* = False (the default) only the data is output. When *raw* = True the data in R1 is in raw form; when *raw* = False (the default) the data in R1 is in summary form. The parameter *iter* determines the number of iterations used in Newton's method for calculating the logistic regression coefficients; the default value is 20.
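The exact grouping rules used by HOSMER are not spelled out here, but the idea of forming 10 roughly equal-sized ranges from raw data can be sketched as follows (the `decile_table` helper is hypothetical): sort the observations by predicted probability, split them into 10 near-equal groups, and tabulate observed versus expected counts per group.

```python
import numpy as np

def decile_table(y, p_pred):
    """Group raw (outcome, predicted probability) pairs into 10 roughly
    equal-sized ranges sorted by predicted probability.

    Returns one row per group: (n, obs succ, obs fail, exp succ, exp fail).
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    order = np.argsort(p)                 # sort by predicted probability
    y, p = y[order], p[order]
    groups = np.array_split(np.arange(len(p)), 10)   # 10 near-equal groups
    rows = []
    for idx in groups:
        n = len(idx)
        obs_succ = y[idx].sum()
        exp_succ = p[idx].sum()           # expected successes = sum of probs
        rows.append((n, obs_succ, n - obs_succ, exp_succ, n - exp_succ))
    return rows
```

The HL statistic is then computed from these rows exactly as in the grouped case above, with *g* = 10 and *df* = 8.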

**Observation**: We repeat Example 1 using these two functions, obtaining the results shown in Figure 3.

**Figure 3 – Hosmer-Lemeshow Test**

Referring to Figure 1, the output shown in range F40:K50 of Figure 3 is calculated using the formula =HOSMER(A3:D15, TRUE) and the output shown in range O40:P42 of Figure 3 is calculated using the formula =HLTEST(A3:D15, TRUE). Since the p-value > .05 (assuming *α* = .05) we conclude that the logistic regression model is a good fit.

Dear Sir,

I have calculated the HL statistic using your example. It shows that my model is not a good fit: p-value = 0.000016 and alpha = 0.05. I would like to figure out in which decile the test performs badly. Can I just calculate the p-value for each decile using the CHIDIST function?

With kind regards,

Jessica

Jessica,

I am not using the true Hosmer-Lemeshow test and so there aren’t any deciles. I would look at other indicators; if they look good then I wouldn’t worry too much about the Hosmer-Lemeshow result.

Charles

Dear Sir:

I’m really curious how we could get the p-Pred value in column K of Figure 1?

Thank you very much.

With regards

Shirley

Shirley,

See the webpage Finding Logistic Regression Coefficients using Solver.

Charles

Sir

The HOSMER(R1, lab, raw, iter) function fails to calculate the last columns (HL-Suc and HL-Fail). I am using the 2.12 version add-in.

Colin

Colin,

That is correct. As you can see from the comments following Figure 3, the HOSMER function does not calculate these last two columns. They are easy enough to calculate, however. E.g. cell L41 can be calculated by the formula =(H41-I41)^2/I41 and cell M41 by =(K41-J41)^2/K41.

I will consider adding these columns to the output of the function in the next release.

Charles

Sir

Thank you sir

Colin