We now show how to find the coefficients for the logistic regression model using Excel’s Solver capability (see also Goal Seeking and Solver). We start with Example 1 from Basic Concepts of Logistic Regression.
Example 1 (Example 1 from Basic Concepts of Logistic Regression continued): From Definition 1 of Basic Concepts of Logistic Regression, the predicted values pi for the probability of survival for each interval i is given by the following formula where xi represents the number of rems for interval i.
The log-likelihood statistic as defined in Definition 5 of Basic Concepts of Logistic Regression is given by
where yi is the observed value for survival in the ith interval (i.e. yi = the fraction of subjects in the ith interval that survived). Since we are aggregating the sample elements into intervals, we use the modified version of the formula, namely
where yi is the observed value of survival in the ith of r intervals and
We capture this information in the worksheet in Figure 1 (based on the data in Figure 2 of Basic Concepts of Logistic Regression).
Figure 1 – Calculation of LL based on initial guess of coefficients
Column I contains the rem values for each interval (copy of column A and E). Column J contains the observed probability of survival for each interval (copy of column F). Column K contains the values of each pi. E.g. cell K4 contains the formula =1/(1+EXP(-O5–O6*I4)) and initially has value 0.5 based on the initial guess of the coefficients a and b given in cells O5 and O6 (which we arbitrarily set to zero). Cell L14 contains the value of LL using the formula =SUM(L4:L13); where L4 contains the formula =(B4+C4)*(J4*LN(K4)+(1-J4)*LN(1-K4)), and similarly for the other cells in column L.
We now use Excel’s Solver tool by selecting Data > Analysis|Solver and filling in the dialog box that appears as described in Figure 2 (see Goal Seeking and Solver for more details).
Figure 2 – Excel Solver dialog box
Our objective is to maximize the value of LL (in cell L14) by changing the coefficients (in cells O5 and O6). It is important, however, to make sure that the Make Unconstrained Variables Non-Negative checkbox is not checked. When we click on the Solve button we get a message that Solver has successfully found a solution, i.e. it has found values for a and b which maximize LL.
We elect to keep the solution found and Solver automatically updates the worksheet from Figure 1 based on the values it found for a and b. The resulting worksheet is shown in Figure 3.
Figure 3 – Revised version of Figure 1 based on Solver’s solution
We see that a = 4.476711 and b = -0.00721. Thus the logistics regression model is given by the formula
For example, the predicted probability of survival when exposed to 380 rems of radiation is given by
Thus, the odds that a person exposed to 180 rems survives is 15.5% greater than a person exposed to 200 rems.
Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides the Logistic Regression supplemental data analysis tool. This tool takes as input a range which lists the sample data followed the number of occurrences of success and failure. E.g. for Example 1 this is the data in range A3:C13 of Figure 1. For this problem there was only one independent variable (number of rems). If additional independent variables are used then the input will contain additional columns, one for each independent variable.
We show how to use this tool to create a spreadsheet similar to the one in Figure 3. First press Ctrl-m to bring up the menu of Real Statistics data analysis tools and choose the Regression option. This in turn will bring up another dialog box. Choose the Logistic Regression option and press the OK button. This brings up the dialog box shown in Figure 4.
Figure 4 – Dialog Box for Logistic Regression data analysis tool
Now select A3:C13 as the Input Range (see Figure 5) and since this data is in summary form with column headings, select the Summary data option for the Input Format and check Headings included with data. Next select the Solver as the Analysis Type and keep the default Alpha and Classification Cutoff values of .05 and .5 respectively.
Finally press the OK button to obtain the output displayed in Figure 5.
Figure 5 – Output from Logistic Regression tool
This tool takes as input a range which lists the sample data followed the number of occurrences of success and failure (this is considered to be the summary form). E.g. for Example 1 this is the data in range A3:C13 of Figure 1 (repeated in Figure 5 in the same cells). For this problem there was only one independent variable (number of rems). If additional independent variables are used then the input will contain additional columns, one for each independent variable.
Note that the coefficients (range Q7:Q8) are set initially to zero and (cell M16) is calculated to be -526.792 (exactly as in Figure 1). The output from the Logistic Regression data analysis tool also contains many fields which will be explained later. As described in Figure 2, we can now use Excel’s Solver tool to find the logistic regression coefficient. The result is shown in Figure 6. We obtain the same values for the regression coefficients as we obtained previously in Figure 3, but also all the other cells are updated with the correct values as well.
Figure 6 – Revised output from Solver