Using Newton’s Method with Summary Data
Before turning our attention back to Example 1 of Basic Concepts of Logistic Regression, we first give some useful background.
Click here for a proof of Property 1, which uses calculus.
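Observation: Thus, to find the values of the coefficients b_j we need to solve the k+1 equations
$$\sum_{i=1}^{n} (y_i - p_i)\, x_{ij} = 0, \qquad j = 0, 1, \ldots, k,$$
where x_{i0} = 1 for all i and p_i is the predicted probability of success for the i-th observation.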
Property 2: Let B = [b_j] be the (k+1) × 1 column vector of logistic regression coefficients, let Y = [y_i] be the n × 1 column vector of observed outcomes of the dependent variable, let X be the n × (k+1) design matrix (see Definition 3 of Least Squares Method for Multiple Regression), let P = [p_i] be the n × 1 column vector of predicted probabilities of success and let V = [v_ij] be the n × n diagonal matrix with v_ii = p_i (1 – p_i) on the main diagonal and zeros elsewhere. Then if B_0 is an initial guess of B and for all m we define the iteration
$$B_{m+1} = B_m + (X^T V X)^{-1} X^T (Y - P),$$
with P and V evaluated at B_m, then for m sufficiently large B_{m+1} ≈ B_m, and so B_m is a reasonable estimate of B.
Click here for a proof of Property 2, which uses calculus.
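The iteration in Property 2 can be sketched in a few lines of Python. This is a minimal illustration, not the Real Statistics implementation: it assumes the standard Newton-Raphson update B_{m+1} = B_m + (X^T V X)^{-1} X^T (Y - P), starts from B_0 = 0, and uses a hypothetical helper `solve` and made-up sample data.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def newton_logistic(X, y, iters=25, tol=1e-10):
    """X: design matrix rows (first entry 1 for the intercept); y: 0/1 outcomes."""
    k1 = len(X[0])
    B = [0.0] * k1  # assumed initial guess B_0 = 0
    for _ in range(iters):
        # predicted probabilities p_i for the current coefficients
        p = [1 / (1 + math.exp(-sum(b * v for b, v in zip(B, row)))) for row in X]
        # gradient X^T (Y - P)
        grad = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                for j in range(k1)]
        # X^T V X with v_ii = p_i (1 - p_i)
        H = [[sum(row[a] * row[c] * pi * (1 - pi) for row, pi in zip(X, p))
              for c in range(k1)] for a in range(k1)]
        step = solve(H, grad)  # (X^T V X)^{-1} X^T (Y - P)
        B = [b + s for b, s in zip(B, step)]
        if max(abs(s) for s in step) < tol:
            break
    return B

# Made-up illustrative data (not from the examples in this article)
X = [[1.0, float(v)] for v in [1, 2, 3, 4, 5, 6]]
y = [0, 0, 1, 0, 1, 1]
print(newton_logistic(X, y))
```

Solving the linear system directly, rather than forming the inverse of X^T V X, is a common design choice since it gives the same step with less arithmetic.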
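Observation: If we group the data as we did in Example 1 of Basic Concepts of Logistic Regression (i.e. summary data), then Property 1 takes the form
$$\sum_{i=1}^{n} (y_i - n_i p_i)\, x_{ij} = 0, \qquad j = 0, 1, \ldots, k,$$
where n is now the number of groups, n_i is the number of observations in group i and y_i is the number of observed successes in group i.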
Property 2 also holds where Y = [yi] is the n × 1 column vector of summarized observed outcomes of the dependent variable, X is the corresponding n × (k+1) design matrix, P = [pi] is the n × 1 column vector of predicted values of success and V = [vij] is the n × n matrix where vii = ni pi (1 – pi) on the main diagonal and vij = 0 when i ≠ j.
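For summary data with a single independent variable, the Newton step reduces to a 2 × 2 linear system that can be solved in closed form. The sketch below makes several assumptions: y_i is the number of successes in group i, n_i is the group size, p_i is the predicted probability of success, and the residual used in the update is y_i - n_i p_i. The function name and the data are made up for illustration and are not those of Example 1.

```python
import math

def newton_summary(x, n, y, iters=25, tol=1e-10):
    """x[i]: predictor value for group i; n[i]: group size; y[i]: successes."""
    a, b = 0.0, 0.0  # assumed initial guess: intercept a = 0, slope b = 0
    for _ in range(iters):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        r = [yi - ni * pi for yi, ni, pi in zip(y, n, p)]   # y_i - n_i p_i
        v = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]    # v_ii
        # gradient X^T (Y - NP)
        g0 = sum(r)
        g1 = sum(ri * xi for ri, xi in zip(r, x))
        # entries of the symmetric 2 x 2 matrix X^T V X
        h00 = sum(v)
        h01 = sum(vi * xi for vi, xi in zip(v, x))
        h11 = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h00 * h11 - h01 * h01
        # Newton step via the explicit inverse of a 2 x 2 matrix
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b

# Made-up grouped data: four groups of 10 observations each
a, b = newton_summary(x=[0, 1, 2, 3], n=[10, 10, 10, 10], y=[2, 4, 6, 9])
print(a, b)
```

Grouping reduces the number of rows in the sums from the sample size to the number of groups, which is why the summary form is convenient when many observations share the same value of x.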
We apply Newton’s method to find the coefficients as described in Figure 1. The method converges in only 4 iterations with the values a = 4.47665 and b = -0.0072.
The regression equation is therefore logit(p) = 4.47665 – 0.0072x.
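To turn this regression equation into a predicted probability, invert the logit: p = 1/(1 + e^(-logit)). A minimal Python sketch using the coefficients reported above follows; the function name and the sample value x = 500 are hypothetical and chosen only for illustration.

```python
import math

A, B = 4.47665, -0.0072  # coefficients a and b reported in the text

def predicted_probability(x):
    """Predicted probability of success for a given value of x."""
    logit = A + B * x
    return 1 / (1 + math.exp(-logit))

# x = 500 is an arbitrary illustrative value
print(predicted_probability(500))
```

Because b is negative, the predicted probability decreases as x increases, and it equals 0.5 where the logit is zero, i.e. at x = 4.47665/0.0072.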
We can get the same result using the Logistic Regression data analysis tool as described in Finding Logistic Regression Coefficients using Solver, except that this time we check the Using Newton method option in the Logistic Regression dialog box (see Figure 4 of Finding Logistic Regression Coefficients using Solver or Figure 3 below).
Using Newton’s Method with Raw Data
Example 2: A study was made as to whether environmental temperature or immersion in water of the hatching egg had an effect on the gender of a particular type of small reptile. The table in Figure 2 shows the temperature (in degrees Celsius) and immersion in water (0 = no and 1 = yes) of the 49 eggs which resulted in a live birth as well as the sex of the reptile that hatched. Determine the odds that a female will be born if the temperature is 23 degrees with the egg immersed in water vs. not immersed in water.
We use the Logistic Regression data analysis tool, selecting the Raw data and Newton Method options as shown in Figure 3.
After pressing the OK button we obtain the output displayed in Figure 4.
Figure 4 – Output from Logistic Regression data analysis tool
Here we only show the first 19 elements in the sample, although the full sample is contained in range A4:C52. Note that in the raw data option the Input Range (range A4:C52) consists of one column for each independent variable (Temp and Water for this example) and a final column only containing the values 0 or 1, where 1 indicates “success” (Male in this case) and 0 indicates “failure” (Female in this case). Please don’t read any gender discrimination into these choices: we would get the same result if we chose Female to be success and Male to be failure.
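The model indicates that to predict the probability p that a reptile will be male you can use the following formula:
$$p = \frac{1}{1 + e^{-(b_0 + b_1\,\mathrm{Temp} + b_2\,\mathrm{Water})}}$$
where b_0, b_1 and b_2 are the coefficients in cells R7, R8 and R9 of Figure 4.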
Here we copied the formula from cell K6 into cells G29 and G30. The formula that now appears in cell G29 will be =1/(1+EXP(-$R$7-MMULT(A29:B29,$R$8:$R$9))). You just need to change the part A29:B29 to E29:F29 (where the values of Temp and Water actually appear). The resulting formula =1/(1+EXP(-$R$7-MMULT(E29:F29,$R$8:$R$9))) will give the result shown in Figure 5.
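The same calculation can be sketched outside Excel. The Python below mirrors the worksheet formula 1/(1+EXP(-b0 - (Temp, Water)·(b1, b2))) and then converts the probability of a male into the odds of a female, which is what Example 2 asks for. The function names are hypothetical, and the coefficient values are placeholders chosen only so the code runs. They are not the values from Figure 4 and should be replaced with the coefficients in cells R7:R9.

```python
import math

def prob_male(temp, water, b0, b_temp, b_water):
    """Predicted probability of 'success' (Male), as in the worksheet formula."""
    z = b0 + b_temp * temp + b_water * water
    return 1 / (1 + math.exp(-z))

def odds_female(temp, water, b0, b_temp, b_water):
    """Odds of a female: P(Female) / P(Male) = (1 - p) / p."""
    p = prob_male(temp, water, b0, b_temp, b_water)
    return (1 - p) / p

# PLACEHOLDER coefficients (hypothetical, NOT the output shown in Figure 4)
coef = dict(b0=-10.0, b_temp=0.4, b_water=-1.0)

for water in (1, 0):
    print(water, odds_female(23, water, **coef))
```

Since (1 - p)/p = e^(-z), the ratio of the two odds (immersed vs. not immersed) equals e^(-b_water) regardless of the temperature.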
In Real Statistics Functions for Logistic Regression we show an easier way of finding the predicted values.
Observation: The approach described above for performing logistic regression with input in the form of raw data works well for up to 65,500 rows of data. When the input data contains more than 65,500 rows, you can still use the Logistic Regression data analysis tool, but you need to uncheck the Show summary in output option (see Figure 3).
See Real Statistics Functions for Logistic Regression for how to perform logistic regression including summaries when there are more than 65,500 rows of raw data.