**Property 1**: The maximum of the log-likelihood statistic occurs when

$$\sum_{i=1}^n (y_i - p_i)\,x_{ij} = 0 \quad \text{for all } j = 0, 1, \ldots, k$$

Proof: Let

$$\ln L = \sum_{i=1}^n \left[\,y_i \ln p_i + (1 - y_i) \ln(1 - p_i)\,\right]$$

where the *y_i* are considered constants from the sample and the *p_i* are defined as follows:

$$p_i = \frac{e^{b_0 + \sum_{j=1}^k b_j x_{ij}}}{1 + e^{b_0 + \sum_{j=1}^k b_j x_{ij}}}$$

Here

$$\frac{p_i}{1 - p_i} = e^{b_0 + \sum_{j=1}^k b_j x_{ij}}$$

which is the odds ratio (see Definition 3 of Basic Concepts of Logistic Regression). Now let

$$z_i = b_0 + \sum_{j=1}^k b_j x_{ij}$$

To make our notation simpler we will define *x*_{i0} = 1 for all *i*, and so we have

$$z_i = \sum_{j=0}^k b_j x_{ij} \qquad\qquad p_i = \frac{e^{z_i}}{1 + e^{z_i}}$$

Thus

$$\ln L = \sum_{i=1}^n \left[\,y_i \ln\frac{p_i}{1 - p_i} + \ln(1 - p_i)\,\right] = \sum_{i=1}^n \left[\,y_i z_i - \ln(1 + e^{z_i})\,\right]$$

Also note that

$$\frac{\partial z_i}{\partial b_j} = x_{ij}$$

The maximum value of ln *L* occurs where the partial derivatives are equal to 0. We first note that

$$\frac{\partial}{\partial b_j}\ln(1 + e^{z_i}) = \frac{e^{z_i}}{1 + e^{z_i}}\,x_{ij} = p_i\,x_{ij}$$

Thus

$$\frac{\partial \ln L}{\partial b_j} = \sum_{i=1}^n \left(y_i\,x_{ij} - p_i\,x_{ij}\right) = \sum_{i=1}^n (y_i - p_i)\,x_{ij}$$

The maximum of ln *L* occurs when

$$\sum_{i=1}^n (y_i - p_i)\,x_{ij} = 0$$

for all *j*, completing the proof.
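As an illustration (not part of the original article), the log-likelihood and its gradient from the proof can be coded directly and checked against a numerical derivative. The function names `log_likelihood` and `score` are my own, and NumPy is assumed.

```python
import numpy as np

def log_likelihood(X, y, b):
    """ln L = sum_i [y_i * z_i - ln(1 + e^{z_i})], where z = X b."""
    z = X @ b
    return np.sum(y * z - np.log1p(np.exp(z)))

def score(X, y, b):
    """Gradient of ln L: the j-th component is sum_i (y_i - p_i) x_ij."""
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    return X.T @ (y - p)
```

Comparing `score` with a central finite difference of `log_likelihood` confirms the formula $\partial \ln L / \partial b_j = \sum_i (y_i - p_i)x_{ij}$ derived above.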

**Observation**: To find the values of the coefficients *b_j* we need to solve the equations of Property 1. We do this iteratively using Newton’s method (see Definition 2 and Property 2 of Newton’s Method), as described in the following property.

**Property 2**: Let *B* = [*b_j*] be the (*k*+1) × 1 column vector of logistic regression coefficients, let *Y* = [*y_i*] be the *n* × 1 column vector of observed outcomes of the dependent variable, let *X* be the *n* × (*k*+1) design matrix, let *P* = [*p_i*] be the *n* × 1 column vector of predicted values of success and let *V* = [*v_i*] be the *n* × *n* diagonal matrix where *v_i* = *p_i*(1 – *p_i*). Then if *B*_{0} is an initial guess of *B* and for all *m* we define the iteration

$$B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y - P_m)$$

where *P_m* and *V_m* are the values of *P* and *V* based on *B_m*, then for *m* sufficiently large *B*_{m+1} ≈ *B_m*, and so *B_m* is a reasonable estimate of the coefficient vector.

Proof: Define

$$f_j(B) = \sum_{i=1}^n (y_i - p_i)\,x_{ij}$$

where *x*_{i0} = 1. We now calculate the partial derivatives of the *f_j*. Since

$$\frac{\partial p_i}{\partial b_l} = p_i(1 - p_i)\,x_{il}$$

it follows that

$$\frac{\partial f_j}{\partial b_l} = -\sum_{i=1}^n p_i(1 - p_i)\,x_{ij}\,x_{il}$$

Let *v_i* = *p_i*(1 – *p_i*) and, using the terminology of Definition 2 of Newton’s Method, define

$$F(B) = [f_j(B)] \qquad\qquad J(B) = \left[\frac{\partial f_j}{\partial b_l}\right]$$

Now

$$F(B) = X^T (Y - P)$$

where *X* is the design matrix (see Definition 3 of Multiple Regression Least Squares), *Y* is the column matrix with elements *y_i* and *P* is the column matrix with elements *p_i*. Let *V* = the diagonal matrix with the elements *v_i* on the main diagonal. Then

$$J(B) = -X^T V X$$

We can now use Newton’s method to find *B*. Define the *n* × 1 column vector *P_m* and the *n* × *n* diagonal matrix *V_m* to be the values of *P* and *V* based on *B_m*, and let *J_m* = *J*(*B_m*), a (*k*+1) × (*k*+1) square matrix. Newton’s method then gives the iteration

$$B_{m+1} = B_m - J_m^{-1} F(B_m) = B_m + (X^T V_m X)^{-1} X^T (Y - P_m)$$

Then for sufficiently large *m*, *F*(*B_m*) ≈ 0, which is equivalent to the statement of the property.
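The iteration of Property 2 is straightforward to sketch in code. Below is a minimal NumPy implementation, assuming a design matrix whose first column is all 1s; the name `logistic_newton`, the iteration cap, and the convergence tolerance are illustrative choices, not from the article.

```python
import numpy as np

def logistic_newton(X, y, iterations=20):
    """Fit logistic regression coefficients by Newton's method (Property 2).

    X is the n x (k+1) design matrix (first column all 1s) and y is the
    n x 1 vector of 0/1 outcomes.  Iterates
        B_{m+1} = B_m + (X^T V_m X)^{-1} X^T (Y - P_m)
    where P_m holds the predicted probabilities at B_m and V_m is the
    diagonal matrix with entries p_i (1 - p_i).
    """
    n, k1 = X.shape
    B = np.zeros(k1)                           # initial guess B_0 = 0
    for _ in range(iterations):
        P = 1.0 / (1.0 + np.exp(-(X @ B)))     # predicted probabilities P_m
        V = np.diag(P * (1.0 - P))             # n x n diagonal matrix V_m
        # Solve (X^T V_m X) step = X^T (Y - P_m) instead of inverting.
        step = np.linalg.solve(X.T @ V @ X, X.T @ (y - P))
        B = B + step
        if np.max(np.abs(step)) < 1e-10:       # B_{m+1} ≈ B_m: converged
            break
    return B
```

At the returned *B*, the score *X^T*(*Y* – *P*) is essentially zero, which is exactly the condition of Property 1. (Note that Newton’s method can diverge when the data are perfectly separated, since the coefficients then grow without bound.)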

I wish there were an even simpler step-by-step explanation of this. I get lost with all the variable substitutions, i.e. Newton’s method for solving logistic regression, for dummies.

Mike,

Good to see that some people are looking at the more mathematical part of the site. I agree that the proof given is a bit complicated. I will look at this again shortly and see if I can find a simpler approach.

Charles

Mike,

I have just updated this page on the website. I hope that you find the new explanation clearer.

Charles