Method of Least Squares Detailed

Theorem 1: The best fit line for the points $(x_1, y_1), \ldots, (x_n, y_n)$ is given by

$$\hat{y} = b(x - \bar{x}) + \bar{y}$$

where

$$b = \frac{s_{xy}}{s_x^2} = r\,\frac{s_y}{s_x}$$

Proof: Our objective is to minimize

$$\sum_{i=1}^n \left[y_i - b(x_i - \bar{x}) - c\right]^2$$

For any given values of $(x_1, y_1), \ldots, (x_n, y_n)$, this expression can be viewed as a function of b and c. Calling this function g(b, c), by calculus its minimum occurs where both partial derivatives are zero.

$$\frac{\partial g}{\partial b} = -2\sum_{i=1}^n (x_i - \bar{x})\left[y_i - b(x_i - \bar{x}) - c\right] = 0$$
$$\frac{\partial g}{\partial c} = -2\sum_{i=1}^n \left[y_i - b(x_i - \bar{x}) - c\right] = 0$$

Transposing terms and simplifying,

$$b\sum_{i=1}^n (x_i - \bar{x})^2 + c\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n (x_i - \bar{x})\,y_i$$
$$b\sum_{i=1}^n (x_i - \bar{x}) + nc = \sum_{i=1}^n y_i$$

Since $\sum\nolimits_{i=1}^n (x_i-\bar{x}) = 0$, from the second equation we have $c = \bar{y}$, and from the first equation we have

$$b = \frac{\sum_{i=1}^n (x_i - \bar{x})\,y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

where the second equality holds because $\sum\nolimits_{i=1}^n (x_i - \bar{x})\,\bar{y} = 0$.

The result follows since

$$b = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})/(n-1)}{\sum_{i=1}^n (x_i - \bar{x})^2/(n-1)} = \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{s_x s_y}\cdot\frac{s_y}{s_x} = r\,\frac{s_y}{s_x}$$
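As a quick numerical check of the theorem (the data and variable names below are invented purely for illustration), a short NumPy sketch comparing the derived slope and intercept with NumPy's own least-squares fit:

```python
import numpy as np

# Hypothetical sample data, made up purely to check the formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope from the proof: b = sum (x_i - x̄)(y_i - ȳ) / sum (x_i - x̄)^2
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean()  # intercept in the centered form ŷ = b(x - x̄) + ȳ

# Equivalent form b = r * s_y / s_x (sample standard deviations)
r = np.corrcoef(x, y)[0, 1]
b_alt = r * y.std(ddof=1) / x.std(ddof=1)

# NumPy's own least-squares line ŷ = m*x + k for comparison
m, k = np.polyfit(x, y, 1)

print(np.isclose(b, b_alt))             # the two slope formulas agree
print(np.isclose(b, m))                 # slope matches the least-squares fit
print(np.isclose(c - b * x.mean(), k))  # so does the intercept ȳ - b*x̄
```

All three checks print `True`: the closed-form slope, the $r\,s_y/s_x$ form, and NumPy's fit all describe the same line.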

Alternative Proof: This proof doesn’t require any calculus. We first prove the theorem for the case where both x and y have mean 0 and standard deviation 1. Assume the best fit line is y = bx + a, and so

$$\hat{y}_i = b x_i + a$$

for all i. Our goal is to minimize the following quantity

$$z = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \sum_{i=1}^n (y_i - b x_i - a)^2$$

Now minimizing z is equivalent to minimizing z/n, which is

$$\frac{z}{n} = \frac{1}{n}\sum_{i=1}^n (y_i - b x_i - a)^2 = \frac{1}{n}\sum_{i=1}^n (y_i - b x_i)^2 - \frac{2a}{n}\sum_{i=1}^n (y_i - b x_i) + a^2$$
$$= \frac{1}{n}\sum_{i=1}^n (y_i - b x_i)^2 - 2a(\bar{y} - b\bar{x}) + a^2 = \frac{1}{n}\sum_{i=1}^n (y_i - b x_i)^2 + a^2$$

since x̄ = ȳ = 0. Now since a² is non-negative, the minimum value is achieved when a = 0. Since we are considering the case where x and y have standard deviation 1, $s_x^2 = s_y^2 = 1$, and so expanding the above expression further we get

$$\frac{1}{n}\sum_{i=1}^n (y_i - b x_i)^2 = \frac{1}{n}\sum_{i=1}^n y_i^2 - \frac{2b}{n}\sum_{i=1}^n x_i y_i + \frac{b^2}{n}\sum_{i=1}^n x_i^2 = 1 - 2br + b^2$$

since

$$\frac{1}{n}\sum_{i=1}^n x_i^2 = \frac{1}{n}\sum_{i=1}^n y_i^2 = 1 \quad\text{and}\quad \frac{1}{n}\sum_{i=1}^n x_i y_i = r$$

(both follow because x and y have mean 0 and standard deviation 1).

Now suppose b = r − e; then the above expression becomes

$$1 - 2br + b^2 = 1 - 2r(r - e) + (r - e)^2 = 1 - r^2 + e^2$$

Now since e² is non-negative, the minimum value is achieved when e = 0. Thus b = r − e = r. This proves that the best fitting line has the form y = bx + a where b = r and a = 0, i.e. y = rx.
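This special case is easy to check numerically. In the sketch below (data invented for illustration), standardizing arbitrary data and fitting a least-squares line yields slope r and intercept 0:

```python
import numpy as np

# Arbitrary made-up data, then standardized to mean 0 and standard
# deviation 1 (population standard deviation, as in the proof).
x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
y = np.array([3.0, 5.0, 4.0, 9.0, 11.0])
xp = (x - x.mean()) / x.std()
yp = (y - y.mean()) / y.std()

r = np.corrcoef(x, y)[0, 1]

# Least-squares fit on the standardized data: y' = b*x' + a
b, a = np.polyfit(xp, yp, 1)
print(np.isclose(b, r))    # slope equals the correlation coefficient
print(np.isclose(a, 0.0))  # intercept vanishes
```

Both checks print `True`, matching the conclusion b = r, a = 0.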

We now consider the general case where x and y don’t necessarily have mean 0 and standard deviation 1, and set

$$x' = \frac{x - \bar{x}}{s_x} \quad\text{and}\quad y' = \frac{y - \bar{y}}{s_y}$$

Now x′ and y′ do have mean 0 and standard deviation 1, and so the line that best fits the transformed data is y′ = rx′, where r is the correlation coefficient between x′ and y′. Thus the best fit line has the form

$$\frac{\hat{y} - \bar{y}}{s_y} = r\,\frac{x - \bar{x}}{s_x}$$

or equivalently

$$\hat{y} = b(x - \bar{x}) + \bar{y}$$

where $b = r\,s_y/s_x$. Now note that by Property B of Correlation, the correlation coefficient for x and y is the same as that for x′ and y′, namely r.

The result now follows by Property 1: if there were a better fit line for x and y, it would produce a better fit line for x′ and y′, which would be a contradiction.
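To see the unstandardization step in action, the sketch below (again with invented data) checks that ŷ = r(s_y/s_x)(x − x̄) + ȳ produces the same fitted values as a direct least-squares fit:

```python
import numpy as np

# Invented data for illustration only.
x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
y = np.array([1.0, 4.0, 4.0, 8.0, 9.0])

r = np.corrcoef(x, y)[0, 1]
b = r * y.std(ddof=1) / x.std(ddof=1)  # slope from the theorem

# Fitted values from ŷ = b(x - x̄) + ȳ ...
yhat_theorem = b * (x - x.mean()) + y.mean()

# ... versus a direct least-squares fit ŷ = m*x + k
m, k = np.polyfit(x, y, 1)
yhat_direct = m * x + k

print(np.allclose(yhat_theorem, yhat_direct))  # the two lines coincide
```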

