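**Theorem 1**: The best fit line for the points $(x_1, y_1), \ldots, (x_n, y_n)$ is given by

$$y - \bar{y} = b(x - \bar{x}) \quad\text{where}\quad b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$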
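The formula in Theorem 1 translates directly into code. The following is a minimal Python sketch using only the standard library; the function name `best_fit_line` and the sample data are illustrative assumptions, not part of the original text. Besides the slope $b$ it returns the intercept $a = \bar{y} - b\bar{x}$, which is what you get by rewriting $y - \bar{y} = b(x - \bar{x})$ in the form $y = bx + a$.

```python
def best_fit_line(xs, ys):
    """Least-squares line y - ybar = b*(x - xbar) from Theorem 1.

    Returns (b, a) for the equivalent form y = b*x + a, with a = ybar - b*xbar.
    (Hypothetical helper for illustration.)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Numerator and denominator of b in Theorem 1
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx
    a = ybar - b * xbar
    return b, a


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a = best_fit_line(xs, ys)
print(b, a)  # slope about 0.6, intercept about 2.2
```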

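Proof: Writing the line as $y = b(x - \bar{x}) + c$, our objective is to minimize

$$\sum_{i=1}^{n}\bigl(y_i - b(x_i - \bar{x}) - c\bigr)^2$$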

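For any given values of $(x_1, y_1), \ldots, (x_n, y_n)$, this expression can be viewed as a function of $b$ and $c$. Calling this function $g(b, c)$, we know from calculus that its minimum occurs where both partial derivatives are zero:

$$\frac{\partial g}{\partial b} = -2\sum_{i=1}^{n}(x_i - \bar{x})\bigl(y_i - b(x_i - \bar{x}) - c\bigr) = 0$$

$$\frac{\partial g}{\partial c} = -2\sum_{i=1}^{n}\bigl(y_i - b(x_i - \bar{x}) - c\bigr) = 0$$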

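Transposing terms and simplifying,

$$\sum_{i=1}^{n}(x_i - \bar{x})y_i = b\sum_{i=1}^{n}(x_i - \bar{x})^2 + c\sum_{i=1}^{n}(x_i - \bar{x})$$

$$\sum_{i=1}^{n}y_i = b\sum_{i=1}^{n}(x_i - \bar{x}) + nc$$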

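Since $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$, the second equation gives $nc = \sum_{i=1}^{n}y_i$, i.e. $c = \bar{y}$, and the first equation gives

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})y_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$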

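The result follows since

$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n}(x_i - \bar{x})y_i - \bar{y}\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n}(x_i - \bar{x})y_i$$

so $b$ agrees with Theorem 1, and with $c = \bar{y}$ the line $y = b(x - \bar{x}) + c$ is exactly $y - \bar{y} = b(x - \bar{x})$.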
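As a numerical sanity check on the calculus proof, the sketch below evaluates $g(b, c)$ and its two partial derivatives (the formulas set to zero above) at $c = \bar{y}$ and the derived $b$, then confirms that small moves away from that point increase $g$. The helper names `g` and `partials` and the sample data are hypothetical, chosen only for illustration.

```python
def g(b, c, xs, ys):
    """Sum of squared errors for the line y = b*(x - xbar) + c."""
    xbar = sum(xs) / len(xs)
    return sum((y - b * (x - xbar) - c) ** 2 for x, y in zip(xs, ys))


def partials(b, c, xs, ys):
    """Partial derivatives dg/db and dg/dc, using the formulas from the proof."""
    xbar = sum(xs) / len(xs)
    res = [y - b * (x - xbar) - c for x, y in zip(xs, ys)]  # residuals
    dg_db = -2 * sum((x - xbar) * e for x, e in zip(xs, res))
    dg_dc = -2 * sum(res)
    return dg_db, dg_dc


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Solution of the two equations: c = ybar, b = sum((x - xbar) * y) / sum((x - xbar)^2)
b = sum((x - xbar) * y for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
c = ybar

print(partials(b, c, xs, ys))  # both essentially zero

# Any small step away from (b, c) increases g, consistent with a minimum
for db, dc in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert g(b + db, c + dc, xs, ys) > g(b, c, xs, ys)
```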

Alternative Proof: This proof doesn’t require any calculus. We first prove the theorem for the case where both $x$ and $y$ have mean 0 and standard deviation 1. Assume the best fit line is $y = bx + a$, so that the predicted values are

$$\hat{y}_i = bx_i + a$$

for all $i$. Our goal is to minimize the quantity

$$z = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - bx_i - a)^2$$

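Minimizing $z$ is equivalent to minimizing $z/n$, which is

$$\frac{z}{n} = \frac{1}{n}\sum_{i=1}^{n}(y_i - bx_i)^2 - 2a(\bar{y} - b\bar{x}) + a^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - bx_i)^2 + a^2$$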

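since $\bar{x} = \bar{y} = 0$. Since $a^2$ is non-negative and the first term does not involve $a$, the minimum value is achieved when $a = 0$. Since we are considering the case where $x$ and $y$ have standard deviation 1 (computed with divisor $n$),

$$\frac{1}{n}\sum_{i=1}^{n}x_i^2 = \frac{1}{n}\sum_{i=1}^{n}y_i^2 = 1$$

and the correlation coefficient reduces to $r = \frac{1}{n}\sum_{i=1}^{n}x_iy_i$. Expanding the above expression further we get

$$\frac{1}{n}\sum_{i=1}^{n}(y_i - bx_i)^2 = \frac{1}{n}\sum_{i=1}^{n}y_i^2 - 2b\,\frac{1}{n}\sum_{i=1}^{n}x_iy_i + b^2\,\frac{1}{n}\sum_{i=1}^{n}x_i^2 = 1 - 2br + b^2$$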

Now write $b = r - e$. Then the above expression becomes

$$1 - 2(r - e)r + (r - e)^2 = 1 - r^2 + e^2$$

Since $e^2$ is non-negative, the minimum value is achieved when $e = 0$, i.e. $b = r - e = r$. This proves that the best fitting line has the form $y = bx + a$ where $b = r$ and $a = 0$, i.e. $y = rx$.
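The key identity in this proof, $z/n = 1 - r^2 + e^2 + a^2$ with $e = r - b$, can be checked numerically. This Python sketch standardizes a small made-up data set (standard deviation with divisor $n$, matching the convention above), computes $r$ as $\frac{1}{n}\sum x_iy_i$, and compares both sides of the identity for several $(b, a)$ pairs. The helper names `standardize` and `z_over_n` are illustrative assumptions.

```python
import math


def standardize(vals):
    """Rescale to mean 0 and standard deviation 1 (divisor n, as in the proof)."""
    n = len(vals)
    m = sum(vals) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / n)
    return [(v - m) / sd for v in vals]


xs = standardize([1, 2, 3, 4, 5])
ys = standardize([2, 4, 5, 4, 5])
n = len(xs)
r = sum(x * y for x, y in zip(xs, ys)) / n  # correlation of the standardized data


def z_over_n(b, a):
    """Mean squared error of the line y = b*x + a on the standardized data."""
    return sum((y - b * x - a) ** 2 for x, y in zip(xs, ys)) / n


# z/n = 1 - r^2 + e^2 + a^2 with e = r - b, so the minimum is at b = r, a = 0
for b, a in [(r, 0.0), (0.0, 0.0), (1.0, 0.5), (-0.3, -1.2)]:
    e = r - b
    assert math.isclose(z_over_n(b, a), 1 - r**2 + e**2 + a**2)

print(r, z_over_n(r, 0.0))
```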

We now consider the general case where $x$ and $y$ don’t necessarily have mean 0 and standard deviation 1, and set

$$x' = \frac{x - \bar{x}}{s_x}, \qquad y' = \frac{y - \bar{y}}{s_y}$$

Now $x'$ and $y'$ do have mean 0 and standard deviation 1, so the line that best fits the standardized data is $y' = rx'$, where $r$ is the correlation coefficient between $x'$ and $y'$. Thus the best fit line has the form

$$\frac{y - \bar{y}}{s_y} = r\,\frac{x - \bar{x}}{s_x}$$

or equivalently

$$y - \bar{y} = b(x - \bar{x})$$

where $b = rs_y/s_x$. Now note that by Property B of Correlation, the correlation coefficient for $x$ and $y$ is the same as that for $x'$ and $y'$, namely $r$.

The result now follows by Property 1. If there were a better fit line for $x$ and $y$, it would produce a better fit line for $x'$ and $y'$, which would be a contradiction.
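The general-case argument can also be illustrated numerically. The Python sketch below (made-up data; helper names `mean`, `sd`, and `sse` are hypothetical) standardizes $x$ and $y$, maps the standardized fit $y' = rx'$ back to the original scale to get $b = rs_y/s_x$, and checks that this matches the slope formula in Theorem 1. It also checks the fact behind the contradiction step: a line $y' = mx' + k$ in standardized units corresponds to a line in original units whose sum of squared errors is exactly $s_y^2$ times larger, so a better line in one scale would give a better line in the other. Standard deviations use divisor $n$ here; the ratio $s_y/s_x$ is the same with divisor $n - 1$.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation with divisor n (s_y/s_x is unchanged with n - 1)."""
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))


def sse(b, a, us, vs):
    """Sum of squared errors of the line v = b*u + a."""
    return sum((v - b * u - a) ** 2 for u, v in zip(us, vs))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
xbar, ybar = mean(xs), mean(ys)
sx, sy = sd(xs), sd(ys)

# Standardize: x' = (x - xbar)/s_x, y' = (y - ybar)/s_y
xp = [(x - xbar) / sx for x in xs]
yp = [(y - ybar) / sy for y in ys]
r = sum(u * v for u, v in zip(xp, yp)) / len(xs)  # correlation of x' and y'

# Slope from mapping y' = r x' back to the original scale, vs. Theorem 1's formula
b_from_r = r * sy / sx
b_direct = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
print(b_from_r, b_direct)
assert math.isclose(b_from_r, b_direct)

# y' = m x' + k  <=>  y = ybar + sy*(m*(x - xbar)/sx + k); its SSE is sy**2 times larger
for m, k in [(r, 0.0), (0.2, 0.1), (-1.0, 0.5)]:
    slope = sy * m / sx
    intercept = ybar + sy * k - slope * xbar
    assert math.isclose(sse(slope, intercept, xs, ys), sy**2 * sse(m, k, xp, yp))
```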

Thank you Charles, I was looking everywhere for this derivation!

Hi Charles,

Googling for a good answer on how to calculate the confidence limits of a linear regression, I found your text. It is useful indeed.

Andreas