Theorem 1: The best fit line for the points (x1, y1), …, (xn, yn) is given by

ŷ = bx + a

where b = sxy/sx² and a = ȳ – bx̄. Here sxy is the covariance of x and y and sx² is the variance of x; equivalently, b = rsy/sx, where r is the correlation coefficient between x and y.
Proof: Our objective is to minimize the sum of squared errors of a candidate line y = bx + c, namely

Σ (yi – bxi – c)²

where the sum runs over i = 1, …, n.
For any given values of (x1, y1), …, (xn, yn), this expression can be viewed as a function of b and c. Calling this function g(b, c), by calculus the minimum value occurs when the partial derivatives are zero:

∂g/∂c = –2 Σ (yi – bxi – c) = 0
∂g/∂b = –2 Σ xi (yi – bxi – c) = 0
Transposing terms and simplifying,

Σ yi = b Σ xi + nc
Σ xiyi = b Σ xi² + c Σ xi

Dividing the first equation by n gives ȳ = bx̄ + c, and so c = ȳ – bx̄. Substituting this into the second equation and solving for b gives

b = (Σ xiyi – nx̄ȳ) / (Σ xi² – nx̄²)
The result follows since

sxy = (Σ xiyi – nx̄ȳ)/(n – 1) and sx² = (Σ xi² – nx̄²)/(n – 1)

so that b = sxy/sx², and the intercept c = ȳ – bx̄ is the value a given in the theorem.
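To make the formulas concrete, here is a minimal numerical sketch in Python (the data values are made up for illustration and do not come from the text) checking that b = sxy/sx² and a = ȳ – bx̄ agree with the slope and intercept returned by NumPy's least squares fit.

import numpy as np

# Illustrative data (not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample covariance and variance (divisor n - 1); their ratio, and hence b,
# is the same if divisor n is used instead.
s_xy = np.cov(x, y, ddof=1)[0, 1]
s_x2 = np.var(x, ddof=1)

b = s_xy / s_x2
a = y.mean() - b * x.mean()

# numpy.polyfit with degree 1 minimizes the same sum of squared errors.
b_fit, a_fit = np.polyfit(x, y, 1)
assert np.isclose(b, b_fit) and np.isclose(a, a_fit)
print(b, a)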
Alternative Proof: This proof doesn’t require any calculus. We first prove the theorem for the case where both x and y have mean 0 and standard deviation 1. Assume the best fit line is y = bx + a, and so

ŷi = bxi + a
for all i. Our goal is to minimize the following quantity

z = Σ (yi – ŷi)² = Σ (yi – bxi – a)²
Now minimizing z is equivalent to minimizing z/n, which is

(1/n) Σ (yi – bxi – a)² = (1/n) Σ (yi – bxi)² – 2a · (1/n) Σ (yi – bxi) + a² = (1/n) Σ (yi – bxi)² + a²
since x̄ = ȳ = 0 (which makes the middle term vanish). Now since a² is non-negative, the minimum value is achieved when a = 0. Since we are considering the case where x and y have standard deviation 1 (taking the standard deviations in this proof with divisor n), we have (1/n) Σ xi² = (1/n) Σ yi² = 1 and (1/n) Σ xiyi = r, and so expanding the above expression further we get

(1/n) Σ (yi – bxi)² = (1/n) Σ yi² – 2b · (1/n) Σ xiyi + b² · (1/n) Σ xi² = 1 – 2br + b²
Now suppose b = r – e; then the above expression becomes

1 – 2(r – e)r + (r – e)² = 1 – 2r² + 2er + r² – 2er + e² = 1 – r² + e²
Now since e² is non-negative, the minimum value is achieved when e = 0. Thus b = r – e = r. This proves that the best fitting line has the form y = bx + a where b = r and a = 0, i.e. y = rx.
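The standardized case can be checked the same way. The sketch below (again with made-up data) standardizes x and y to mean 0 and standard deviation 1, using the divisor-n form of the standard deviation as in the expansion above, and confirms that the least squares slope is the correlation coefficient r and the intercept is 0.

import numpy as np

# Illustrative data (not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Standardize to mean 0 and standard deviation 1 (population form, divisor n).
xp = (x - x.mean()) / x.std()
yp = (y - y.mean()) / y.std()

r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(xp, yp, 1)

# For standardized data the best fit line is y' = r x'.
assert np.isclose(slope, r)
assert np.isclose(intercept, 0.0, atol=1e-12)
print(slope, r)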
We now consider the general case where x and y don’t necessarily have mean 0 and standard deviation 1, and set
x′ = (x – x̄)/sx and y′ = (y – ȳ)/sy
Now x′ and y′ do have mean 0 and standard deviation 1, and so the line that best fits the transformed data is y′ = rx′, where r is the correlation coefficient between x′ and y′. Substituting the definitions of x′ and y′, this line is (y – ȳ)/sy = r(x – x̄)/sx, and so the best fit line has form

ŷ = bx + (ȳ – bx̄)
where b = rsy/sx. Now note that by Property B of Correlation, the correlation coefficient for x and y is the same as that for x′ and y′, namely r.
The result now follows by Property 1. If there were a better fit line for x and y, it would produce a better fit line for x′ and y′ (under the change of variables each squared residual is simply divided by sy², so the ordering of candidate lines by their sums of squared errors is preserved), which would be a contradiction.
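Finally, the general case can be verified directly: the line obtained by transforming y′ = rx′ back to the original units, with b = rsy/sx and a = ȳ – bx̄, matches an ordinary least squares fit of y on x. A sketch with the same made-up data:

import numpy as np

# Illustrative data (not from the text).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]
# The ratio sy/sx is the same whether the standard deviations use divisor
# n or n - 1, so either convention gives the same slope b.
b = r * y.std(ddof=1) / x.std(ddof=1)
a = y.mean() - b * x.mean()

b_fit, a_fit = np.polyfit(x, y, 1)
assert np.isclose(b, b_fit) and np.isclose(a, a_fit)
print(b, a)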