In ordinary linear regression, our goal is to find the equation of the straight line y = bx + a that best fits the data (x1, y1), …, (xn, yn). This results in predicted values ŷi = bxi + a. The approach is to select the values of a and b that minimize the sum of the squared residuals

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - bx_i - a)^2$$
As we can see from Figure 1, this minimizes the sum of the squared distances (i.e. the e² values) only in the y direction.
Figure 1 – Distance between a point and a line
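The actual distance from the point to the line is shorter, as shown by d in Figure 1. Here (x̃0, ỹ0) is the point on the line y = bx + a that is closest to (x0, y0). Note that

$$d^2 = (x_0 - \tilde{x}_0)^2 + (y_0 - \tilde{y}_0)^2 = \frac{(y_0 - bx_0 - a)^2}{1 + b^2}$$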
In total least squares regression (aka orthogonal linear regression), we find the values of a and b that minimize the sum of the squared Euclidean distances from the points to the regression line (i.e. the sum of the d² values). It turns out that this is equivalent to minimizing:

$$\sum_{i=1}^{n} \frac{(y_i - bx_i - a)^2}{1 + b^2}$$
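The value of b that minimizes this expression is given by

$$b = \frac{(S_{yy} - S_{xx}) + \sqrt{(S_{yy} - S_{xx})^2 + 4S_{xy}^2}}{2S_{xy}}$$

where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

with S_xy ≠ 0, and x̄ and ȳ are the means of the xi and yi values respectively. The intercept can now be expressed as

$$a = \bar{y} - b\bar{x}$$

Example 1: Repeat Example 1 of Least Squares using total least squares regression (the data are replicated in Figure 2).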
The calculations are shown in Figure 2.
Figure 2 – Total Least Squares Regression
We see that the regression line based on total least squares is y = -0.83705x + 89.77211. By comparison, the ordinary linear regression line is y = -0.6282x + 85.72042.
In Figure 3, we graph the ordinary regression line (in blue) from Example 1 versus the regression line based on total least squares (in red).
Figure 3 – TLS (red) vs. OLS (blue)
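The comparison between the two lines can also be sketched in Python. The following is only an illustrative sketch, not the Real Statistics implementation: the function names (ols_fit, tls_fit, perp_ss) are hypothetical, the data are made up for illustration (they are not the data of Figure 2), and the TLS slope uses the standard closed form for orthogonal regression, which assumes that the centered cross-product sum S_xy is nonzero.

```python
import math


def _centered_sums(x, y):
    """Return the means and the centered sums S_xx, S_yy, S_xy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, s_xx, s_yy, s_xy


def ols_fit(x, y):
    """Ordinary least squares: minimizes squared vertical distances."""
    x_bar, y_bar, s_xx, _, s_xy = _centered_sums(x, y)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b  # (intercept, slope)


def tls_fit(x, y):
    """Total least squares: minimizes squared perpendicular distances.

    Assumption: S_xy != 0, otherwise the closed form below is undefined.
    """
    x_bar, y_bar, s_xx, s_yy, s_xy = _centered_sums(x, y)
    if s_xy == 0:
        raise ValueError("S_xy is zero; the closed-form slope is undefined")
    diff = s_yy - s_xx
    b = (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    return y_bar - b * x_bar, b  # (intercept, slope)


def perp_ss(x, y, a, b):
    """Sum of squared perpendicular distances from the points to y = bx + a."""
    return sum((yi - b * xi - a) ** 2 for xi, yi in zip(x, y)) / (1 + b ** 2)


if __name__ == "__main__":
    # Made-up illustrative data (NOT the data shown in Figure 2)
    x = [1, 2, 3, 4, 5, 6]
    y = [9.8, 8.1, 7.4, 5.2, 4.9, 2.6]

    a_ols, b_ols = ols_fit(x, y)
    a_tls, b_tls = tls_fit(x, y)
    print("OLS: intercept", a_ols, "slope", b_ols)
    print("TLS: intercept", a_tls, "slope", b_tls)
    print("perpendicular SS (OLS line):", perp_ss(x, y, a_ols, b_ols))
    print("perpendicular SS (TLS line):", perp_ss(x, y, a_tls, b_tls))
```

On any data set, the TLS line should have a perpendicular sum of squares no larger than that of the OLS line, while the OLS line has the smaller sum of squared vertical residuals. Unlike OLS, the TLS fit treats x and y symmetrically: swapping the two variables yields the reciprocal slope.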
Real Statistics Function: For array or range R1 containing x values and R2 containing y values, we have the following array functions.
TRegCoeff0(R1, R2, lab) = 2 × 1 column array consisting of the intercept and slope coefficients based on total least squares regression using the data in R1 and R2.
If lab = TRUE (default FALSE), then an extra column is appended to the output from TRegCoeff containing the labels “intercept” and “slope”.
For Example 1, the output from =TRegCoeff0(A4:A18,B4:B18) is the same as shown in range E11:E12 of Figure 2.
Caution: The version of the TRegCoeff0 function in Rel 5.4.2 and earlier releases has a bug that will be corrected in the next release. In the meantime, please use the formula TRegCoeff(R1, R2, lab), which will give the correct result.