Total Least Squares

In ordinary linear regression, our goal is to find the equation for a straight line y = bx + a which best fits the data (x1, y1), …, (xn, yn). This results in values ŷi = bxi + a. The approach is to select values for a and b which minimize the following

Least squares

As we can see from Figure 1, this minimizes the sum of the distances squared (i.e. e2) only in the y direction.

Distance between line point

Figure 1 – Distance between a point and a line

The actual distance is actually shorter, as shown by d in Figure 1. Here (\hat x, \tilde y) is the point on the line y = bx + a that is closest to (x0, y0). Note that

perpendicular point

In total least squares regression, (aka orthogonal linear regression) we find the values of a and b that minimize the sum of the squared Euclidean distances from the points to the regression line (i.e. the d2). It turns out that this is equivalent to minimizing:

Euclidean distance squared

The values of a and b that minimize this expression are given by

w formula

where

calculation of w

and  and ȳ are the means of the xi and yi values respectively.

Example 1: Repeat Example 1 of Least Squares using total least squares regression (the data are replicated in Figure 2).

The calculations are shown in Figure 2.

Total least squares regression

Figure 2 – Total Least Squares Regression

We see that the regression line based on total least squares is y = -0.83705x + 89.77211. This is as compared to the ordinary linear regression line y = -0.6282x + 85.72042.

In Figure 3, we graph the ordinary regression line (in blue) from Example 1 versus the regression line based on total least squares (in red).

TLS vs. OLS

Figure 3 – TLS (red) vs. OLS (blue)

Real Statistics Function: For array or range R1 containing x values and R2 containing y values, we have the following array functions.

TRegCoeff0(R1, R2, lab) = 2 × 1 column array consisting of the intercept and slope coefficients based on total linear regression using the data in R1 and R2.

If lab = TRUE (default FALSE), then an extra column is appended to the output from TRegCoeff containing the labels “intercept” and “slope”.

For Example 1, the output from =TRegCoeff0(A4:A18,B4:B18) is the same as shown in range E11:E12 of Figure 2.