In ordinary linear regression, our goal is to find the equation for a straight line y =* bx + a *which best fits the data (*x*_{1}, y_{1}), …, (*x _{n}*, y

_{n}). This results in values ŷ

*=*

_{i}*bx*+

_{i}*a*. The approach is to select values for

*a*and

*b*which minimize the following

As we can see from Figure 1, this minimizes the sum of the distances squared (i.e. *e*^{2}) only in the y direction.

**Figure 1 – Distance between a point and a line**

The actual distance is actually shorter, as shown by *d* in Figure 1. Here (, ) is the point on the line y = *bx + a* that is closest to (*x*_{0}, y_{0}). Note that

In **total least squares regression**, (aka **orthogonal linear regression**) we find the values of *a* and *b* that minimize the sum of the squared Euclidean distances from the points to the regression line (i.e. the *d*^{2}). It turns out that this is equivalent to minimizing:

The values of *a* and *b* that minimize this expression are given by

where

and *x̄* and ȳ are the means of the *x _{i}* and y

*values respectively.*

_{i}**Example 1**: Repeat Example 1 of Least Squares using total least squares regression (the data are replicated in Figure 2).

The calculations are shown in Figure 2.

**Figure 2 – Total Least Squares Regression**

We see that the regression line based on total least squares is y = -0.83705*x* + 89.77211. This is as compared to the ordinary linear regression line y = -0.6282*x* + 85.72042.

In Figure 3, we graph the ordinary regression line (in blue) from Example 1 versus the regression line based on total least squares (in red).

**Figure 3 – TLS (red) vs. OLS (blue)**

**Real Statistics Function**: For array or range R1 containing *x* values and R2 containing y values, we have the following array functions.

**TRegCoeff0**(R1, R2, *lab*) = 2 × 1 column array consisting of the intercept and slope coefficients based on total linear regression using the data in R1 and R2.

If *lab* = TRUE (default FALSE), then an extra column is appended to the output from TRegCoeff containing the labels “intercept” and “slope”.

For Example 1, the output from =TRegCoeff0(A4:A18,B4:B18) is the same as shown in range E11:E12 of Figure 2.