In ordinary linear regression, our goal is to find the equation for a straight line y =* bx + a *which best fits the data (*x*_{1}, y_{1}), …, (*x _{n}*, y

_{n}). This results in values ŷ

*=*

_{i}*bx*+

_{i}*a*. The approach is to select values for

*a*and

*b*which minimize the following

As we can see from Figure 1, this minimizes the sum of the distances squared (i.e. *e*^{2}) only in the y direction.

**Figure 1 – Distance between a point and a line**

The actual distance is actually shorter, as shown by *d* in Figure 1. Here (, ) is the point on the line y = *bx + a* that is closest to (*x*_{0}, y_{0}). Note that

In **total least squares regression**, (aka **orthogonal linear regression**) we find the values of *a* and *b* that minimize the sum of the squared Euclidean distances from the points to the regression line (i.e. the *d*^{2}). It turns out that this is equivalent to minimizing:

The values of *a* and *b* that minimize this expression are given by

where

and *x̄* and ȳ are the means of the *x _{i}* and y

*values respectively.*

_{i}**Example 1**: Repeat Example 1 of Least Squares using total least squares regression (the data are replicated in Figure 2).

The calculations are shown in Figure 2.

**Figure 2 – Total Least Squares Regression**

We see that the regression line based on total least squares is y = -0.83705*x* + 89.77211. This is as compared to the ordinary linear regression line y = -0.6282*x* + 85.72042.

In Figure 3, we graph the ordinary regression line (in blue) from Example 1 versus the regression line based on total least squares (in red).

**Figure 3 – TLS (red) vs. OLS (blue)**

**Real Statistics Function**: For array or range R1 containing *x* values and R2 containing y values, we have the following array functions.

**TRegCoeff0**(R1, R2, *lab*) = 2 × 1 column array consisting of the intercept and slope coefficients based on total linear regression using the data in R1 and R2.

If *lab* = TRUE (default FALSE), then an extra column is appended to the output from TRegCoeff containing the labels “intercept” and “slope”.

For Example 1, the output from =TRegCoeff0(A4:A18,B4:B18) is the same as shown in range E11:E12 of Figure 2.

How did you manage to plot figure 3?

Did you make use of the Real Statistics Resource Pack that is available on your website?

Regards

Jaco,

No, I didn’t need to use the Real Statistics Resource Pack to create the plot, but I did use it to find the Total Least Squares regression coefficients.

The plot is simply two scatter plots superimposed, including linear trendlines. The first scatter plot is for the data in columns A and B, while the second is for the data in columns Q and R.

Charles

How to evaluate the goodness of the fit using Total Least Square Method? Is it in the same way with that of OLS, like R^2.

You can analyze residuals just as you do for OLS. You can also calculate R^2, using the sum of the squared

Euclideandistances.Charles