In Multiple Regression Analysis, we noted that the assumptions for the regression model can be expressed in terms of the error random variables as follows:

- **Linearity**: The *ε _{i}* have mean 0
- **Independence**: The *ε _{i}* are independent
- **Normality**: The *ε _{i}* are normally distributed
- **Homogeneity of variances**: The *ε _{i}* have the same variance *σ*^{2}

If these assumptions are satisfied, then the random errors *ε _{i}* can be regarded as a random sample from a *N*(0, *σ*) distribution. It is natural, therefore, to test our assumptions for the regression model by investigating the sample observations of the residuals.

It turns out that the raw residuals *e _{i}* have the distribution

*e _{i}* ∼ *N*(0, *σ*√(1 − *h _{ii}*))

where the *h _{ii}* are the terms on the diagonal of the hat matrix defined in Definition 3 of Method of Least Squares for Multiple Regression. It also turns out that the raw residuals are not independent.
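The hat matrix and the behavior of the raw residuals can be illustrated with a short sketch. The data below are hypothetical (not the data of Example 1), and numpy is used for the linear algebra:

```python
import numpy as np

# Hypothetical data: two independent variables plus a column of 1s
# for the intercept, forming the design matrix X.
X = np.array([
    [1.0, 2.0, 5.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 6.0],
    [1.0, 5.0, 5.0],
    [1.0, 6.0, 8.0],
])
y = np.array([7.0, 8.0, 10.0, 11.0, 14.0])

# Hat matrix H = X (X'X)^{-1} X'; its diagonal entries are the h_ii.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Raw residuals e = y - y_hat = (I - H) y.
e = (np.eye(len(y)) - H) @ y

# Each e_i has standard deviation sigma * sqrt(1 - h_ii), so the raw
# residuals do not all share the same variance unless the h_ii are equal.
print(np.round(h, 4))
print(np.round(e, 4))
```

Note that *H* is symmetric and idempotent, and its diagonal sums to the number of regression coefficients; these properties underlie the variance formula above.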

By Property 3b of Expectation, we know that

*r _{i}* = *e _{i}*/√(1 − *h _{ii}*) ∼ *N*(0, *σ*)

The *r _{i}* have the desired distribution, but they are still not independent. If, however, the *h _{ii}* are reasonably close to zero, then the *r _{i}* can be considered to be independent. As usual, *MS _{E}* can be used as an estimate for *σ*^{2}.

**Definition 1**: The **studentized residuals** are defined by

*t _{i}* = *e _{i}*/√(*MS _{E}*(1 − *h _{ii}*))

**Observation**: If the *ε _{i}* have the same variance *σ*^{2}, then the studentized residuals have a Student's *t* distribution, namely

*t _{i}* ∼ *T*(*n* − *k* − 2)

where *n* = the number of elements in the sample and *k* = the number of independent variables.
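The calculation in Definition 1 can be sketched in Python. The function name `studentized_residuals` is hypothetical (the worksheet function on this site is RegStudE), and *MS _{E}* is taken to be *SS _{E}* divided by *n* − *k* − 1 degrees of freedom:

```python
import numpy as np

def studentized_residuals(X_data, y):
    """Studentized residuals t_i = e_i / sqrt(MS_E * (1 - h_ii)).

    X_data: n x k array of independent-variable data (no intercept column);
    y: length-n response vector.  A column of 1s is added for the intercept.
    """
    X_data = np.asarray(X_data, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X_data.shape
    X = np.column_stack([np.ones(n), X_data])   # design matrix with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
    h = np.diag(H)
    e = y - H @ y                               # raw residuals
    mse = (e @ e) / (n - k - 1)                 # MS_E on n - k - 1 df
    return e / np.sqrt(mse * (1.0 - h))

# Usage on hypothetical data with k = 2 independent variables:
t = studentized_residuals(
    [[2, 5], [3, 4], [4, 6], [5, 5], [6, 8], [7, 7]],
    [7, 8, 10, 11, 14, 13],
)
print(np.round(t, 4))
```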

We can now use the studentized residuals to test the various assumptions of the multiple regression model. In particular, we can use the various tests described in Testing for Normality and Symmetry, especially QQ plots, to test for normality, and we can use the tests found in Homogeneity of Variance to test whether the homogeneity of variance assumption is met.

It should also be noted that if the linearity and homogeneity of variances assumptions are met, then a plot of the studentized residuals against the predicted values should show a random pattern. If this is not the case, then one of these assumptions is not being met. This approach works quite well when there is only one independent variable. With multiple independent variables, a plot of the residuals against each independent variable is necessary, and even then multi-dimensional issues may not be captured.

**Example 1**: Check the assumptions of regression analysis for the data in Example 1 of Method of Least Squares for Multiple Regression by using the studentized residuals.

We start by calculating the studentized residuals (see Figure 1).

**Figure 1 – Hat matrix and studentized residuals for data in Example 1**

First we calculate the hat matrix *H* (from the data in Figure 1 of Multiple Regression Analysis in Excel) by using the array formula

=MMULT(MMULT(E4:G14,E17:G19),TRANSPOSE(E4:G14))

where E4:G14 contains the design matrix *X*. Alternatively, *H* can be calculated using the supplemental function HAT(A4:B14). From *H*, the vector of studentized residuals is calculated by the array formula

=O4:O14/SQRT(O19*(1-INDEX(Q4:AA14,AB4:AB14,AB4:AB14)))

where O4:O14 contains the vector of raw residuals *E* and O19 contains *MS _{Res}*. See Example 2 in Matrix Operations for more information about extracting the diagonal elements from a square matrix.

We now plot the studentized residuals against the predicted values of y (in cells M4:M14 of Figure 2).

**Figure 2 – Studentized residual plot for Example 1**

The values are reasonably spread out, although there does seem to be a pattern of rising values on the right; with such a small sample, however, it is difficult to tell. Some of this might also be due to the presence of outliers (see Outliers and Influencers).

**Real Statistics Excel Function**: The Real Statistics Resource Pack provides the following supplemental array function where R1 is an *n* × *k* array containing X sample data and R2 is an *n* × 1 array containing Y sample data.

**RegStudE**(R1, R2) = *n* × 1 vector of studentized residuals

Thus, the values in the range AC4:AC14 of Figure 1 can be generated via the array formula =RegStudE(A4:B14,C4:C14), again referring to the data in Figure 3 of Method of Least Squares for Multiple Regression.

Can you please explain to me what the raw residual is?

Thanks in advance!

Alhanouf,

The raw residual is e_i on the referenced webpage, i.e. y_i minus the predicted value of y_i.

Charles

Hi, Charles. Thank you for your website. Above, you state: “First we calculate the hat matrix H (from the data in Figure 3 of Method of Least Squares for Multiple Regression) by using the array formula…”. There is no Figure 3 of Method of Least Squares for Multiple Regression.

Regards,

Tara

Tara,

Thanks for catching this mistake. The reference should be to Figure 1 of Multiple Regression Analysis in Excel. I have now corrected the referenced webpage.

Charles

Hi Charles,

Thanks, much appreciated.

Kind regards

Declan

Hi there

The SUMXMY2 formula is great for the Durbin-Watson test.
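[For reference: the Durbin-Watson statistic divides the sum of squared successive differences of the residuals (which SUMXMY2 computes when given the residual range offset by one row) by their sum of squares. A minimal sketch, with a hypothetical function name:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of residuals:
    sum of squared successive differences divided by the sum of squares.
    Values near 2 suggest no first-order autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Perfectly alternating residuals give the maximum-negative-autocorrelation
# end of the range, while constant residuals give 0:
print(durbin_watson([1, -1, 1, -1]))   # -> 3.0
print(durbin_watson([1, 1, 1, 1]))     # -> 0.0
```
]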

Kind regards

Declan

Declan,

I expect to add the Durbin-Watson test shortly. I am working on multinomial logistic regression now.

Charles

Perhaps include PRESS and Durbin-Watson statistics in a future release?

Thanks, Rich

Thanks Rich,

I will certainly consider adding these in a future release.

Charles