A key assumption in regression is that the error terms are independent of each other. In this section we present a simple test to determine whether there is autocorrelation (aka serial correlation), i.e. where there is a (linear) correlation between the error term for one observation and the next. This is especially relevant with time series data where the data are sequenced by time.
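The Durbin-Watson test uses the following statistic:
$$ d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2} $$
where the ei = yi – ŷi are the residuals, n = the number of elements in the sample and k = the number of independent variables. (k does not appear in the statistic itself, but it is needed to look up the critical values described later in this section.)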
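To make the calculation concrete, here is a minimal Python sketch of the statistic: the sum of squared differences between successive residuals divided by the sum of squared residuals. The function name durbin_watson and the sample residuals are illustrative assumptions; they are not part of the Real Statistics Resource Pack and are not the data from Figure 1.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic d for a sequence of residuals."""
    e = list(residuals)
    if len(e) < 2:
        raise ValueError("at least two residuals are required")
    # Numerator: squared differences between successive residuals (i = 2..n).
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    # Denominator: sum of squared residuals (i = 1..n).
    denominator = sum(x ** 2 for x in e)
    if denominator == 0:
        raise ValueError("all residuals are zero; d is undefined")
    return numerator / denominator


# Hypothetical residuals chosen only to show the two extremes.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips at every step
constant = [1.0, 1.0, 1.0, 1.0]       # no change from one step to the next

print(durbin_watson(alternating))  # 3.0
print(durbin_watson(constant))     # 0.0
```

Residuals that flip sign at every step push d toward 4, while residuals that barely change push d toward 0.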
d takes on values between 0 and 4. A value of d = 2 means there is no autocorrelation. A value substantially below 2 (and especially a value less than 1) means that the data is positively autocorrelated, i.e. on average a data element is close to the subsequent data element. A value of d substantially above 2 means that the data is negatively autocorrelated, i.e. on average a data element is far from the subsequent data element.
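One way to see why 2 is the dividing point is to expand the squared differences in the numerator. Writing r for the lag-1 sample autocorrelation of the residuals (a symbol introduced here only for this explanation), we get

$$ d = 2(1 - r) - \frac{e_1^2 + e_n^2}{\sum_{i=1}^{n} e_i^2}, \qquad r = \frac{\sum_{i=2}^{n} e_i e_{i-1}}{\sum_{i=1}^{n} e_i^2} $$

For reasonably large n the last term is small, so d ≈ 2(1 – r). Thus r = 0 gives d ≈ 2, r near 1 gives d near 0, and r near –1 gives d near 4.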
Example 1: Find the Durbin-Watson statistic for the data in Figure 1.
Figure 1 – Durbin-Watson Test
The statistic (cell J3) is 0.725951, but what does this tell us about the autocorrelation?
The Durbin-Watson statistic can also be tested for significance using the Durbin-Watson Table. For each value of alpha (.01 or .05) and each value of the sample size n (from 6 to 200) and each value of the number of independent variables k (from 1 to 20), the table contains a lower and upper critical value (dL and dU).
Since most regression problems involving time series data show a positive autocorrelation, we usually test the null hypothesis H0: the autocorrelation ρ ≤ 0 (which we believe is ρ = 0) versus the alternative hypothesis H1: ρ > 0, using the following criteria:
If d < dL reject H0 : ρ ≤ 0 (and so accept H1 : ρ > 0)
If d > dU do not reject H0 : ρ ≤ 0 (presumably ρ = 0)
If dL < d < dU test is inconclusive
Note that if d > 2 then we should test for negative autocorrelation instead of positive autocorrelation. To do this simply test 4 – d for positive autocorrelation as described above.
For Example 1, with α = .05, we know that n = 11 and k = 2. From the Durbin-Watson Table, we see that dL = .75798 and dU = 1.60439. Since d = 0.72595 < .75798 = dL, we reject the null hypothesis, and conclude that there is a significant positive autocorrelation.
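The decision rule can be sketched in a few lines of Python. The function name dw_decision and the wording of the returned strings are assumptions made for illustration; the critical values must still be looked up in the Durbin-Watson Table. The source criteria do not say what to do when d equals dL or dU exactly, so this sketch treats those boundary cases as inconclusive.

```python
def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d using critical values dL and dU."""
    # For d > 2, test 4 - d for positive autocorrelation instead,
    # which amounts to a test for negative autocorrelation.
    if d > 2:
        stat, kind = 4 - d, "negative"
    else:
        stat, kind = d, "positive"

    if stat < d_lower:
        return f"reject H0: significant {kind} autocorrelation"
    if stat > d_upper:
        return "do not reject H0"
    # Assumption: boundary values (stat == dL or stat == dU) fall here too.
    return "inconclusive"


# Values from Example 1: alpha = .05, n = 11, k = 2.
print(dw_decision(0.72595, 0.75798, 1.60439))
# reject H0: significant positive autocorrelation
```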
Real Statistics Capabilities
Real Statistics Function: The following functions are available in the Real Statistics Resource Pack, starting with two versions of the DURBIN function.
DURBIN(R1) = the Durbin-Watson statistic d where R1 is a column vector containing residuals
DURBIN(R1, R2) = the Durbin-Watson statistic d where R1 is an m × n range containing X data and R2 is an m × 1 column vector containing Y data.
DLowerCRIT(n, k, α, h) = lower critical value of the Durbin-Watson statistic for samples of size n (6 to 2,000) based on k independent variables (1 to 20) for α = .01, .025 or .05 (default). If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.
DUpperCRIT(n, k, α, h) = upper critical value of the Durbin-Watson statistic for samples of size n (6 to 2,000) based on k independent variables (1 to 20) for α = .01, .025 or .05 (default). If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.
Actually the DURBIN function is an array function, described as follows:
DURBIN(R1, R2, lab, α): returns a column range with the values d, dL, dU and sig where R1 is an m × n range containing X data and R2 is an m × 1 column vector containing Y data.
DURBIN(R1, k, lab, α): returns a column range with the values d, dL, dU and sig where R1 is a column vector containing residuals and k = the number of independent variables (default = 2).
Here α = .01, .025 or .05 (default). If lab = TRUE (default = FALSE) then an extra column of labels is added to the output.
Note that the functions DLowerCRIT and DUpperCRIT support a much larger range of values of n than the Durbin-Watson Table. Also these functions support α = .01, .025 and .05, while the table only provides values for α = .01 and .05.
Observation: Referring to Figure 1, we can calculate the statistic d = 0.72595 using either of the formulas =DURBIN(G4:G14) or =DURBIN(B4:C14,D4:D14). In fact, if we highlight the range I3:J6, enter either of these formulas and then press Ctrl-Shift-Enter, the result will be the same as shown in range I3:J6 of Figure 1.
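Outside Excel, the version that takes X and Y data can be mimicked by first fitting the regression and then applying the statistic to the resulting residuals. The following Python sketch does this only for the special case of one independent variable, using the closed-form least-squares fit. The function names and the data are hypothetical; they are not taken from Figure 1 or from the Real Statistics Resource Pack.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def durbin_watson_from_data(x, y):
    """Fit y on x, then return the Durbin-Watson statistic of the residuals."""
    intercept, slope = fit_simple_ols(x, y)
    # Residuals e_i = y_i - yhat_i, kept in the original (time) order.
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    denominator = sum(ei ** 2 for ei in e)
    return numerator / denominator


# Hypothetical data, listed in time order.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(round(durbin_watson_from_data(x, y), 4))  # 3.4
```

The observations must stay in their time order, since the statistic depends on which residual follows which.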
Real Statistics Data Analysis Tool: The Linear Regression data analysis tool provided by the Real Statistics Resource Pack also supports the Durbin-Watson Test as described next.
To conduct the test in Example 1, press Ctrl-m and double-click on the Linear Regression data analysis tool. Then fill in the dialog box that appears, as shown in Figure 2.
Figure 2 – Durbin-Watson data analysis
The output is similar to that generated by the formula