Autocorrelation

A key assumption in regression is that the error terms are independent of each other. In this section we present a simple test to determine whether there is autocorrelation (aka serial correlation), i.e. where there is a (linear) correlation between the error term for one observation and the next. This is especially relevant with time series data where the data are sequenced by time.

The Durbin-Watson test uses the following statistic:

image7158

where the ei = yi – ŷi are the residuals, n = the number elements in the sample and k = the number of independent variables.

d takes on values between 0 and 4. A value of d = 2 means there is no autocorrelation. A value substantially below 2 (and especially a value less than 1) means that the data is positively autocorrelated, i.e. on average a data element is close to the subsequent data element. A value of d substantially above 2 means that the data is negatively autocorrelated, i.e. on average a data element is far from the subsequent data element.

Example 1: Find the Durbin-Watson statistic for the data in Figure 1.

Durbin-Watson autocorrelation test

Figure 1 – Durbin-Watson Test

The  statistic (cell J3) is 0.725951, but what does this tell us about the autocorrelation?

Hypothesis Testing

The Durbin-Watson statistic can also be tested for significance using the Durbin-Watson Table. For each value of alpha (.01 or .05) and each value of the sample size n (from 6 to 200) and each value of the number of independent variables k (from 1 to 20), the table contains a lower and upper critical value (dL and dU).

Since most regression problems involving time series data show a positive autocorrelation, we usually test the null hypothesis H0: the autocorrelation ρ ≤ 0 (which we believe is ρ = 0) versus the alternative hypothesis H1: ρ >  0, using the following criteria:

If d < dL reject H0 : ρ ≤ 0 (and so accept H1 : ρ > 0)

If d > dU do not reject H0 : ρ ≤ 0 (presumably ρ = 0)

If dL < d < dU test is inconclusive

Note that if d > 2 then we should test for negative autocorrelation instead of positive autocorrelation. To do this simply test 4 – d for positive autocorrelation as described above.

For Example 1, with α = .05, we know that n = 11 and k = 2. From the Durbin-Watson Table, we see that dL = .75798 and dU = 1.60439. Since d = 0.72595 < .75798 = dL, we reject the null hypothesis, and conclude that there is a significant positive autocorrelation.

Real Statistics Capabilities

Real Statistics Function: The following two versions of the DURBIN function are available in the Real Statistics Resource Pack.

DURBIN(R1) = the Durbin-Watson statistic d where R1 is a column vector containing residuals

DURBIN(R1, R2) = the Durbin-Watson statistic d where R1 is a m × n range containing X data and R2 is an m × 1 column vector containing Y data.

DLowerCRIT(n, k, α, h) = lower critical value of the Durbin-Watson statistic  for samples of size n (6 to 2,000) based on k independent variables (1 to 20) for α = .01, .025 or .05 (default). If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.

DUpperCRIT(n, k, α, h) = upper critical value of the Durbin-Watson statistic  for samples of size n (6 to 2,000) based on k independent variables (1 to 20) for α = .01, .025 or .05 (default). If h = TRUE (default) harmonic interpolation is used; otherwise linear interpolation is used.

Actually the DURBIN function is an array function, described as follows:

DURBIN(R1, R2, lab, α): returns a column range with the values d, dL, dU and sig where R1 is a m × n range containing X data and R2 is an m × 1 column vector containing Y data,

DURBIN (R1, k, lab, α): returns a column range with the values d, dL, dU and sig where R1 is a column vector containing residuals and k = the # of independent variables (default = 2)

Here α = .01, .025 or .05 (default). If lab = TRUE (default = FALSE) then an extra column of labels is added to the output.

Note that the functions DLowerCRIT and DUpperCRIT support a much larger range of values of n than the Durbin-Watson Table. Also these functions support α = .01, .025 and .05, while the table only provides values for α = .01 and .05.

Observation: Referring to Figure 1, we can calculate the statistic  = 0.72595 using either one of the formulas: = DURBIN(G4:G14) or =DURBIN(B4:C14,D4:D14). In fact, if we highlight the range I3:J6 and enter either of these formulas and then press Ctrl-Shft-Enter the result will be the same as shown in range I3:J6 of Figure 1.

Real Statistics Data Analysis Tool: The Linear Regression data analysis tool provided by the Real Statistics Resource Pack also supports the Durbin-Watson Test as described next.

To conduct the test in Example 1, press Ctrl-m and double click on the Linear Regression data analysis tool. Now fill in the dialog box that appears as shown in Figure 2.

Durbin-Watson data analysis

Figure 2 – Durbin-Watson data analysis

The output is similar to that generated by the formula

=DURBIN(B4:C14,D4:D14,TRUE,O24)

49 Responses to Autocorrelation

  1. David says:

    Thanks for your reply Dr. Zaiontz.
    I know my data pretty well. I think one variable is at least causing the behavior of the other. No one believes this.
    I chose the word “interaction” because I wondered if I’m actually observing a sort of bi-variate push me – pull me scenario. Think low d statistic, very high correlation, linear relationship.
    Let me know what you think and thank you again.
    David

  2. Dave says:

    Hello Dr. Zaiontz
    Thank you for your informative web site.
    I have two monthly time series that I have updated continuously since early 2011.
    The r-squared of the two series > .98
    The scatterplot looks fairly evenly distributed about the OLS line.
    The Durbin Watson statistic hovers about 1.03 for sample sizes of 49 or 66, depending on how many periods are in the regression.
    My simple question is this – could this indication of autocorrelation actually be indicating interaction between these variables?
    Thank you again.
    Dave

    • Charles says:

      Dave,
      What do you mean by “interaction between the variables”?
      Given the high correlation between the variables, if one demonstrates autocorrelation, I am not surprised if the other does too.
      Charles

  3. Elyas says:

    Dear Charles,
    I have three time series of Soil water data from three different Hillslopes. I am checking whether there are any significant differences among the means of soil water data collected from these three hillslopes (the hillslopes are the treatments here). They were measured in two-day interval for a one-year period. I checked those methods to see if there is any serial correlation within each group of observation. According to DW test, and plotting the residuals and also PACF showed that there is serial correlation. Serial correlation of lag one was to be around 0.8-0.9 for each data set.
    Before going to the next step and running ANOVA test to check the variances and means of the three data sets, I need to remove the serial correlation. The AR (1) model which normally used to account for serial correlation in regression analysis did not work in my case. I also tried the sub-sampling technique which did not work either.
    I hope you can help me with a simple method to account for serial coorelation when running ANOVA and comparing the means for the three treatments.

    Thank you so much,
    Elyas

    • Charles says:

      Elyas,
      This seems to be some sort of repeated measures ANOVA. Correlation between time elements is common. You need to account for sphericity in the ANOVA model or use MANOVA.
      Charles

  4. Luca says:

    Charles,

    If my DW-stat = 1.977, my k’=3 and n=10, what would be the conclusion and how you would draw the DW chart?

    Thanks,

    Luca

    • Luca says:

      5% level of significance as well. I am sorry

    • Charles says:

      Luca,
      Based on the DW Table, dL = .525 and dU = 2.016. Since d = 1.977 is between these two values, the test is inconclusive (i.e. no conclusion can be made one way or the other).
      Charles

  5. Hector a. Queved says:

    I want to know if the terms autocorrelation and durbin-watson are statistics that apply exclusively when time is involved, that is to time series analysis? If there is no time involved, can you still involve autocorrelation and the durbin watson statistics? Please elucidate me on this issue.

    • Charles says:

      Hector,
      Autocorrection (and Durbin-Watson) can occur with non-time series data. This issue is more likely to occur with time series data.
      Charles

      • Hector A. Quevedo says:

        Thanks Charles, so kind of you to help people in questions about statistics. If you would permit me, I am going expand a little bit more on my research. You see there are established tables of atmospheric standards as density vs temperature used in aeronautics. It happens that I came across with the table that gives the density as function of temperature and when you apply experimental design, the residual graphics (the residual fits and orders) are highly correlated, and the D-W statistics value was of .25, even though the R2, s, PRESS, etc. seemed to be OK. So what I did, was to manually correct the variation using the graph of the ln density vs. temperature and positioning the points manually right in the regression line. By running again the regression model, all the objective and subjective responses improved dramatically. If my logic is right, the use of the standard atmospheric values for aeronautical purposes would be questionable.
        Thank you very much for your valuable attention to my questions and comments.
        Best regards
        Hector A. Quevedo (Ph.D.)

  6. syikin says:

    what can we do if autocorrelation exist, teacher?

    • Charles says:

      Syikin,
      I will adding some information about what to do when there is autocorrelation shortly.
      Charles

  7. Weidson C. de Souza says:

    Hi Charles…

    Thank you for the greetings and also for support.

    I had already noticed this option on the dialog box of the multiple regression rotine, but when I performed this routine I did not recognize the p-value on outputs. I need of the p-value (numerical value between 0 – 1) associated Durbin Watson statistic. Is it possible to calculate the p-value by some procedure/function for I to compare with significance level (5%)? Congratulations for your Add-in !!!

  8. Weidson C. de Souza says:

    Hi Charles,

    I am from Brasil and I’m really liking the Add-in that you have programmed. Is there any function that is possible to calculate the p-value of stastistic of durbin Watson? If No, is possible development?

    I’m would like apologize by my words.

    best regards,

    Weidson

    • Charles says:

      Weidson,
      Greetings to you in Brasil. To access the Durbin-Watson data analysis tool, choose Regression from the main menu. Then choose Multiple Linear Regression on the dialog box that appears. Finally choose the Durbin-Watson option on the subsequent dialog box.
      Charles

      • Weidson says:

        Hello dr. Charles

        Could you help me with my doubt bellow?

        Thanks vary much.

        Weidson

        • Charles says:

          Weidson,

          I currently only test for first order autocorrelation using a table of critical values for Durbin-Watson’s d statistic. Sorry, but I don’t yet calculate a p-value.

          For large samples (of size n), you can use the p-value of the normal distribution based on the fact that Durbin-Watson’s d is approximately normally distributed with mean 2 and variance 4/n.

          Charles

  9. William Agurto says:

    Charles:

    Are DURBIN formula and LEVERAGE formula in Real Statistics sensible to the number of observations or the number of independent variables?
    I suppose that because when I use Real Statistics Data Analysis Tool 4.1 (Multiple Regression option) for 34 observations, 4 independent variables and 1 dependent variable, I obtain correct results. But for another case, 15157 observations, 49 independent variables and 1 dependent variable I obtain two errors:
    1. #VALUE! for all the data in Durbin-Watson Table .
    2. #VALUE! for all the data in Leverage column (Cook’s D Table), and all the data in the columns related to Leverage (Mod MSE, RStudent, T-Test, Cook’s D, DFFITS).
    3. The rest of the results seems to be OK.
    Is it related to the matrix operations been executed by Excel when matrices are so big? Or is it possible that my data is forming bad-conditioned matrices? If so, ¿Why errors are present only in results related to DURBIN and LEVERAGE formulas, but not in the rest of the regression tables?

    Thank you.

    William Agurto.

    • William Agurto says:

      Charles:

      It seems that my data is OK (15157 observations, 49 independent variables and 1 dependent variable): when I used Factor Analysis in Real Statistics Data Analysis Tool I have not gotten errors. So, the problema is only present in DURBIN and LEVERAGE formulas.

      Thank you.

      William Agurto.

    • Charles says:

      William,
      Can you send me and Excel file with your data so that I can try to figure out what the problem is?
      Charles

      • William Agurto says:

        Charles:

        I sent the Excel file with the data to your both e-mail accounts:
        czaiontz@gmail.com
        info@real-statistics.com
        I hope you can find the problem. It seems like my data is OK.

        Please, confirm when you receive the file (more than 10 Mb).

        I will be waiting for your answer.

        Thank you.

        William Agurto.

        • Charles says:

          William,
          I have received your email.
          Charles

          • William Agurto says:

            Charles:

            I received your e-mail. I answered that today: the data that I sent you has no missing values. The character “-” is a zero value. It is seen because of the Excel format for the variable X5 (I used that format because of the magnitud of that variable). If you change the format of that column (to “general” format, for example) you can see that “-” is really a zero value.
            Please, review the data again to verify the functionality of DURBIN and LEVERAGE formulas.

            Thank you.

            William Agurto.

          • Charles says:

            William,

            Yes despite seeing a “-” I see that the value is a zero.

            In any case, the problem with DURBIN is that the values for n and k exceed the size of the values in the Durbin-Watson table. I am going to explore using a normal approximation in this case.

            The problem with LEVERAGE is that the number of data items exceeds 2178. In this case when the hat matrix is evaluated it looks like the size becomes too large. For the next release I plan to perform the calculation in a different way to avoid this problem.

            Thanks for finding these problems. Stay tuned.

            Charles

  10. waqas says:

    how to calculate power of durbin watson test?
    is there any mathematical structure of power for durbin watson test?

  11. umang says:

    Hi,

    I am trying to run a regression analysis in which i have 50 time periods, one dep variable and 4 independent variables. I havr conducted data transformations. The minimum I got is dw 1.0. Is this acceptable? My objective is not to forecast but to find the contributing variable.

    • Charles says:

      If n = 50, k = 4 and alpha = .05, then the DW bounds are 1.206 and 1.537. I am not sure what you mean by the minimum value of DW, but if DW is 1.0, then since 1.0 < 1.206 you need to reject the null hypothesis that rho <= 0. Charles

  12. tarra says:

    Dear Charles ,

    I wanna know what is Lower critical value dL and Upper critical value dU.
    I’ve been found it, but I can’t find it still.
    Please let me know its equation.

    *I’m not good at English. If I had wrong grammer, please understand me, Thankyou 🙂

    Thanks.
    Tarra

  13. anusha says:

    Hi Charles,

    Mentioned K as # of dependent variables and after the example K as # of Independent variables. Could you please check and clarify where K used in this test.

    Thanks
    Anusha

    • Charles says:

      Hi Anusha,
      k = # of independent variables. Thanks for identifying this typo. I have corrected the referenced webpage.
      k is used twice in the test. It is used to calculate the predicted y values and it is used in the Durbin-Watson table of critcial values.
      Charles

  14. JeanMarc says:

    Thank you very much for your presentation.

    Nevertheless, I need further information about the Dl identifcation.

    Reporting to the Durbin Watson table (for alpha 0.05, n=11, k=2) the tests value are .0.519 / 1.297.

    However, the calculated test shows the 0.75798 value.

    Where did I have missed something ?

    Thank you for your reply

    • Charles says:

      JeanMarc,
      Since the test statistic is between the two critical values, the test is inconclusive.
      I am not sure I know what you mean by “Dl identification”. Is this simply the results of the Durbin Watson test?
      Charles

  15. Kai says:

    So, if you are able to generate the Durbin-Watson stat, how do you use Excel to generate a significance for that D stat from a given α, n and k? Aside from looking it up in tables someone else figured out already…

    • Charles says:

      Kai,
      Significance testing for Durbin-Watson is not included in the latest release of the Real Statistics software. I plan to add this shortly, probably in the next release.
      Charles

    • Charles says:

      Kai,
      I have now included the Durbin-Watson table on the website. In the next release of the software (due out in the next few days) I will provide a function that gives the critical values for sample sizes up to 5,000 elements and up to 20 independent variables. I will also provide a function that carries out the significance test.
      Charles

      • Kai says:

        Thanks, I’m looking forward to seeing it!

        Significance testing of the Durbin-Watson stat seems to be missing many programs/add-Ins…your offerings will be uncommonly complete there!

        I’d like to fill in some gaps in something I’m working on in Python too, so I’m curious – what is the math formula you’re using find the cummulative significance values for D?

        • Charles says:

          Kai,
          Significance testing for Durbin-Watson is available in Release 3.3 or Rel 3.3.1 of the software. The testing is done using a table lookup.
          Charles

  16. Charles says:

    Luan,
    Your teacher is correct. While I can think of situations where serial correlation could be used for non-time-series data, in practice it is used with time-series data.
    Charles

  17. Luan says:

    Dear Mr. Charles,

    My teacher said that Durbin Watson test can only be used for time series data, not for cross sectional data.
    When I read this post I very wonder if my teacher was seriously wrong. Can you help me explain this more details ?

    Best Regards,
    Luan

  18. Phu Hai says:

    Thanks Charles! I just love the ease with which your resource pack can be utilized.

    Best Wishes;
    Hai

  19. Arvind B S says:

    Thanks Charles! Efforts much appreciated – I just love the ease with which your resource pack can be utilized.

    Best Regards,
    Arvind.

  20. Rich says:

    Thanks, Charles
    The two options make things very convenient.
    Rich

Leave a Reply

Your email address will not be published. Required fields are marked *