Autocorrelation Function

Definition 1: The autocorrelation function (ACF) at lag k, denoted ρk, of a stationary stochastic process is defined as ρk = γk/γ0 where γk = cov(yi, yi+k) for any i.

Note that γ0 is the variance of the stochastic process.

Definition 2: The mean ȳ of a time series y1, …, yn is

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The autocovariance function at lag k, for k ≥ 0, of the time series is defined by

$$s_k = \frac{1}{n}\sum_{i=1}^{n-k} (y_i - \bar{y})(y_{i+k} - \bar{y})$$

The autocorrelation function (ACF) at lag k, for k ≥ 0, of the time series is defined by

$$r_k = \frac{s_k}{s_0}$$

The variance of the time series is s0 (note that r0 = 1). A plot of rk against k is known as a correlogram.
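The quantities in Definition 2 can be sketched in a few lines of Python. This is only an illustration: the function names and the four-point series are made up for the example and are not part of the Real Statistics Resource Pack.

```python
def mean(y):
    """Overall mean of the series."""
    return sum(y) / len(y)

def autocovariance(y, k):
    """s_k from Definition 2: divide by n and subtract the overall mean."""
    n = len(y)
    ybar = mean(y)
    return sum((y[i] - ybar) * (y[i + k] - ybar) for i in range(n - k)) / n

def acf(y, k):
    """r_k = s_k / s_0."""
    return autocovariance(y, k) / autocovariance(y, 0)

# Made-up series, used only to show the arithmetic
series = [2.0, 4.0, 6.0, 8.0]
print(autocovariance(series, 0))  # 5.0 (the variance s0)
print(acf(series, 1))             # 0.25
```

Here the deviations from the mean are -3, -1, 1, 3, so s0 = 20/4 = 5 and s1 = (3 - 1 + 3)/4 = 1.25, giving r1 = 0.25.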

Observation: The definition of autocovariance given above differs slightly from the usual definition of covariance between {y1, …, yn-k} and {yk+1, …, yn} in two respects: (1) we divide by n instead of n–k, and (2) we subtract the overall mean instead of the respective means of {y1, …, yn-k} and {yk+1, …, yn}. For values of n that are large relative to k, the difference will be small.

Example 1: Calculate s2 and r2 for the data in range B4:B19 of Figure 1.

Figure 1 – ACF at lag 2

The formulas for calculating s2 and r2 using the usual COVARIANCE.S and CORREL functions are shown in cells G4 and G5.

The formulas for s0, s2 and r2 from Definition 2 are shown in cells G8, G11 and G12 (along with an alternative formula in G13). Note that the values for s2 in cells E4 and E11 are not too different, nor are the values for r2 in cells E5 and E12; the larger the sample, the more likely these values will be similar.
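The comparison in Example 1 can be mimicked outside Excel. The sketch below is an assumption-laden illustration: the 16 data values are made up (they are not the values in range B4:B19), and the helper names are hypothetical. The usual sample covariance is computed over the m = n – k overlapping pairs and divides by m – 1, as COVARIANCE.S does.

```python
from math import sqrt

def sample_cov(a, b):
    """Usual sample covariance of paired lists: each list's own mean, divisor m - 1."""
    m = len(a)
    abar, bbar = sum(a) / m, sum(b) / m
    return sum((x - abar) * (v - bbar) for x, v in zip(a, b)) / (m - 1)

def sample_corr(a, b):
    """Usual Pearson correlation of paired lists."""
    return sample_cov(a, b) / sqrt(sample_cov(a, a) * sample_cov(b, b))

def autocov(y, k):
    """s_k from Definition 2: overall mean, divisor n."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[i] - ybar) * (y[i + k] - ybar) for i in range(n - k)) / n

def acf(y, k):
    """r_k = s_k / s_0."""
    return autocov(y, k) / autocov(y, 0)

# Made-up data for illustration only
y = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12, 11, 13, 15, 14, 16, 18]
k = 2
head, tail = y[:-k], y[k:]  # {y1..yn-k} and {yk+1..yn}

print("covariance :", sample_cov(head, tail), "vs s_k:", autocov(y, k))
print("correlation:", sample_corr(head, tail), "vs r_k:", acf(y, k))
```

For a very short series the two approaches can disagree sharply: with the series 2, 4, 6, 8 and k = 1, the usual correlation is 1 while r1 is only 0.25.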

Real Statistics Functions: The Real Statistics Resource Pack supplies the following functions:

ACF(R1, k) = the ACF value at lag k for the time series in range R1

ACVF(R1, k) = the autocovariance at lag k for the time series in range R1

Note that ACF(R1, k) is equivalent to

=SUMPRODUCT(OFFSET(R1,0,0,COUNT(R1)-k)-AVERAGE(R1),OFFSET(R1,k,0,COUNT(R1)-k)-AVERAGE(R1))/DEVSQ(R1)

Observation: There are theoretical advantages to dividing by n instead of n–k in the definition of sk, namely that the covariance and correlation matrices will always be non-negative definite (see Positive Definite Matrices).
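A small numerical illustration may help here. The sketch below uses a made-up three-point series and a hypothetical helper that can divide by either n or n – k. It evaluates the quadratic form x'Cx for x = (1, 0, 1), where C is the 3 × 3 matrix with entries c|i-j|; a non-negative definite matrix can never make this negative.

```python
def autocov(y, k, divisor="n"):
    """Lag-k autocovariance about the overall mean, dividing by n or by n - k."""
    n = len(y)
    ybar = sum(y) / n
    total = sum((y[i] - ybar) * (y[i + k] - ybar) for i in range(n - k))
    return total / (n if divisor == "n" else n - k)

# Made-up series for illustration only
y = [1.0, 2.0, 3.0]

for divisor in ("n", "n-k"):
    c0 = autocov(y, 0, divisor)
    c2 = autocov(y, 2, divisor)
    # x'Cx for x = (1, 0, 1) reduces to 2*c0 + 2*c2
    quad = 2 * c0 + 2 * c2
    print(divisor, "lag-2 ratio:", c2 / c0, "quadratic form:", quad)
```

With division by n the lag-2 ratio is about -0.5 and the quadratic form is positive. With division by n – k the ratio is about -1.5, which lies outside [-1, 1], and the quadratic form is negative, so the matrix is not non-negative definite.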

Observation: Even though the definition of autocorrelation is slightly different from that of correlation, ρk (or rk) still takes a value between -1 and 1, as we see in Property 2.

Property 1: For any stationary process, γ0 ≥ |γi| for any i

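Proof: By the Cauchy-Schwarz inequality, |γi| = |cov(yj, yj+i)| ≤ √(var(yj) · var(yj+i)) = γ0, since var(yj) = var(yj+i) = γ0 for a stationary process.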

Property 2: For any stationary process, |ρi| ≤ 1 (i.e. -1 ≤ ρi ≤ 1) for any i > 0

Proof: By Property 1, γ0 ≥ |γi| for any i. Since ρi = γi/γ0 and γ0 ≥ 0 (actually γ0 > 0 since we are assuming that ρi is well-defined), it follows that

$$|\rho_i| = \frac{|\gamma_i|}{\gamma_0} \le 1$$

Example 2: Determine the ACF for lag = 1 to 10 for the Dow Jones closing averages for the month of October 2015, as shown in columns A and B of Figure 2, and construct the corresponding correlogram.

The results are shown in Figure 2. The values in column E are computed by placing the formula =ACF(B$4:B$25, D5) in cell E5, highlighting range E5:E14 and pressing Ctrl-D.

Figure 2 – ACF and Correlogram

As can be seen from the values in column E or the chart, the ACF values descend slowly towards zero. This is typical of an autoregressive process.

Observation: A rule of thumb is to carry out the above process for lag = 1 to n/4 or n/3, which for the above data is 22/4 ≈ 6 or 22/3 ≈ 7. Our goal is to see whether the ACF is still significant (i.e. statistically different from zero) at these lags. We can do this by using the following property.

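Property 3 (Bartlett): In large samples, if a time series of size n is purely random, then for all k, rk is approximately normally distributed with mean 0 and variance 1/n:

$$r_k \sim N\left(0, \frac{1}{n}\right)$$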

Example 3: Determine whether the ACF at lag 7 is significant for the data from Example 2.

As we can see from Figure 3, the critical value for the test in Property 3 is .417866. Since r7 = .303809 < .417866, we conclude that ρ7 is not significantly different from zero.

Figure 3 – Bartlett’s Test

Note that the ACF values for lags k ≤ 5 are significant, while those for lags k > 5 are not.
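The critical value in Example 3 can be reproduced from Property 3. The sketch below assumes a two-tailed test at the 5% level, which is consistent with the value .417866 quoted above; the function name is made up for the illustration, and n and r7 are taken from Examples 2 and 3.

```python
from math import sqrt
from statistics import NormalDist

def bartlett_critical_value(n, alpha=0.05):
    """Two-tailed critical value for r_k when r_k ~ N(0, 1/n) (assumed alpha = 0.05)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n)

n = 22          # number of observations in Example 2
r7 = 0.303809   # ACF at lag 7 from Example 3

crit = bartlett_critical_value(n)
print(round(crit, 6))   # 0.417866
print(abs(r7) > crit)   # False, so lag 7 is not significant
```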

Property 4 (Box-Pierce): In large samples, if ρk = 0 for all k ≤ m, then

$$Q = n \sum_{k=1}^{m} r_k^2 \sim \chi^2(m)$$

A more statistically powerful version of Property 4, especially for smaller samples, is given by the next property.

Property 5 (Ljung-Box): If ρk = 0 for all k ≤ m, then

$$Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \sim \chi^2(m)$$

Example 4: Use the Box-Pierce and Ljung-Box statistics to determine whether the ACF values in Example 2 are statistically equal to zero for all lags less than or equal to 5 (the null hypothesis).

The results are shown in Figure 4.

Figure 4 – Box-Pierce and Ljung-Box Tests

We see from these tests that ACF(k) is significantly different from zero for at least one k ≤ 5, which is consistent with the correlogram in Figure 2.
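The two statistics in Properties 4 and 5 can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the helper names are hypothetical, the series is a made-up trending sequence rather than the Dow Jones data, and the chi-square upper-tail probability is computed with a standard recurrence that is valid for integer degrees of freedom.

```python
import math

def acf(y, k):
    """r_k = s_k / s_0 (the 1/n factors cancel)."""
    n = len(y)
    ybar = sum(y) / n
    num = sum((y[i] - ybar) * (y[i + k] - ybar) for i in range(n - k))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

def box_pierce(y, m):
    """Q = n * sum of r_k^2 for k = 1..m."""
    n = len(y)
    return n * sum(acf(y, k) ** 2 for k in range(1, m + 1))

def ljung_box(y, m):
    """Q = n(n+2) * sum of r_k^2 / (n-k) for k = 1..m."""
    n = len(y)
    return n * (n + 2) * sum(acf(y, k) ** 2 / (n - k) for k in range(1, m + 1))

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # df = 2 base case
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # df = 1 base case
    while a < df / 2.0:
        # Step the shape parameter up by one (adds 2 degrees of freedom)
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Made-up trending series for illustration only
y = [float(v) for v in range(1, 21)]
m = 5
q_bp, q_lb = box_pierce(y, m), ljung_box(y, m)
print("Box-Pierce:", q_bp, "p-value:", chi2_sf(q_bp, m))
print("Ljung-Box :", q_lb, "p-value:", chi2_sf(q_lb, m))
```

Since (n+2)/(n–k) > 1 for every k ≥ 1, the Ljung-Box statistic is always larger than the Box-Pierce statistic on the same data, so its p-value is smaller.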

Real Statistics Functions: The Real Statistics Resource Pack provides the following functions to perform the tests described by the above properties.

BARTEST(r, n, lag) = p-value of Bartlett’s test for correlation coefficient r based on a time series of size n for the specified lag.

BARTEST(R1,, lag) = BARTEST(r, n, lag) where n = the number of elements in range R1 and r = ACF(R1,lag)

PIERCE(R1,,lag) = Box-Pierce statistic Q for range R1 and the specified lag

BPTEST(R1,,lag) = p-value for the Box-Pierce test for range R1 and the specified lag

LJUNG(R1,,lag) = Ljung-Box statistic Q for range R1 and the specified lag

LBTEST(R1,,lag) = p-value for the Ljung-Box test for range R1 and the specified lag

In the above functions, if the second argument is omitted, the test is performed using the autocorrelation coefficient (ACF). If the second argument is 1 or "pacf", the test is performed using the partial autocorrelation coefficient (PACF), as described in the next section. Any other value for the second argument results in the ACF being used.

For example, BARTEST(.303809,22,7) = .07708 for Example 3 and LBTEST(B4:B25,"acf",5) = 1.81E-06 for Example 4.