In a simple linear regression model, the dependent variable is modeled as a linear function of the independent variable plus a random error term.
A first-order autoregressive process, denoted AR(1), takes the form

y_i = φ_0 + φ_1 y_{i-1} + ε_i
Thinking of the subscripts i as representing time, we see that the value of y at time i+1 is a linear function of y at time i plus a fixed constant and a random error term. As in the ordinary linear regression model, we assume that the error terms are independently distributed according to a normal distribution with zero mean and constant variance σ^2, and that the error terms are independent of the y values. Thus

ε_i ∼ N(0, σ^2)
It turns out that such a process is stationary when |φ_1| < 1, and so we will make this assumption as well. Note that if φ_1 = 1 we have a random walk (with drift when φ_0 ≠ 0).
Similarly, a second-order autoregressive process, denoted AR(2), takes the form

y_i = φ_0 + φ_1 y_{i-1} + φ_2 y_{i-2} + ε_i
and a pth-order autoregressive process, AR(p), takes the form

y_i = φ_0 + φ_1 y_{i-1} + φ_2 y_{i-2} + ⋯ + φ_p y_{i-p} + ε_i
Property 1: The mean of the y_i in a stationary AR(p) process is

μ = φ_0 / (1 − φ_1 − φ_2 − ⋯ − φ_p)
Property 2: The variance of the y_i in a stationary AR(1) process is

var(y_i) = γ_0 = σ^2 / (1 − φ_1^2)
Property 3: The lag h autocorrelation in a stationary AR(1) process is

ρ_h = φ_1^h
Example 1: Simulate a sample of 100 elements from the AR(1) process

y_i = 5 + 0.4 y_{i-1} + ε_i

where ε_i ∼ N(0,1), and calculate the ACF.
Thus φ_0 = 5, φ_1 = .4 and σ = 1. We simulate the independent ε_i by using the Excel formula =NORM.INV(RAND(),0,1) or =NORM.S.INV(RAND()) in column B of Figure 1 (only the first 20 of the 100 values are displayed).
The value of y_1 is calculated by placing the formula =5+0.4*0+B4 in cell C4 (i.e. we arbitrarily assign the value zero to y_0). The other y_i values are calculated by placing the formula =5+0.4*C4+B5 in cell C5, highlighting the range C5:C103 and pressing Ctrl-D.
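The same simulation can be sketched in Python. This is a minimal sketch mirroring the spreadsheet steps, with φ_0 = 5 and φ_1 = .4 as in the example; the `simulate_ar1` helper name and the NumPy dependency are assumptions, not part of the original.

```python
import numpy as np

def simulate_ar1(phi0, phi1, n, sigma=1.0, y0=0.0, seed=0):
    """Simulate n values of y_i = phi0 + phi1*y_{i-1} + eps_i, eps_i ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)     # the independent error terms
    y = np.empty(n)
    prev = y0                           # like the spreadsheet, start from y_0 = 0
    for i in range(n):
        prev = phi0 + phi1 * prev + eps[i]
        y[i] = prev
    return y

y = simulate_ar1(5, 0.4, 100)
print(y.mean(), y.var(ddof=1))  # compare with mu = 8.33 and sigma^2/(1 - phi1^2) = 1.19
```

With only 100 values the sample mean and variance will differ noticeably from the theoretical values, just as in the spreadsheet version; a much larger sample brings them close.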
By Properties 1 and 2, the theoretical values for the mean and variance are μ = φ_0/(1−φ_1) = 5/(1−.4) = 8.33 (cell F22) and σ^2/(1−φ_1^2) = 1/(1−.4^2) = 1.19 (cell F23). These compare to the observed time series values of ȳ = AVERAGE(C4:C103) = 8.23 (cell I22) and s^2 = VAR.S(C4:C103) = 1.70 (cell I23).
The time series ACF values are shown for lags 1 through 15 in column F. These are calculated from the y values as in Example 1. Note that the ACF value at lag 1 is .394376. Based on Property 3, the population ACF value at lag 1 is ρ_1 = φ_1 = .4. Theoretically, the values ρ_h = φ_1^h = .4^h should get smaller and smaller as h increases (as shown in column G of Figure 1).
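The ACF calculation in column F can be sketched in Python using the standard sample-autocorrelation estimator; the `sample_acf` helper name and the NumPy dependency are assumptions for illustration.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations r_1..r_nlags:
    r_h = sum_{i>h} (y_i - ybar)(y_{i-h} - ybar) / sum_i (y_i - ybar)^2."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(d * d)
    return np.array([np.sum(d[h:] * d[:-h]) / denom for h in range(1, nlags + 1)])
```

For a long simulated series from the AR(1) process above, the lag 1 value should come out close to φ_1 = .4 and the lag h value close to .4^h.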
Figure 1 – Simulated AR(1) process
The graph of the y values is shown on the left of Figure 2. As you can see, no particular pattern is visible. The graph of the ACF for the first 15 lags is shown on the right side of Figure 2. The observed and theoretical values for the first two lags agree, but after that the observed ACF values are small but not particularly consistent.
Figure 2 – Graphs of simulated AR(1) process and ACF
Observation: Based on Property 3, for 0 < φ1 < 1, the theoretical values of ACF converge to 0. If φ1 is negative, -1 < φ1 < 0, then the theoretical values of ACF also converge to 0, but alternate in sign between positive and negative.
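A quick check of this observation, using an illustrative value φ_1 = −0.6 (not taken from the example above):

```python
# Theoretical ACF of an AR(1): rho_h = phi1**h, here with an illustrative phi1 = -0.6
phi1 = -0.6
rho = [phi1 ** h for h in range(1, 7)]
print(rho)  # signs alternate -, +, -, +, ... while the magnitudes shrink toward 0
```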
Property 4: For any stationary AR(p) process, the autocovariance at lag k > 0 can be calculated as

γ_k = φ_1 γ_{k-1} + φ_2 γ_{k-2} + ⋯ + φ_p γ_{k-p}

Similarly, the autocorrelation at lag k > 0 can be calculated as

ρ_k = φ_1 ρ_{k-1} + φ_2 ρ_{k-2} + ⋯ + φ_p ρ_{k-p}
Here we assume that γ_h = γ_{-h} and ρ_h = ρ_{-h} if h < 0, and ρ_0 = 1.
These are known as the Yule-Walker equations.
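As a sketch of how the Yule-Walker equations determine the theoretical ACF, the following hypothetical `ar_acf` helper (NumPy assumed) solves the first p equations for ρ_1, …, ρ_p and then extends by the recursion; the coefficients are assumed to describe a stationary process.

```python
import numpy as np

def ar_acf(phi, nlags):
    """Theoretical ACF of a stationary AR(p) via the Yule-Walker equations
    rho_k = sum_{j=1}^p phi_j * rho_{|k-j|}, with rho_0 = 1 and rho_{-h} = rho_h.
    Solves the first p equations for rho_1..rho_p, then extends by recursion."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    A = np.eye(p)       # coefficients of the unknowns rho_1..rho_p
    b = np.zeros(p)     # constants: the terms involving rho_0 = 1
    for k in range(1, p + 1):
        for j in range(1, p + 1):
            m = abs(k - j)
            if m == 0:
                b[k - 1] += phi[j - 1]          # phi_j * rho_0 moves to the RHS
            else:
                A[k - 1, m - 1] -= phi[j - 1]   # coefficient on rho_m
    rho = list(np.linalg.solve(A, b))
    for k in range(p + 1, nlags + 1):           # recursion for lags beyond p
        rho.append(sum(phi[j - 1] * rho[k - j - 1] for j in range(1, p + 1)))
    return rho[:nlags]

print(ar_acf([0.4], 3))  # AR(1) case: should reproduce rho_h = 0.4**h (Property 3)
```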
Proof: click here
Property 5: The Yule-Walker equations also hold when k = 0 provided we add a σ^2 term to the sum, i.e. γ_0 = φ_1 γ_1 + ⋯ + φ_p γ_p + σ^2. This is equivalent to

γ_0 = σ^2 / (1 − φ_1 ρ_1 − ⋯ − φ_p ρ_p)
Observation: In the AR(1) case, the Yule-Walker equations give

ρ_1 = φ_1 ρ_0 = φ_1
ρ_2 = φ_1 ρ_1 = φ_1^2

and in general ρ_h = φ_1^h, which is Property 3. In the AR(2) case, they give

ρ_1 = φ_1 ρ_0 + φ_2 ρ_1
ρ_2 = φ_1 ρ_1 + φ_2 ρ_0

We can also calculate the variance as follows. By Property 5,

γ_0 = φ_1 γ_1 + φ_2 γ_2 + σ^2 = (φ_1 ρ_1 + φ_2 ρ_2) γ_0 + σ^2

and so

γ_0 = σ^2 / (1 − φ_1 ρ_1 − φ_2 ρ_2)

This value can be re-expressed algebraically as described in Property 7 below.
Property 6: The following hold for a stationary AR(2) process:

ρ_1 = φ_1 / (1 − φ_2)
ρ_2 = φ_2 + φ_1^2 / (1 − φ_2)
Proof: Follows from Property 4, as shown above.
Property 7: The variance of the y_i in a stationary AR(2) process is

var(y_i) = γ_0 = (1 − φ_2) σ^2 / [(1 + φ_2)((1 − φ_2)^2 − φ_1^2)]
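Assuming the standard closed form for the AR(2) variance, it can be checked numerically against the Yule-Walker form γ_0 = σ^2/(1 − φ_1 ρ_1 − φ_2 ρ_2) derived above; the coefficients φ_1 = 0.5, φ_2 = 0.2 are illustrative values, not taken from the text.

```python
# Check the AR(2) variance formula (Property 7) against the Yule-Walker form of
# Property 5, using illustrative stationary coefficients phi1 = 0.5, phi2 = 0.2.
phi1, phi2, sigma2 = 0.5, 0.2, 1.0
rho1 = phi1 / (1 - phi2)                  # Property 6
rho2 = phi2 + phi1 ** 2 / (1 - phi2)      # Property 6
g0_yw = sigma2 / (1 - phi1 * rho1 - phi2 * rho2)
g0_closed = (1 - phi2) * sigma2 / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))
print(g0_yw, g0_closed)  # the two expressions agree
```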