Tags: r, for-loop, time-series, simulation, autoregressive-models

Using a for-loop to simulate an AR(1) process over a linear trend


I tried to simulate an AR(1) process using a for loop. The starting value should be 0, and the process should have a constant (= 3), a random error, and a time trend (= t).

t = 1:1000
x = rep(0,1000)
b = rnorm(1000)
for (i in 1:length(t)) {
  x[i] = 3 + 0.01 * t[i] + 0.4 * x[i-1] + b
}

When I run this code, somehow x is NULL and I cannot see why. Any idea on how to fix it?


Solution

  • I don't get NULL; I get a "replacement has length zero" error. There are two issues at the code level:

    1. the loop counter should start from i = 2, because x[i - 1] is accessed inside the loop (a short demonstration of why follows the fixed code below);

    2. the error term should be b[i], not b.

    t = 1:1000
    x = rep(0, 1000)
    b = rnorm(1000)
    for (i in 2:length(t)){
      x[i] = 3 + 0.01 * t[i] + 0.4 * x[i - 1] + b[i]
    }
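
    To see why the original loop fails right at i = 1: x[0] in R is a zero-length vector, so the whole right-hand side becomes zero-length and the assignment throws the "replacement has length zero" error. Here is a tiny demonstration (v is just a throwaway vector for illustration):

    v <- rep(0, 5)
    v[0]                       ## numeric(0): indexing at position 0 returns a zero-length vector
    3 + 0.4 * v[0]             ## arithmetic with a zero-length vector is still numeric(0)
    ## v[1] <- 3 + 0.4 * v[0]  ## this is what errors: "replacement has length zero"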
    

    In addition, at the model level, I wonder whether you should really do:

    t = 1:1000
    x = rep(0, 1000)
    b = rnorm(1000)
    ## AR(1)
    for (i in 2:length(t)){
      x[i] = 0.4 * x[i - 1] + b[i]
    }
    ## linear trend
    x <- 3 + 0.01 * t + x
    

    In this way, the autocorrelation is not confounded with the linear trend.

    I highly encourage you to understand how a loop works. On one hand, this basic syntax is useful; on the other hand, it is the canonical way to express mathematical algorithms. But once you are comfortable with loops, you are encouraged to use arima.sim to simulate an AR(1), as sketched below.
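
    For example, here is a minimal sketch with arima.sim (from the stats package). It will not reproduce exactly the same numbers as the loop above, because arima.sim generates its own start-up values, but the model structure is the same:

    set.seed(0)  ## for reproducibility
    t <- 1:1000
    ## AR(1) part with coefficient 0.4; innovations are standard normal by default
    e <- arima.sim(model = list(ar = 0.4), n = 1000)
    ## add the constant and the linear trend on top, as before
    x <- 3 + 0.01 * t + e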


    Finally, I want to point out something that usually causes great confusion.

    set.seed(0)  ## for reproducibility
    t = 1:1000
    x = rep(0, 1000)
    b = rnorm(1000)
    ## AR(1)
    for (i in 2:length(t)){
      x[i] = 0.4 * x[i - 1] + b[i]
    }
    ## linear trend
    x <- 3 + 0.01 * t + x
    

    The AR(1) coefficient is 0.4, so if we remove the linear trend, the estimated lag-1 autocorrelation should be around 0.4, right?
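
    Since the data are simulated, we can verify this directly by subtracting the true trend 3 + 0.01 * t (which, of course, we would not know with real data):

    acf(x - (3 + 0.01 * t), plot = FALSE)$acf[2]  ## lag-1 estimate, indeed close to 0.4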

    Now here is what causes the confusion. In time series analysis, people are taught to remove a linear trend using diff(). If you do that here, the estimated lag-1 autocorrelation is far from 0.4 and is even negative!

    dx <- diff(x)
    acf(dx)
    

    (figure: ACF of the differenced series diff(x))

    Here is the problem: diff() is much stronger than you think. It can remove a piecewise random linear trend (the characteristic of a Brownian motion). When all we have is a simple, global, deterministic linear trend, diff() overworks and introduces negative autocorrelation: differencing leaves e[t] - e[t - 1] (plus the constant increment 0.01), and for an AR(1) with coefficient 0.4 this differenced series has lag-1 autocorrelation -(1 - 0.4) / 2 = -0.3, not 0.4. The correct way to estimate this lag-1 autocorrelation is:

    fit <- lm(x ~ t)
    acf(fit$residuals)
    

    (figure: ACF of the residuals from lm(x ~ t))
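
    If you prefer numbers to plots, the same comparison can be made numerically (this reuses dx and fit from the snippets above; acf(..., plot = FALSE) returns the estimates instead of drawing them):

    acf(dx, plot = FALSE)$acf[2]             ## lag-1 after diff(): negative, roughly -(1 - 0.4) / 2 = -0.3
    acf(fit$residuals, plot = FALSE)$acf[2]  ## lag-1 after detrending with lm(): close to 0.4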