I'm looking to do some basic simulation in R to examine the nature of p-values. My goal is to see whether large sample sizes trend towards small p-values. My thought is to generate random vectors of 1,000,000 data points, regress them on each other, and then plot the distribution of p-values and look for skew.
This is what I have so far:
# two independent uniform vectors, so the null hypothesis holds by construction
x1 <- runif(1000000, 0, 1000)
x2 <- runif(1000000, 0, 1000)
model1 <- lm(x2 ~ x1)
Using code taken from another thread:
lmp <- function(modelobject) {
    # pull the overall F-statistic (and its degrees of freedom) out of the
    # model summary and convert it to a p-value
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    f <- summary(modelobject)$fstatistic
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL   # strip the "value" name so a bare number is returned
    return(p)
}
lmp(model1)
[1] 0.3874139
Any suggestions for how I might do this for 1000 models or even more? Thanks!
see ?replicate
... but note that the p-value you're calculating assumes Gaussian errors, not uniform ones (not that this will matter much at n = 10^6, thanks to the central limit theorem).
Specifically, something like this:
nrep <- 1000
ndat <- 1000000
results <- replicate(nrep, {
    # fresh null data on each iteration; keep only the overall p-value
    x1 <- runif(ndat, 0, 1000)
    x2 <- runif(ndat, 0, 1000)
    model1 <- lm(x2 ~ x1)
    lmp(model1)
})
should work, but fitting 1000 regressions of 10^6 points each will take a long time to run.
I'd suggest making nrep and ndat much smaller while you try it out.
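To actually plot the distribution and look for skew, as in your original goal, here's a minimal follow-up sketch. The nrep and ndat values are arbitrary choices on my part, shrunk so it runs in seconds, and it assumes the lmp function above is already defined. Under the null hypothesis the p-values should be roughly uniform on (0, 1), so the histogram should look flat rather than piling up near zero:

nrep <- 200    # deliberately small so this runs in seconds
ndat <- 1000   # likewise much smaller than 10^6

results <- replicate(nrep, {
    x1 <- runif(ndat, 0, 1000)
    x2 <- runif(ndat, 0, 1000)
    lmp(lm(x2 ~ x1))   # overall p-value, using the lmp() defined above
})

# under the null, p-values are ~ Uniform(0, 1): the histogram should be flat
hist(results, breaks = 20, main = "p-values under the null", xlab = "p-value")

# a formal check of uniformity, if eyeballing the histogram isn't enough
ks.test(results, "punif")

If the histogram stays flat as you crank ndat back up, that answers your question: a large sample size by itself doesn't drag null p-values toward zero.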