I'm looking to do some basic simulation in R to examine the nature of p-values. My goal is to see whether large sample sizes trend towards small p-values. My thought is to generate random vectors of 1,000,000 data points, regress them on each other, and then plot the distribution of p-values and look for skew.
This is what I have so far:
# two independent uniform vectors, so the null hypothesis holds by construction
x1 <- runif(1000000, 0, 1000)
x2 <- runif(1000000, 0, 1000)
model1 <- lm(x2 ~ x1)
Using code taken from another thread:
lmp <- function(modelobject) {
    # pull the overall F-statistic (and its degrees of freedom) out of the
    # model summary and convert it to a p-value
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    f <- summary(modelobject)$fstatistic
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL   # strip the "value" name so a bare number is returned
    return(p)
}
lmp(model1)
[1] 0.3874139
Any suggestions for how I might do this for 1000 models or even more? Thanks!
see ?replicate
... but note that the p-value you're calculating assumes Gaussian errors, not uniform ones (not that this will matter much at n = 10^6, thanks to the central limit theorem).
Specifically, something like this:
nrep <- 1000
ndat <- 1000000
results <- replicate(nrep, {
    # fresh null data on each iteration; keep only the overall p-value
    x1 <- runif(ndat, 0, 1000)
    x2 <- runif(ndat, 0, 1000)
    model1 <- lm(x2 ~ x1)
    lmp(model1)
})
should work, but fitting 1000 regressions of 10^6 points each will take a long time to run.
I'd suggest making nrep and ndat much smaller while you try it out.
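To actually plot the distribution and look for skew, as in your original goal, here's a minimal follow-up sketch. The nrep and ndat values are arbitrary choices on my part, shrunk so it runs in seconds, and it assumes the lmp function above is already defined. Under the null hypothesis the p-values should be roughly uniform on (0, 1), so the histogram should look flat rather than piling up near zero:

nrep <- 200    # deliberately small so this runs in seconds
ndat <- 1000   # likewise much smaller than 10^6

results <- replicate(nrep, {
    x1 <- runif(ndat, 0, 1000)
    x2 <- runif(ndat, 0, 1000)
    lmp(lm(x2 ~ x1))   # overall p-value, using the lmp() defined above
})

# under the null, p-values are ~ Uniform(0, 1): the histogram should be flat
hist(results, breaks = 20, main = "p-values under the null", xlab = "p-value")

# a formal check of uniformity, if eyeballing the histogram isn't enough
ks.test(results, "punif")

If the histogram stays flat as you crank ndat back up, that answers your question: a large sample size by itself doesn't drag null p-values toward zero.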