r linear-regression cross-validation polynomial-math leave-one-out

R: GLM function error with simulated data


I have two vectors of simulated data as follows:

x = rnorm(1000, mean  = 0, sd = 1)

eps = rnorm(1000, mean = 0, sd = sqrt(0.25))

I am trying to use glm and the boot library's cv.glm function to fit a simple linear regression model and a multiple linear regression model, evaluated with either leave-one-out or k-fold cross-validation. The code I am using and the error I get are as follows:

> glm.fit=glm(y~x)
> cv.err=cv.glm(x, glm.fit)
Error in if ((K > n) || (K <= 1)) stop("'K' outside allowable range") : 
  missing value where TRUE/FALSE needed

I checked with is.na(x) and confirmed that there are no missing values. Could anyone please suggest a solution or point out what I am doing wrong?

Thanks in advance.


Solution

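  • glm() can pick up x and y from the calling environment, but cv.glm() cannot: it runs in its own environment and expects its first argument to be a data frame holding the variables used in the fit. When you pass the bare vector x, nrow(x) is NULL, so the check on K has nothing to compare against and fails with "missing value where TRUE/FALSE needed". Maybe check this post or this book chapter.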

    If I run your code I get the same error:

    library(boot)
    set.seed(111)
    x = rnorm(1000, mean  = 0, sd = 1)
    y = rnorm(1000, mean = 0, sd = sqrt(0.25))
    glm.fit=glm(y~x)
    cv.err=cv.glm(x, glm.fit)
    Error in if ((K > n) || (K <= 1)) stop("'K' outside allowable range") : 
      missing value where TRUE/FALSE needed
    

    If I put them into a data.frame it will work:

    da = data.frame(x=x,y=y)
    glm.fit=glm(y~x, data=da)
    cv.err=cv.glm(da, glm.fit,K=5)
    cv.err$delta
    [1] 0.2428287 0.2426424
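
To make the data frame point concrete, here is a minimal sketch that compares leave-one-out and 10-fold cross-validation for a simple and a multiple (quadratic) regression. The question never defines y, so the relationship y = 1 + 2*x + eps is an assumption made purely for illustration, and the sample size is reduced to 200 only so that leave-one-out runs quickly.

```r
library(boot)

set.seed(111)
n   <- 200                                   # smaller than the question's 1000, for speed
x   <- rnorm(n, mean = 0, sd = 1)
eps <- rnorm(n, mean = 0, sd = sqrt(0.25))
y   <- 1 + 2 * x + eps                       # assumed model, not given in the question
da  <- data.frame(x = x, y = y)

# A bare vector has no rows, so cv.glm cannot derive n (or a default K) from it
is.null(nrow(x))    # TRUE
nrow(da)            # 200

# Fit against the data frame so glm and cv.glm work on the same object
fit.lin  <- glm(y ~ x, data = da)            # simple linear regression
fit.quad <- glm(y ~ x + I(x^2), data = da)   # multiple regression with a squared term

# Leave-one-out CV: K defaults to the number of rows in da
loocv.lin <- cv.glm(da, fit.lin)

# 10-fold CV for both models
kfold.lin  <- cv.glm(da, fit.lin,  K = 10)
kfold.quad <- cv.glm(da, fit.quad, K = 10)

# delta[1] is the raw CV estimate of prediction error, delta[2] the adjusted one
loocv.lin$delta
kfold.lin$delta
kfold.quad$delta
```

With glm's default gaussian family these fits are ordinary least squares, so the delta values are cross-validated mean squared errors and should land near the noise variance used in the simulation.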