I am using gam
to fit the data and I get s()
and gam()
from mgcv
. However, when I tried to type the formula:
y = data.frame(n = c(1:5),b = runif(5),c = runif(5,min = 2,max = 3),d = c(5:9),e = scale(c(11:15)),f = c(12:16))
model1 = gam(n ~ b + c + I(s(sqrt(d),bs = 'tp',df=2)) +
I(s(sqrt(e),bs = 'tp',df=2)) +
I(s(sqrt(f),bs = 'tp',df=2)),
family=quasipoisson(),
data = y,
method = 'REML',
select = TRUE)
The result was:
Error in terms.formula(reformulate(term[i])) :
invalid model formula in ExtractVars
How to type the formula in the correct way? I have already tried different methods and it turned out all failed... Please give me some suggestions. Thank you.
Edit:
I am quite sure that the error is related to the stats::formula
. I want to ask if there are any rules related to using s()
and I()
at the same time.
The session information is :
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Hmisc_4.8-0 Formula_1.2-5 survival_3.4-0 lattice_0.20-45 mgcv_1.8-41 nlme_3.1-160 dlnm_2.4.7 rvest_1.0.3 plotly_4.10.1 data.table_1.14.8 arrow_11.0.0.2
[12] magrittr_2.0.3 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.1 tidyverse_2.0.0
[23] readxl_1.4.2
loaded via a namespace (and not attached):
[1] minqa_1.2.5 colorspace_2.1-0 deldir_1.0-6 sjlabelled_1.2.0 htmlTable_2.4.1 estimability_1.4.1 snakecase_0.11.0 base64enc_0.1-3
[9] rstudioapi_0.14.0-9000 bit64_4.0.5 fansi_1.0.4 mvtnorm_1.1-3 xml2_1.3.3 splines_4.2.2 cachem_1.0.6 knitr_1.42
[17] sjmisc_2.8.9 jsonlite_1.8.4 splitstackshape_1.4.8 nloptr_2.0.3 ggeffects_1.2.1 broom_1.0.4 cluster_2.1.4 png_0.1-8
[25] clipr_0.8.0 compiler_4.2.2 httr_1.4.5 sjstats_0.18.2 emmeans_1.8.5 backports_1.4.1 assertthat_0.2.1 Matrix_1.5-1
[33] fastmap_1.1.0 lazyeval_0.2.2 cli_3.6.0 htmltools_0.5.4 tools_4.2.2 coda_0.19-4 gtable_0.3.3 glue_1.6.2
[41] Rcpp_1.0.10 santoku_0.9.0 cellranger_1.1.0 vctrs_0.5.2 sjPlot_2.8.12 insight_0.19.1 lemon_0.4.6 xfun_0.37
[49] metR_0.14.0 openxlsx_4.2.5.2 lme4_1.1-31 timechange_0.2.0 lifecycle_1.0.3 MASS_7.3-58.1 scales_1.2.1 hms_1.1.3
[57] RColorBrewer_1.1-3 curl_5.0.0 memoise_2.0.1 gridExtra_2.3 padr_0.6.2 rpart_4.1.19 latticeExtra_0.6-30 stringi_1.7.12
[65] bayestestR_0.13.0 checkmate_2.1.0 boot_1.3-28.1 zip_2.2.2 repr_1.1.6 rlang_1.1.0 pkgconfig_2.0.3 tsModel_0.6-1
[73] htmlwidgets_1.6.2 bit_4.0.5 tidyselect_1.2.0 plyr_1.8.8 R6_2.5.1 generics_0.1.3 foreign_0.8-83 pillar_1.9.0
[81] withr_2.5.0 nnet_7.3-18 datawizard_0.7.1 performance_0.10.2 janitor_2.2.0 modelr_0.1.11 crayon_1.5.2 interp_1.1-3
[89] utf8_1.2.3 tzdb_0.3.0 extraInserts_0.0.0.9003 viridis_0.6.2 jpeg_0.1-10 grid_4.2.2 digest_0.6.31 xtable_1.8-4
[97] ggeasy_0.1.4 munsell_0.5.0 viridisLite_0.4.1 skimr_2.1.5
Short answer: You don't need I()
here, and if you did it would probably be inside the s()
call.
The error arises because one of the parts of your model formula is invalid.
> model1 <- gam(n ~ b + c + I(s(sqrt(d),bs = 'tp',df=2)) +
+ I(s(sqrt(e),bs = 'tp',df=2)) +
+ I(s(sqrt(f),bs = 'tp',df=2)),
+ family=quasipoisson(),
+ data = y,
+ method = 'REML',
+ select = TRUE)
Error in terms.formula(reformulate(term[i])) :
invalid model formula in ExtractVars
Looking at the traceback we see:
8. terms.formula(reformulate(term[i]))
7. terms(reformulate(term[i]))
6. s(sqrt(d), bs = "tp", df = 2) at <text>#1
5. eval(parse(text = terms[i]), enclos = p.env, envir = mgcvns)
The s()
call is where the problem lies.
Trying that independently to see if we get a different error:
> s(e,bs = 'tp',df=2)
Error in terms.formula(reformulate(term[i])) :
invalid model formula in ExtractVars
Still a problem. Let's check the documentation for mgcv::s()
s(..., k=-1,fx=FALSE,bs="tp",m=NA,by=NA,xt=NULL,id=NULL,sp=NULL,pc=NULL)
It doesn't take a degrees of freedom argument. remove the df =
arguments and drop the I()
.
> model1 <- gam(n ~ b + c + s(sqrt(d),bs = 'tp') +
+ s(sqrt(e),bs = 'tp') +
+ s(sqrt(f),bs = 'tp'),
+ family=quasipoisson(),
+ data = y,
+ method = 'REML',
+ select = TRUE)
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) :
A term has fewer unique covariate combinations than specified maximum degrees of freedom
In addition: Warning message:
In sqrt(e) : NaNs produced
It runs, albeit the dataset is too small to make a good model.