I want to model the relationship between my y variable and time. I suspect there's a seasonal effect of summer months and an overall trend that may be captured by year. I also want to explore the interaction with the country. I'm not sure how to write his in a GAM. My thought was to have two separate interactions by time. To have one time effect I'd do the following:
gam(y~ s(nmonth,Country, bs = "fs"), data = mydata,
method = "REML")
Can I just stick on another interaction like so:
gam(y~ s(nmonth,Country, bs = "fs") + s(nyear,Country, bs = "fs"), data = mydata,
method = "REML")
Here's some sample data more for structure than applying the model:
data = structure(list(nmonth = c(12, 9, 4, 4, 3, 1, 1, 11, 9, 8, 8,
8, 8, 8, 7, 7, 5, 5, 5, 4, 3, 1, 12, 1, 7, 6, 6, 5, 12, 12, 12,
11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), nyear = c(2008, 2011,
2012, 2012, 2011, 2011, 2009, 2020, 2021, 2021, 2020, 2019, 2014,
2014, 2017, 2014, 2020, 2014, 2010, 2022, 2016, 2012, 2010, 2010,
2007, 2007, 2007, 2007, 2020, 2016, 2016, 2022, 2021, 2020, 2019,
2014, 2022, 2021, 2020, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2018, 2018, 2018, 2018, 2013, 2013, 2013, 2013, 2013, 2009, 2021,
2021, 2016, 2016, 2014, 2014, 2014, 2014, 2014, 2021, 2021, 2021,
2020, 2018, 2017, 2017, 2016, 2016, 2016, 2016, 2016, 2015, 2015,
2015, 2011, 2021, 2021, 2021, 2020, 2020, 2020, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2018, 2018, 2018),
Country = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), levels = c("England", "Ireland", "Northern Ireland",
"Scotland", "Wales"), class = "factor"), y= c(-1,
-8, 0, -1, -15, -13, 6, -39, 2, -1, 4, 1, -15, -17, -6, -9,
2, -14, -2, -2, -2, -6, 1, -4, -1, -3, 4, -11, 8, 9, -7,
2, 0, 10, -12, NA, 6, 0, -36, -7, 0, -26, -9, -6, -2, -1,
4, -4, 11, 4, 4, -2, 3, 3, 8, 9, -3, 7, 12, 7, 5, 2, 0, -2,
1, -3, -21, 2, -8, 2, 3, -1, -8, NA, -8, -20, -14, -14, -10,
-19, -37, -3, -8, -4, 3, -23, 12, -8, -14, -4, -17, -18,
-15, -9, -3, -4, -5, 5, -8, -4)), row.names = c(NA, 100L), class = "data.frame")
Assuming Country
is coded as a factor, you could do it like that, but there are several alternatives. The model you have chosen implies that the smooths of month and year respectively share the same wiggliness across all countries. The estimated shapes of the functions can vary between countries but the wiggliness of each function will be the same.
Other options would allow the wiggliness to vary by country and there are several options for that.
Standard factor-by smooths:
y ~ Country + s(nmonth, by = Country) + s(nyear, by = Country))
Ordered factor by smooths
df <- df |> transform(oCountry = ordered(Country)
contrast(df$oCountry) <- "contr.treatment"
y ~ oCountry + s(nmonth, by = oCountry) + s(nyear, by = oCountry))
These are just interactions between one continuous time variable and the Country
factor. It may well be, especially if you have a long enough record in terms of year sampled, that the seasonal pattern has varied with the long term trend. That would require a tensor product interaction between two continuous variables that is also interacting with the Country
factor. For example:
gam(y ~ Country +
s(nmonth, by = Country) +
s(nyear, by = Country) +
ti(nmonth, nyear, by = Country), ...)
would be one way to do it for the factor by models I mentioned, or if you don't need it to be decomposed like that:
gam(y ~ Country + te(nmonth, nyear, by = Country), ...)
The equivalent for the fs
smooth in your question is slightly trickier as the fs
basis only works for a univariate continuous smooth. Instead, this form of t2()
tensor product construct should be equivalent in the univariate case and provide the same kind of random surface as the random smooth in the bivariate case:
gam(y ~ t2(nmonth, nyear, Country, bs = c("cr", "cr", "re"), full = TRUE), ...)
Any you should be setting k
throughout on all smooths unless the defaults are sufficient. I didn't set k
simply for clarity.