I am trying to run a for loop over pairs of columns for a zero-one-inflated beta regression. I've run similar loops for other models (e.g glm or betareg) but in this case it doesn't recognise the column title.
dat <- data.frame(y = rbeta(10, 0.7, 1.5),
x = sample(c(0,1), 10, replace=TRUE),
a1 = abs(rnorm(10)),
a2 = abs(rnorm(10)),
a3 = abs(rnorm(10)),
b1 = abs(rnorm(10)),
b2 = abs(rnorm(10)),
b3 = abs(rnorm(10)))
Using the above random data I've tried to run the following loop
library(brms)
for (i in 1:3){
model <- brm(
formula = bf(
y | weights(paste0("a",i)) ~ x,
phi ~ x + paste0("b",i),
zoi ~ x + paste0("b",i),
coi ~ x + paste0("b",i),
family = zero_one_inflated_beta()
),
data = dat
)
mod <- summary(model)
e[i] <- mod$fixed["x1", "Estimate"]
}
This gives the following: Error: The following variables can neither be found in 'data' nor in 'data2':'i'
If there were only one set of columns I would run something like the below to generate a list, but I don't know how to introduce the second set of columns in the ? position.
data %>% reframe(across(a1:a3,\(a) {
list(summary(brm(
formula = bf(
y | weights(a) ~ x,
phi ~ x + ?,
zoi ~ x + ?,
coi ~ x + ?,
family = zero_one_inflated_beta()
),
data = dat
)))
}))
I only want to run the pairs so a1 and b1, a2 and b2 etc so running a double across which would do all the bs with each a would be very inefficient.
Any ideas why the loop won't run or suggestions for another approach would be great.
The general rule is that you should use as.formula()
or formula()
, or in this case reformulate()
(which isn't really different but I find more aesthetic) to construct your formulas.
First a helper function, since we will be repeating the same process with minor variations:
ffun <- function(i, response) {
reformulate(c("x", paste0("b", i)), response = response)
}
Using this in your context:
library(brms)
for (i in 1:3) {
model <- brm(
formula = bf(
reformulate("x", response = sprintf("y | weights(a%d)", i)),
ffun(i, "phi"),
ffun(i, "zoi"),
ffun(i, "coi"),
family = zero_one_inflated_beta()
),
data = dat
)
}
To speed this process up by avoiding recompilation for every new model fit (which can be a majority of the effort if the data set is small), you can use update()
; you need to fit the base model fully on one set of variables, then you can use update()
on the rest. (I have left out the components for storing the results; you can also substitute in @Edward's machinery for updating the formula for mine, if you prefer that approach)
i <- 1
model1 <- brm(
formula = bf(
reformulate("x", response = sprintf("y | weights(a%d)", i)),
ffun(i, "phi"),
ffun(i, "zoi"),
ffun(i, "coi"),
family = zero_one_inflated_beta()
),
data = dat
)
for (i in 2:3) {
newform <- bf(
reformulate("x", response = sprintf("y | weights(a%d)", i)),
ffun(i, "phi"),
ffun(i, "zoi"),
ffun(i, "coi"),
family = zero_one_inflated_beta()
)
m_new <- update(model1, newform, newdata = dat)
}