Apparently the update()
method cannot retrieve the dataset the estimation was based on if I wrap the estimation function in another function. Is there any way around this, e.g., by specifying an environment?
library(fixest)
data(trade)
# fit model directly and wrapped into function
mod1 <- fepois(Euros ~ log(dist_km) | Origin + Destination, trade)
fit_model <- function(df) {
fepois(Euros ~ log(dist_km) | Origin + Destination, data = df)
}
mod2 <- fit_model(trade)
# try to update
update(mod1, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
#> log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
#> BIC: 2.424e+12 Squared Cor.: 0.384441
update(mod2, . ~ . + log(Year))
#> Error in fepois(fml = Euros ~ log(dist_km) + log(Year) | Origin + Destination, : Argument 'data' must be either: i) a matrix, or ii) a data.frame.
#> Problem: it is not a matrix nor a data.frame (instead it is a function).
Created on 2023-02-26 with reprex v2.0.2
Also posted as a GitHub issue.
Update: The solution seems to be forcing an early evaluation of the expression that refers to the dataset. Another way is to specify the dataset again within update()
:
update(mod2, . ~ . + log(Year), data = trade)
If you want to pass an arbitrary df
into the function and not hard code trade
we have to evaluate it early before calling fepois()
. We can do this with eval(bquote())
and wrap the data argument (below mydat
) into .()
. To capture the object name nicely, we can further wrap the data argument in substitute()
before evaluating it early (thanks for the comment from @jay.sf).
Update: I now added an env
argument which needs to be specified with parent.frame()
when used inside purrr::map()
and similar functions.
library(fixest)
library(tidyverse)
data(trade)
fit_model <- function(mydat, env = environment()) {
eval(bquote(fepois(Euros ~ log(dist_km) | Origin + Destination, data = .(substitute(mydat, env = env)))))
}
mod2 <- fit_model(trade)
update(mod2, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.51756 0.113171 -13.4095 < 2.2e-16 ***
#> log(Year) 72.36888 6.899699 10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12 Adj. Pseudo R2: 0.592897
#> BIC: 2.424e+12 Squared Cor.: 0.384441
mod2$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
res <- trade |>
nest(.by = Year) |>
mutate(fit = map(data, \(x) fit_model(x, parent.frame())))
res$fit[[1]]
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 3,793
#> Fixed-effects: Origin: 15, Destination: 15
#> Standard-errors: Clustered (Origin)
#> Estimate Std. Error t value Pr(>|t|)
#> log(dist_km) -1.48073 0.114878 -12.8896 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.082e+11 Adj. Pseudo R2: 0.573982
#> BIC: 2.164e+11 Squared Cor.: 0.352497
res$fit[[1]]$call
#> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = mydat)
Created on 2023-03-07 by the reprex package (v2.0.1)