rfixest

Why does update method does not work when estimation is wrapped in function?


Apparently the update() method cannot retrieve the dataset the estimation was based on if I wrap the estimation function in another function. Is there any way around this, e.g., by specifying an environment?

library(fixest)
data(trade)

# fit model directly and wrapped into function
mod1 <- fepois(Euros ~ log(dist_km) | Origin + Destination, trade)

fit_model <- function(df) {
  fepois(Euros ~ log(dist_km) | Origin + Destination, data = df)
}

mod2 <- fit_model(trade)

# try to update
update(mod1, . ~ . + log(Year))
#> Poisson estimation, Dep. Var.: Euros
#> Observations: 38,325 
#> Fixed-effects: Origin: 15,  Destination: 15
#> Standard-errors: Clustered (Origin) 
#>              Estimate Std. Error  t value  Pr(>|t|)    
#> log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
#> log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
#>            BIC:  2.424e+12     Squared Cor.: 0.384441
update(mod2, . ~ . + log(Year))
#> Error in fepois(fml = Euros ~ log(dist_km) + log(Year) | Origin + Destination, : Argument 'data' must be either: i) a matrix, or ii) a data.frame.
#> Problem: it is not a matrix nor a data.frame (instead it is a function).

Created on 2023-02-26 with reprex v2.0.2

Also posted as a GitHub issue.

Update: The solution seems to be forcing an early evaluation of the expression that refers to the dataset. Another way is to specify the dataset again within update():

update(mod2, . ~ . + log(Year), data = trade)

Solution

  • If you want to pass an arbitrary df into the function and not hard code trade we have to evaluate it early before calling fepois(). We can do this with eval(bquote()) and wrap the data argument (below mydat) into .(). To capture the object name nicely, we can further wrap the data argument in substitute() before evaluating it early (thanks for the comment from @jay.sf).

    Update: I now added an env argument which needs to be specified with parent.frame() when used inside purrr::map() and similar functions.

    library(fixest)
    library(tidyverse)
    data(trade)
    
    fit_model <- function(mydat, env = environment()) {
      eval(bquote(fepois(Euros ~ log(dist_km) | Origin + Destination, data = .(substitute(mydat, env = env)))))
    }
    
    mod2 <- fit_model(trade)
    
    update(mod2, . ~ . + log(Year))
    #> Poisson estimation, Dep. Var.: Euros
    #> Observations: 38,325 
    #> Fixed-effects: Origin: 15,  Destination: 15
    #> Standard-errors: Clustered (Origin) 
    #>              Estimate Std. Error  t value  Pr(>|t|)    
    #> log(dist_km) -1.51756   0.113171 -13.4095 < 2.2e-16 ***
    #> log(Year)    72.36888   6.899699  10.4887 < 2.2e-16 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-Likelihood: -1.212e+12   Adj. Pseudo R2: 0.592897
    #>            BIC:  2.424e+12     Squared Cor.: 0.384441
    
    mod2$call
    #> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = trade)
    
    res <- trade |>
      nest(.by = Year) |>
      mutate(fit = map(data, \(x) fit_model(x, parent.frame())))
    
    res$fit[[1]]
    #> Poisson estimation, Dep. Var.: Euros
    #> Observations: 3,793 
    #> Fixed-effects: Origin: 15,  Destination: 15
    #> Standard-errors: Clustered (Origin) 
    #>              Estimate Std. Error  t value  Pr(>|t|)    
    #> log(dist_km) -1.48073   0.114878 -12.8896 < 2.2e-16 ***
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> Log-Likelihood: -1.082e+11   Adj. Pseudo R2: 0.573982
    #>            BIC:  2.164e+11     Squared Cor.: 0.352497
    
    res$fit[[1]]$call
    #> fepois(fml = Euros ~ log(dist_km) | Origin + Destination, data = mydat)
    

    Created on 2023-03-07 by the reprex package (v2.0.1)