I want to apply broom::tidy() to models nested in a fixest_multi object and extract the names of each list level as data frame columns. Here's an example of what I mean.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
This command estimates one model per dependent variable (Ozone and Solar.R) for each Month subset plus the full sample, twelve models in total. Here's what the resulting object looks like:
> names(multiple_est)
[1] "Full sample" "5" "6" "7" "8" "9"
> names(multiple_est$`Full sample`)
[1] "Ozone" "Solar.R"
I now want to tidy each model object, but keep the information of the Month / Dep.var. combination as columns in the tidied data frame.
I can run map_dfr from the purrr package, giving me this result:
> map_dfr(multiple_est, tidy, .id ="Month") %>% head(9)
# A tibble: 9 x 6
Month term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Full sample (Intercept) -71.0 23.6 -3.01 3.20e- 3
2 Full sample Wind -3.06 0.663 -4.61 1.08e- 5
3 Full sample Temp 1.84 0.250 7.36 3.15e-11
4 5 (Intercept) -76.4 82.0 -0.931 3.53e- 1
5 5 Wind 2.21 2.31 0.958 3.40e- 1
6 5 Temp 3.07 0.878 3.50 6.15e- 4
7 6 (Intercept) -70.6 46.8 -1.51 1.45e- 1
8 6 Wind -1.34 1.13 -1.18 2.50e- 1
9 6 Temp 1.64 0.609 2.70 1.29e- 2
But this tidies only the first model of each Month, the model with the Ozone outcome.
My desired output would look something like this:
Month        outcome  term         estimate    (more columns from tidy)
Full sample  Ozone    (Intercept)  -71.0
Full sample  Ozone    Wind         -3.06
Full sample  Ozone    Temp         1.84
Full sample  Solar.R  (Intercept)  some value
Full sample  Solar.R  Wind         some value
Full sample  Solar.R  Temp         some value
... rows repeated for each month 5, 6, 7, 8, 9
How can I apply tidy to all models and add another column that indicates the outcome of the model (which is stored in the name of the model object)?
So, fixest_multi has a pretty strange setup, as I found when I delved deeper. As you noticed, mapping across it or using apply only accesses some of the models. In fact, it isn't the "Ozone" models that get picked up, but the first 6 models in the object (those for c("Full sample", "5", "6")).
If you convert it to a list, it accesses the data attribute, which is a sequential list of all 12 models, but drops the names you're looking for. So, as a workaround, you could use pmap_dfr() with the names (found in the attributes of the object) to tidy() each model, and then use mutate() to add your desired columns.
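The label bookkeeping behind this workaround can be checked on its own, without fitting any models. This is a minimal sketch, assuming the flattened list is ordered sample by sample with the outcomes cycling inside each sample; the sample and outcome names are hard-coded here for illustration rather than read from the object.

```r
# Hypothetical stand-ins for the names stored in the fixest_multi object
samples <- c("Full sample", "5", "6", "7", "8", "9")
lhs <- c("Ozone", "Solar.R")

# One row per model: each sample name is repeated once per outcome,
# while the outcome names cycle through the whole vector
labels <- data.frame(
  Month = rep(samples, each = length(lhs)),
  outcome = rep(lhs, times = length(samples))
)

print(labels)
```

If the models were stored in a different order (all Ozone models first, say), the each and times arguments would need to be swapped.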
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
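# Sample and outcome names are stored in the object's "meta" attribute
nms <- attr(multiple_est, "meta")$all_names

pmap_dfr(
  list(
    data = as.list(multiple_est),
    # Each sample name repeats once per outcome; outcomes cycle within a sample
    month = rep(nms$sample, each = length(nms$lhs)),
    outcome = rep(nms$lhs, length(nms$sample))
  ),
  # Tidy each model and put its sample and outcome labels in front
  ~ tidy(..1) %>%
    mutate(
      Month = ..2,
      outcome = ..3,
      .before = 1
    )
)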
#> # A tibble: 36 × 7
#> Month outcome term estimate std.error statistic p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Full sample Ozone (Intercept) -71.0 23.6 -3.01 3.20e- 3
#> 2 Full sample Ozone Wind -3.06 0.663 -4.61 1.08e- 5
#> 3 Full sample Ozone Temp 1.84 0.250 7.36 3.15e-11
#> 4 Full sample Solar.R (Intercept) -76.4 82.0 -0.931 3.53e- 1
#> 5 Full sample Solar.R Wind 2.21 2.31 0.958 3.40e- 1
#> 6 Full sample Solar.R Temp 3.07 0.878 3.50 6.15e- 4
#> 7 5 Ozone (Intercept) -70.6 46.8 -1.51 1.45e- 1
#> 8 5 Ozone Wind -1.34 1.13 -1.18 2.50e- 1
#> 9 5 Ozone Temp 1.64 0.609 2.70 1.29e- 2
#> 10 5 Solar.R (Intercept) -284. 262. -1.08 2.89e- 1
#> # … with 26 more rows
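If relying on the object's attributes feels fragile, one alternative is to build the nested list yourself so that the names survive at both levels. This is only a sketch and not the approach above: it swaps feols() for lm(), so it only applies to models without fixed effects, and the standard errors may differ from fixest's depending on its vcov settings. The names outcomes, samples, nested and tidied are illustrative.

```r
library(purrr)
library(broom)

outcomes <- c("Ozone", "Solar.R")

# Full sample plus one data frame per month, all named
samples <- c(
  list(`Full sample` = airquality),
  split(airquality, airquality$Month)
)

# Named list of samples, each holding a named list of models (one per outcome)
nested <- map(samples, function(d) {
  map(set_names(outcomes), function(y) {
    lm(reformulate(c("Wind", "Temp"), response = y), data = d)
  })
})

# Tidy the inner level first, then the outer one; .id turns the names into columns
tidied <- map_dfr(
  nested,
  function(models) map_dfr(models, tidy, .id = "outcome"),
  .id = "Month"
)

head(tidied)
```

Because both list levels are named, the nested map_dfr() calls recover Month and outcome directly, with no need to reconstruct the labels by hand.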