
Bug report when emmeans() is used along with fct_na_value_to_level()


This is an MRE that shows an inconsistency when emmeans() is used along with fct_na_value_to_level(). It wasn't easy to track down the cause of the error from my initial code ;)

I prefer to post it here instead of on GitHub because I don't know which package should fix the behavior.

What do you think? Thank you

# I was dealing with NA values in some factors, so I decided to use the fct_na_value_to_level() function
warpbreaks2 <- warpbreaks |> 
  dplyr::mutate(
    tension_na_value = forcats::fct_na_level_to_value(tension, "L"), # just to simulate some missing values in my dataset
    tension_na_level = forcats::fct_na_value_to_level(tension_na_value, "missing"), # make the missing values explicit for my later model
    tension_na_level2 = forcats::fct_na_value_to_level(tension_na_value) # another try, without naming the new explicit level
  )

# For your information
levels(warpbreaks2$tension_na_level)
#> [1] "M"       "H"       "missing"
levels(warpbreaks2$tension_na_level2)
#> [1] "M" "H" NA

# First try: everything is OK
lm(breaks ~ wool * tension_na_level, data = warpbreaks2) |> 
  emmeans::emmeans(~ wool | tension_na_level)
#> tension_na_level = M:
#>  wool emmean   SE df lower.CL upper.CL
#>  A      24.0 3.65 48     16.7     31.3
#>  B      28.8 3.65 48     21.4     36.1
#> 
#> tension_na_level = H:
#>  wool emmean   SE df lower.CL upper.CL
#>  A      24.6 3.65 48     17.2     31.9
#>  B      18.8 3.65 48     11.4     26.1
#> 
#> tension_na_level = missing:
#>  wool emmean   SE df lower.CL upper.CL
#>  A      44.6 3.65 48     37.2     51.9
#>  B      28.2 3.65 48     20.9     35.6
#> 
#> Confidence level used: 0.95

# Second try: throws an error. The behaviour of emmeans() is not consistent, but maybe it's an issue with how fct_na_value_to_level() is written?
lm(breaks ~ wool * tension_na_level2, data = warpbreaks2) |> 
  emmeans::emmeans(~ wool | tension_na_level2)
#> Error in X[, nm, drop = FALSE]: subscript out of bounds

Created on 2024-07-08 with reprex v2.1.1
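
A quick way to see what differs between the two factors, sketched in base R only. Here addNA() is a stand-in for fct_na_value_to_level() called without a level name (my assumption is that both give a factor whose last level is NA), and the toy vector x is made up for illustration.

```r
# Toy factor with one missing value (hypothetical data, not warpbreaks)
x <- factor(c("M", "H", NA, "M"), levels = c("M", "H"))

# addNA() turns the missing value into a real level whose label is NA
# (assumed equivalent to fct_na_value_to_level(x) without a level name)
f_na <- addNA(x)

levels(f_na)         # the level set now contains NA
anyNA(levels(f_na))  # TRUE: one level label is NA
is.na(f_na)          # all FALSE: no element counts as missing any more
as.integer(f_na)     # 1 2 3 1: the former NA has its own integer code
```

So the model is fitted on a three-level factor with no missing rows, but one level has no usable label. That may be where code that looks things up by level name goes wrong, although I have not checked that this is what happens inside emmeans.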

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Manjaro Linux
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    fr_FR.UTF-8
#>  tz       Indian/Reunion
#>  date     2024-07-08
#>  pandoc   3.1.11 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version  date (UTC) lib source
#>  cli            3.6.3    2024-06-21 [1] CRAN (R 4.4.0)
#>  coda           0.19-4.1 2024-01-31 [1] CRAN (R 4.4.0)
#>  codetools      0.2-20   2024-03-31 [2] CRAN (R 4.4.1)
#>  digest         0.6.36   2024-06-23 [1] CRAN (R 4.4.0)
#>  dplyr          1.1.4    2023-11-17 [1] CRAN (R 4.4.0)
#>  emmeans        1.10.3   2024-07-01 [1] CRAN (R 4.4.1)
#>  estimability   1.5.1    2024-05-12 [1] CRAN (R 4.4.0)
#>  evaluate       0.24.0   2024-06-10 [1] CRAN (R 4.4.0)
#>  fansi          1.0.6    2023-12-08 [1] CRAN (R 4.4.0)
#>  fastmap        1.2.0    2024-05-15 [1] CRAN (R 4.4.0)
#>  forcats        1.0.0    2023-01-29 [1] CRAN (R 4.4.0)
#>  fs             1.6.4    2024-04-25 [1] CRAN (R 4.4.0)
#>  generics       0.1.3    2022-07-05 [1] CRAN (R 4.4.0)
#>  glue           1.7.0    2024-01-09 [1] CRAN (R 4.4.0)
#>  htmltools      0.5.8.1  2024-04-04 [1] CRAN (R 4.4.0)
#>  knitr          1.48     2024-07-07 [1] CRAN (R 4.4.1)
#>  lattice        0.22-6   2024-03-20 [2] CRAN (R 4.4.1)
#>  lifecycle      1.0.4    2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.4.0)
#>  MASS           7.3-61   2024-06-13 [1] CRAN (R 4.4.0)
#>  Matrix         1.7-0    2024-04-26 [1] CRAN (R 4.4.0)
#>  multcomp       1.4-25   2023-06-20 [1] CRAN (R 4.4.0)
#>  mvtnorm        1.2-5    2024-05-21 [1] CRAN (R 4.4.0)
#>  pillar         1.9.0    2023-03-22 [1] CRAN (R 4.4.0)
#>  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.4.0)
#>  R6             2.5.1    2021-08-19 [1] CRAN (R 4.4.0)
#>  reprex         2.1.1    2024-07-06 [1] CRAN (R 4.4.1)
#>  rlang          1.1.4    2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown      2.27     2024-05-17 [1] CRAN (R 4.4.0)
#>  rstudioapi     0.16.0   2024-03-24 [1] CRAN (R 4.4.0)
#>  sandwich       3.1-0    2023-12-11 [1] CRAN (R 4.4.0)
#>  sessioninfo    1.2.2    2021-12-06 [1] CRAN (R 4.4.0)
#>  survival       3.6-4    2024-04-24 [2] CRAN (R 4.4.1)
#>  TH.data        1.1-2    2023-04-17 [1] CRAN (R 4.4.0)
#>  tibble         3.2.1    2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyselect     1.2.1    2024-03-11 [1] CRAN (R 4.4.0)
#>  utf8           1.2.4    2023-10-22 [1] CRAN (R 4.4.0)
#>  vctrs          0.6.5    2023-12-01 [1] CRAN (R 4.4.0)
#>  withr          3.0.0    2024-01-16 [1] CRAN (R 4.4.0)
#>  xfun           0.45     2024-06-16 [1] CRAN (R 4.4.0)
#>  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml           2.3.9    2024-07-05 [1] CRAN (R 4.4.1)
#>  zoo            1.8-12   2023-04-13 [1] CRAN (R 4.4.0)
#> 
#>  [1] /home/annadoizy/R/x86_64-pc-linux-gnu-library/4.4
#>  [2] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Solution

  • The bug is now fixed; see https://github.com/rvlenth/emmeans/issues/500 for details. The error will no longer occur in the next emmeans release. Thanks to @russ
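
Until the fixed release is installed, one possible workaround is to give the NA level an explicit label, which is what the first try in the reprex does. A minimal base R sketch for a factor that already has NA as a level (the toy factor f and the label "missing" are only illustrative):

```r
# Hypothetical factor that already carries NA as a level
f <- addNA(factor(c("M", "H", NA, "M"), levels = c("M", "H")))

# Replace the NA level label with an explicit string
levels(f)[is.na(levels(f))] <- "missing"

levels(f)        # the NA label is replaced by "missing"
as.character(f)  # the former NA element now reads "missing"
```

With forcats, passing a level name as in fct_na_value_to_level(tension_na_value, "missing") should give the same kind of factor in one step.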