The goal
I want to run dominance analysis on a Dirichlet regression, to approximate the relative importance of a set of predictors (scaled continuous predictors, continuous predictors with splines, and factors). Dirichlet regression is an extension of beta regression to model proportions that are not derived from counts, and that are split between more than 2 categories, see Douma&weedon (2019).
The modelling approach: the syntax is potentially important
I am using the DirichletReg
package to fit a Dirichlet regression, with an "alternative"
parametrization: this allows to estimate simultaneously the parameters and the precision of the estimation. The syntax is: response ~ parameters | precision
. The estimation of parameters can be done with different predictors from those used to estimate precision: response ~ predictor1 + predictor2 | predictor3
. If left undeclared, the model assumes fixed precision: response ~ predictors
, which can be declared explicilty as: response ~ predictors | 1
.
I think that the error is related to the vertical bar in the formula, which separates the predictors used to estimate parameters from the predictors used to estimate precision.
I rely on performance::r2()
to calculate a metric of model quality: Nagelkerke's pseudo-R2. However, for the actual analysis, I am thinking of either McFadden or Estrella's pseudo-R2, as they appear suitable to run dominance analysis on multinomial responses, see Luchman 2014.
The obstacle
I get the error message: "fitstat requires at least two elements".
A reproducible example
From data available in the DirichletReg
package. The response is only two categories, but in any case, it yields the same error message as in the actual analysis.
library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3
# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)
# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")
summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#>
#> Standardized Residuals:
#> Min 1Q Median 3Q Max
#> 1 - accuracy -1.5279 -0.7798 -0.343 0.6992 2.4213
#> accuracy -2.4213 -0.6992 0.343 0.7798 1.5279
#>
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 2.22386 0.28087 7.918 2.42e-15 ***
#> dyslexiayes -1.81261 0.29696 -6.104 1.04e-09 ***
#> iq -0.02676 0.06900 -0.388 0.698
#> ------------------------------------------------------------------
#>
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.71017 0.32697 5.230 1.69e-07 ***
#> dyslexiayes 2.47521 0.55055 4.496 6.93e-06 ***
#> iq 0.04097 0.27537 0.149 0.882
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758
# Run dominance analysis: error
# If left undeclared, the model assumes fixed precision: parameters | 1
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | 1,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
domir::domin(acc ~ dyslexia + iq,
reg = function(y) DirichletReg::DirichReg(y, data = RS, model = "alternative"),
fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
consmodel = "| dyslexia + iq"
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1 DirichletReg_0.7-1 Formula_1.2-4
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.38 magrittr_2.0.3 insight_0.19.1
#> [5] lattice_0.20-44 rlang_1.1.0 fastmap_1.1.0 stringr_1.5.0
#> [9] highr_0.9 tools_4.1.0 grid_4.1.0 xfun_0.30
#> [13] cli_3.6.0 withr_2.5.0 htmltools_0.5.2 maxLik_1.5-2
#> [17] miscTools_0.6-28 yaml_2.3.5 digest_0.6.29 lifecycle_1.0.3
#> [21] vctrs_0.6.1 fs_1.5.2 glue_1.6.2 evaluate_0.15
#> [25] rmarkdown_2.13 sandwich_3.0-1 reprex_2.0.1 stringi_1.7.6
#> [29] compiler_4.1.0 generics_0.1.2 zoo_1.8-9
Created on 2023-07-27 by the reprex package (v2.0.1)
References
Interestingly, the solution provided on July 2023 no longer worked in February 2024. Building on the code provided by Joseph Luchman, I could make domir
compatible with a Dirichlet model, by fitting a Dirichlet model outside domir
, and using update
inside the domir
call to pass the formula.
This was also discussed over at GitHub. Infinite thanks to J. Luchman for the persistance and assistance!
library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3
RS <- ReadingSkills
response.d <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
# Fit Dirichlet regression
rs2 <- DirichReg(response.d ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")
performance::r2( rs2)[[1]]
#> Nagelkerke's R2
#> 0.4590758
domir::domir(response.d ~ dyslexia + iq,
function(y) {
iv <- attr(terms(y), "term.labels")
fml <- paste0("response.d ~ ", paste0(iv, collapse = "+"), "| dyslexia + iq", collapse = "")
print(fml)
performance::r2( update(rs2, formula(fml) ) )[[1]]})
#> [1] "response.d ~ dyslexia+iq| dyslexia + iq"
#> [1] "response.d ~ dyslexia| dyslexia + iq"
#> [1] "response.d ~ iq| dyslexia + iq"
#> Overall Value: 0.4590758
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> dyslexia 0.455042413 0.99121418 1
#> iq 0.004033357 0.00878582 2
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2
#> dyslexia 0.457263643 0.452821183
#> iq 0.006254587 0.001812127
#>
#> Complete Dominance Designations:
#> Dmnated?dyslexia Dmnated?iq
#> Dmnates?dyslexia NA TRUE
#> Dmnates?iq FALSE NA
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1 DirichletReg_0.7-1 Formula_1.2-4
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.38 magrittr_2.0.3 insight_0.19.1
#> [5] lattice_0.20-44 rlang_1.1.0 fastmap_1.1.0 stringr_1.5.0
#> [9] highr_0.9 tools_4.1.0 grid_4.1.0 xfun_0.39
#> [13] cli_3.6.0 withr_2.5.2 htmltools_0.5.5 maxLik_1.5-2
#> [17] miscTools_0.6-28 yaml_2.3.5 digest_0.6.29 lifecycle_1.0.4
#> [21] vctrs_0.6.1 fs_1.5.2 glue_1.6.2 evaluate_0.15
#> [25] rmarkdown_2.13 sandwich_3.0-1 reprex_2.0.1 stringi_1.7.6
#> [29] compiler_4.1.0 generics_0.1.2 zoo_1.8-9
Created on 2024-02-26 by the reprex package (v2.0.1)