I'm unsure why I do not get standard errors or confidence intervals when I fit parametric survival models and compute the predicted mean survival times using the avg_predictions().
I tried using different parametric distributions as well out of curiosity and that didn't change anything.
I run the following code:
library(survival)
library(flexsurv)
library(marginaleffects)
data(lung)
# fit flexible parametric survival model
fmodel1 <- list(
"Weibull" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "weibull"),
"Gamma" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gamma"),
"Gengamma" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gengamma"),
"llogis" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "llogis"),
"llnorm" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "lognormal"),
"gompertz" = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "gompertz")
)
# using marginaleffects with flexsurv
avg_predictions(fmodel1$Weibull, variables = "sex")
Below is the result I get:
sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
1 331 NA NA NA NA NA NA
2 490 64.5 7.6 <0.001 44.9 363 616
Columns: sex, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
Type: response
I'm unsure why this produces NAs for all uncertainty estimates for the reference level of the sex
variable.
This has to do with how the Delta Method is used to compute standard errors. This is too complicated to explain in a SO answer, but you can find details in this vignette:
https://marginaleffects.com/vignettes/uncertainty.html
The gist of it is that standard errors are obtained by first creating a matrix of numeric derivatives of the predictions with respect to the coefficients. Think of it as a matrix with information on “how much do my predictions change when one coefficient changes by a small amount?”
In your case, there is only one coefficient estimate in the model and one column in the design matrix:
library(survival)
library(flexsurv)
library(marginaleffects)
data(cancer, package = "survival")
mod = flexsurvreg(Surv(time, status) ~ factor(sex), data = lung, dist = "weibull")
coef(mod)
> shape scale factor(sex)2
> 0.280921 5.884162 0.395578
head(model.matrix(mod))
> factor(sex)2
> 1 0
> 2 0
> 3 0
> 4 0
> 5 0
> 6 0
When we perturb the factor(sex)2
coefficient to see how predictions change, we see that predictions do change for units where sex
is 2, but do not change when sex
is 1. Since there's never any change for those units, there's no standard error.
By the way, in this simple model it makes no difference, but I strongly encourage you to read the documentation in ?predictions
for the difference between variables
and by
. It’s kind of nuanced but my guess is that most people will want this:
avg_predictions(mod, by = "sex")
>
> sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
> 1 331 NA NA NA NA NA NA
> 2 491 62.7 7.83 <0.001 47.6 368 614
>
> Columns: sex, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
> Type: response