I have survival data with competing risk (censored 0, relaps 1, death 2) and I want to use Grey's test to check the difference between groups of a variable rh (categories 0, 1 and 2). This is my data:
> surv_data
# A tibble: 344 × 4
ID time_to_event status rh
<dbl> <dbl> <dbl> <fct>
1 1 61.2 0 PHLF 0
2 2 6.53 1 PHLF 0
3 10 109. 0 PHLF 0
4 15 15.5 2 PHLF 2
5 19 4.97 0 PHLF 0
6 20 17.9 1 PHLF 1
7 22 61.7 1 PHLF 0
8 23 74.0 2 PHLF 0
9 26 67.9 2 PHLF 1
10 27 46.8 2 PHLF 0
# ℹ 389 more rows
# ℹ Use `print(n = ...)` to see more rows
What is the difference between Gray's test in tidycmprsk::tbl_cuminc and tidycmprsk::cuminc. I get different outputs.
I tried the following below, and I don't understand why I get different outputs of the tests. Hpw should i interpret the results and what is the right way to do Gray's test?
1)
tidycmprsk::cuminc(Surv(time_to_event, status) ~ rh, data = surv_data) %>%
tidycmprsk::tbl_cuminc(
times = c(6, 12, 24), label_header = "**Month {time}**"
) %>%
add_p() %>%
add_n()
and I get output:
`
Characteristic N Month 6 Month 12 Month 24 p-value1
rh 399 0.07
rh 0 32% (15%, 33%) 45% (39%, 50%) 62% (36%, 65%)
rh 1 23% (9%, 28%) 37% (21%, 53%) 46% (38%, 68%)
rh 2 28% (12%, 42%) 27% (15%, 63%) 38% (21%, 55%)
1 Gray’s Test
`
tidycmprsk::cuminc(ftime=surv_data$time_to_event, fstatus=surv_data$status, group=surv_data$rh, cencode=0)
and I get output:
`
Tests:
stat pv df
Censored 2.20168 1.176649e-01 2
Relapse 5.75114 2.376312e-01 2
Death 28.383626 3.183201e-09 2
`
tidycmprsk::cuminc(Surv(time_to_event, status) ~ rh, data = surv_data, cencode=0)
and I get output:
`
• Tests
outcome statistic df p.value
Relapse 4.31 2.00 0.07
Death 32.3 2.00 <0.001
`
You have got yourself into difficulty because you are mixing tidycmprsk
and cmprsk
. Both of these packages have a function called cuminc()
. Your second code snippet is:
tidycmprsk::cuminc(ftime=surv_data$time_to_event, fstatus=surv_data$status, group=surv_data$rh, cencode=0)
However, the docs for tidycmprsk::cuminc()
indicate it either takes a formula or a parameter called x
. The parameters you are providing (ftime
, fstatus
and so on) are the arguments for cmprsk::cuminc()
. As all your parameters are named and none of them is x
, this code will generate an error.
The tidycmprsk::cuminc()
source calls cmprsk::cuminc()
under the hood, so once you resolve your argument names you will get the same results. Here's an example with the trial
data built-in to the tidycmprsk
package, using both approaches.
tidycmprsk
Note that here I am using death_cr
as the status variable. As the docs state:
The status variable must be a factor, where the first level indicates the observation was censored, and subsequent levels are the competing events.
Make sure that your data is in this form.
library(tidycmprsk)
tidy_model <- tidycmprsk::cuminc(Surv(ttdeath, death_cr) ~ trt, trial) |>
tbl_cuminc(times = c(6, 12, 24)) |>
add_p() |>
add_n()
data.frame(tidy_model$table_body[1, c("statistic", "p.value")])
# statistic p.value
# 1 1.98847 0.1585009
cmprsk
This is the equivalent using cmprsk
. Again as we are using the factor variable death_cr
as the status, we can provide cencode = "censor"
, i.e. a character vector indicating the label of the factor level associated with being censored (rather than cencode = 0
in your question).
gray_model <- cmprsk::cuminc(
ftime = trial$ttdeath,
fstatus = trial$death_cr,
group = trial$trt,
cencode = "censor"
)
gray_model$Tests
# stat pv df
# death from cancer 1.98847032 0.1585009 1
# death other causes 0.08856723 0.7660066 1
You can see that the p-value and statistic are the same.
Where packages have overlapping functions (these two also have crr()
in common), there are various approaches. In my example I loaded tidycmprsk
with library()
, and called cmprsk
functions explicitly with ::
. This is about the minimum you should do in these cases. The Google R Style Guide in fact suggests that:
Users should explicitly qualify namespaces for all external functions.
Of course this advice makes more sense for a huge company with a massive codebase than a single researcher who may not generally face a huge risk of namespace collisions.
Another approach which has many advantages is to use the box
package. This allows you to (amongst other things) explicitly import the functions you want to use from specific packages rather than importing all functions from a package, which library()
does.