rrandom-forestsurvival-analysisparty

How to extract the survival rate from a ctree in R?


The Curve

K <- HF %>% 
  filter(serum_creatinine <= 1.8, ejection_fraction > 25, age > 79)

pred_k <- ctree(Surv(time, DEATH_EVENT) ~ ., data = K)

https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records

I'm wanting to get the survival rate at t=150. This is my first ever Kaplan Meier model. Just don't know how to extract it.

Using summary(pred_k) doesn't return calculations for some reason.


Solution

  • To apply the summary() method to the Kaplan-Meier estimates you need to extract the survfit object first. You can do so either by re-fitting survfit() to all of the terminal nodes of the tree simultaneously. Or, alternatively, by using predict() to obtain the fitted Kaplan-Meier curve for every individual observation.

    To make the example easily reproducible, let's consider the following illustration from ?ctree in partykit:

    library("partykit")
    library("survival")
    data("GBSG2", package = "TH.data")
    ct <- ctree(Surv(time, cens) ~ ., data = GBSG2)
    plot(ct)
    

    plot of ctree survival tree on GBSG2 data

    Now we can extract the IDs of the terminal nodes (3, 4, 6, 7) for every observation in the learning data by:

    nd <- factor(predict(ct, type = "node"))
    

    And then we can build the corresponding four-group survfit object for the entire learning data and work with it "as usual" in survival.

    sf <- survfit(Surv(time, cens) ~ nd, data = GBSG2)
    summary(sf, time = 1500)
    ## Call: survfit(formula = Surv(time, cens) ~ nd, data = GBSG2)
    ## 
    ##                 nd=3 
    ##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
    ##     1.50e+03     8.30e+01     7.70e+01     6.31e-01     3.45e-02     5.67e-01     7.03e-01 
    ## 
    ##                 nd=4 
    ##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
    ##     1.50e+03     6.40e+01     2.30e+01     7.89e-01     3.96e-02     7.15e-01     8.71e-01 
    ## 
    ##                 nd=6 
    ##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
    ##     1.50e+03     2.00e+01     9.80e+01     2.29e-01     4.01e-02     1.62e-01     3.22e-01 
    ## 
    ##                 nd=7 
    ##         time       n.risk      n.event     survival      std.err lower 95% CI upper 95% CI 
    ##     1.50e+03     4.70e+01     6.80e+01     5.19e-01     4.38e-02     4.40e-01     6.13e-01 
    plot(sf, col = 1:4)
    

    plot of survival curves from four-group survfit on GBSG2 data

    Alternatively, you can extract the survfit for an individual observation as well, here we extract the object for the first observation in the learning data:

    sf1 <- predict(ct, type = "prob")[[1]]
    summary(sf1, time = 1500)
    ## Call: survfit(formula = y ~ 1, weights = w, subset = w > 0)
    ## 
    ##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
    ##  1500     83      77    0.631  0.0345        0.567        0.703
    plot(sf1)
    

    plot of the predicted survival curve for the first observation in the GBSG2 data