rggplot2cox-regressionsurvivalhazard

How to plot the Hazard Ratio + CI over time of survival data in ggplot in R?


Background

I want to plot the hazard ratio over time, including its confidence intervals, of a survival dataset. As an example, I will take a simplified dataset from the survival package: the colon dataset.

library(survival)
library(tidyverse)

# Colon survival dataset
data <- colon %>% 
  filter(etype == 2) %>% 
  select(c(id, rx, status, time)) %>% 
  filter(rx == "Obs" | rx == "Lev+5FU") %>% 
  mutate(rx = factor(rx))

The dataset contains patients that received a treatment (i.e., "Lev+5FU") and patients that did not (i.e., "Obs"). The survival curves are as follows:

fit <- survfit(Surv(time, status) ~ rx, data = data )
plot(fit)

enter image description here

Attempt

Using the cox.zph function, you can plot the hazard ratio of a cox model.

cox <- coxph(Surv(time, status) ~ rx, data = data)
plot(cox.zph(cox))

enter image description here

However, I want to plot the hazard ratio including 95% CI for this survival dataset using ggplot.

Question(s)

  1. How do you extract the hazard ratio data and the 95% CIs from this cox.zph object to plot them in ggplot?
  2. Are there other R packages that enable doing the same in a more convenient way?

Solution

  • Note: it’s important to recognize the correction of Dion Groothof. The lines and CIs are not really hazard ratios. They are estimates and bounds around time varying log-hazard-ratios. You would need to exponentiate to get HRs.

    The values are in the result returned from cox.zph:

    str(cox.zph(cox))
    #----------------------
    List of 7
     $ table    : num [1:2, 1:3] 1.188 1.188 1 1 0.276 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:2] "rx" "GLOBAL"
      .. ..$ : chr [1:3] "chisq" "df" "p"
     $ x        : num [1:291] 0 0.00162 0.00323 0.00485 0.00646 ...
     $ time     : num [1:291] 23 34 45 52 79 113 125 127 138 141 ...
     $ y        : num [1:291, 1] 2.09 2.1 2.1 2.1 2.11 ...
      ..- attr(*, "dimnames")=List of 2
      .. ..$ : chr [1:291] "23" "34" "45" "52" ...
      .. ..$ : chr "rx"
     $ var      : num [1, 1] 4.11
     $ transform: chr "km"
     $ call     : language cox.zph(fit = cox)
     - attr(*, "class")= chr "cox.zph"
    

    To get a plot with any of the paradigms (base, lattice or ggplot2) you use the time as the x axis, use x as the solid line and y at the "points"

     z <-  cox.zph(cox)
     ggdf <- data.frame( unclass(z)[c("time", "x","y")])
     ggplot(data=ggdf, aes(x=time, y=-x))+ 
            geom_line()+ ylim(range(z$y))+ 
            geom_point(aes(x=time,y=z$y) )
    

    enter image description here

    To get the CI look at getAnywhere(plot.cox.zph)

    xx <- x$x
    yy <- x$y
    df <- max(df)
    nvar <- ncol(yy)
    pred.x <- seq(from = min(xx), to = max(xx), length = nsmo)
    #------------
    if (se) {
            bk <- backsolve(qmat$qr[1:df, 1:df], diag(df))
            xtx <- bk %*% t(bk)
            seval <- ((pmat %*% xtx) * pmat) %*% rep(1, df)
            temp <- 2 * sqrt(x$var[i, i] * seval)
            yup <- yhat + temp
            ylow <- yhat - temp
            yr <- range(yr, yup, ylow)
    #---------------
    if (se) {
                lines(pred.x, exp(yup), col = col[2], lty = lty[2], 
                  lwd = lwd[2])
                lines(pred.x, exp(ylow), col = col[2], lty = lty[2], 
                  lwd = lwd[2])
                }