rdplyrsurvival-analysiscox-regressionhazard

How to calculate baseline hazard in R?


I have a Cox regression model as follows:


library(dplyr)
library(survival)
library(eha)

df <- data.frame(id = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 sex = c("0", "0", "0", "1", "1", "0", "0", "0"),
                 start = c(0, 1, 5, 0, 7, 0, 2, 10),
                 stop = c(1, 5, 10, 7, 18, 2, 10, 19), 
                 status = c(1, 1, 0, 1, 0, 1, 1, 0))

  id sex start stop status
1  a   0     0    1      1
2  a   0     1    5      1
3  a   0     5   10      0
4  b   1     0    7      1
5  b   1     7   18      0
6  c   0     0    2      1
7  c   0     2   10      1
8  c   0    10   19      0

# Sex violates proportional hazards
logrank(Surv(start, stop, status), group = sex, data = df)

# Regression
fit <- coxph(Surv(start, stop, status) ~ strata(sex) + cluster(id), data = df)

cluster(id) represents the implementation of the Anderson-Gill model for recurrent events. I want to estimate the (non-cumulative) hazard function from this. I know I can obtain the cumulative baseline hazard using:

# Cumulative hazard
basehaz(fit)

But how to obtain the non-cumulative hazard? Any help is appreciated.


Solution

  • The basehaz function will return results for stratified models. Essentially it breaks the analysis into stratum-specific hazards:

    df <- data.frame(id = c("a", "a", "a", "b", "b", "c", "c", "c"),
                      sex = c("0", "0", "0", "1", "1", "0", "0", "0"),
                      start = c(0, 1, 5, 0, 7, 0, 2, 10),
                      stop = c(1, 5, 10, 7, 18, 2, 10, 19), 
                      status = c(1, 1, 0, 1, 0, 1, 1, 0))
     
    fit <- coxph(Surv(start, stop, status) ~ strata(sex) + cluster(id), data = df) 
    basehaz(fit)
    #------
      hazard time strata
    1    0.5    1      0
    2    1.0    2      0
    3    1.5    5      0
    4    2.0   10      0
    5    2.0   19      0
    6    1.0    7      1
    7    1.0   18      1
    

    Contrast that result with the results from a "null" model.

    fit <- coxph(Surv(start, stop, status) ~ 1, data = df)
    basehaz(fit)
    #------------
         hazard time
    1 0.3333333    1
    2 0.6666667    2
    3 1.0000000    5
    4 1.3333333    7
    5 1.6666667   10
    6 1.6666667   18
    7 1.6666667   19
    

    The results are consistent with the usual explanation that stratified model deliver a weighted average of the results in two separated strata.