r predict survival-analysis cox-regression hazard

Predicting baseline cumulative hazard using predict.coxph in R


My aim is to predict, for a new observation, the cumulative hazard from time 0 to the start time, using the fitted model below.

I fitted the Cox model using two times per observation (a start time, which is not 0, and an end time). The idea is to take the difference between the cumulative hazard at the end time (i.e. the cumulative hazard from 0 to the end time, which I have already calculated from the same fitted model) and the cumulative hazard at the start time (i.e. the cumulative hazard from 0 to the start time, which is what I want to calculate here). That difference gives the cumulative hazard between the start and end time of each observation.

To get the expected number of events, I used predict(coxph(), newdata, type = "expected").

The data I used are simulated as follows:

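N <- 10^4 # population size
H <- within(data.frame(start_time = runif(N, 0, 50), x1 = rnorm(N, 2, 1), x2 = rnorm(N, -2, 1)), {
  lp <- 0.05*x1 + 0.2*x2  # linear predictor
  # Weibull event time, drawn conditional on exceeding start_time (left truncation)
  Tm <- qweibull(runif(N, pweibull(start_time, shape = 7.5, scale = 84*exp(-lp/7.5)), 1), shape = 7.5, scale = 84*exp(-lp/7.5))
  Cens1 <- 100  # common censoring time
  event_time <- pmin(Tm, Cens1)
  status <- as.numeric(event_time == Tm)})  # 1 = event, 0 = censored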

and the code for prediction is:

library(survival)

H$X <- rep(1, nrow(H))  # constant covariate
D <- coxph(Surv(start_time, event_time, status) ~ X, data = H, x = TRUE)
# Request the expected number of events over (0, start_time] for every observation
pred2 <- predict(D, newdata = data.frame(start_time = rep(0, nrow(H)), event_time = H$start_time, status = rep(0, nrow(H)), X = rep(1, nrow(H))), type = "expected")

But pred2 contains only NA values. Can someone point out whether there is a mistake in my idea or in the code?

Please let me know if any further clarification is required.


Solution

  • I found the answer myself. It's just a quick trick, and I'm not sure it will always work. I add the following line before the predict() call:

    D$coefficients["X"] <- 0

    With this change I get sensible values, which I checked against the nelsonaalen() function. Note that nelsonaalen() doesn't accept a start time (i.e. two time variables at once).

    Let me know if there's a more proper way to solve it.
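
One possible alternative, offered as a minimal sketch rather than a definitive fix: because the model contains no real covariates, the baseline cumulative hazard can be estimated with survfit() on the same (start, stop] data and then evaluated at any time through a step function. The NA values in the question are consistent with the coefficient of the constant covariate X being inestimable, which would explain why zeroing it helps. The smaller N, the seed, the omitted covariate effects in the simulation, and the names fit, H0, cumhaz_start, cumhaz_end and expected are illustrative assumptions, not part of the original post.

```r
library(survival)

set.seed(1)   # assumption: fixed seed, only for reproducibility
N <- 1000     # assumption: smaller than the original 10^4 to keep the sketch fast

# Simplified version of the simulation in the question (no covariate effects):
# Weibull event times drawn conditional on exceeding start_time.
H <- data.frame(start_time = runif(N, 0, 50))
u <- runif(N, pweibull(H$start_time, shape = 7.5, scale = 84), 1)
H$Tm <- qweibull(u, shape = 7.5, scale = 84)
H$event_time <- pmin(H$Tm, 100)
H$status <- as.numeric(H$event_time == H$Tm)

# Intercept-only fit on left-truncated data; the cumhaz component holds
# the Nelson-Aalen estimate at each time in fit$time.
fit <- survfit(Surv(start_time, event_time, status) ~ 1, data = H)

# Right-continuous step function: H0(t) is the cumulative hazard from 0 to t.
H0 <- stepfun(fit$time, c(0, fit$cumhaz))

cumhaz_start <- H0(H$start_time)         # cumulative hazard from 0 to start time
cumhaz_end   <- H0(H$event_time)         # cumulative hazard from 0 to end time
expected     <- cumhaz_end - cumhaz_start  # hazard accumulated while at risk

head(data.frame(cumhaz_start, cumhaz_end, expected))
```

As a quick sanity check, sum(expected) should agree with the observed number of events, sum(H$status), since each Nelson-Aalen increment is shared among exactly the observations at risk at that time.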