rsurvival-analysis

Derive no. at risk and n.events from weighted survfit


————————POSSIBLY SOLVED?

I recently asked why the no. at risk is different in weighted Kaplan-Meier.
That's because it is the sum of the weights in treat == 1 and treat == 0.

Data and approach:

library(survival)
library(cobalt)

# Fake data
set.seed(123)
options(scipen=999)
lalonde <- cbind(lalonde,
                 event = sample(c(0,1), size=614, replace=TRUE, prob=c(0.84,0.16)),
                 time = runif(614, min=10, max=365))

formula <- treat ~ age + educ + race + married + nodegree + re74 + re75 + re78

#PS
lalonde$pscore <- glm(formula, data = lalonde,
                                family = binomial(link = "logit"))$fitted.values

# Calculate weights
lalonde$weight <- ifelse(lalonde$treat == 1,
                     pmin(lalonde$pscore, 1 - lalonde$pscore) / lalonde$pscore,
                     pmin(lalonde$pscore, 1 - lalonde$pscore) / (1 - lalonde$pscore))

surv_object <- Surv(time = lalonde$time, event = lalonde$event)
fit1 <- survfit(surv_object ~ treat, data = lalonde, robust = TRUE, weights = lalonde$weight)

To derive the no. at risk we can calculate:

treated1 <- sum(lalonde$weight[lalonde$treat == 1])
treated0 <- sum(lalonde$weight[lalonde$treat == 0])

which is nearly the same as first line of summary(fit1) which gives 109 for the untreated and 110 for the treated.

Calculating the number of events for treated and untreated would be the sum of the weights for the treated/untreated who had an event, correct?

treated0_event <- sum(lalonde$ow[lalonde$treat == 0 & lalonde$event == 1])
treated1_event <- sum(lalonde$ow[lalonde$treat == 1 & lalonde$event == 1])

Is it possible to pull these values from survfit?

Thank you.


Solution

  • Idea. Tidy data.

    library(dplyr)
    library(broom) # Converting model results into a tidy data frame using the tidy() function.
    
    fit1.tab <- tidy(fit1)
    
    summary_data <- fit1.tab %>%
      group_by(strata) %>%
      filter(strata %in% c('treat=0', 'treat=1')) %>%
      summarize(max_n.risk = max(n.risk), sum_n.event = sum(n.event))
    

    But is the calculation of events right?