r survival-analysis

How to enter censored data into R's survival model?


I'm attempting to model customer lifetimes on subscriptions. As the data is censored I'll be using R's survival package to create a survival curve.

The original subscriptions dataset looks like this..

id  start_date  end_date
1   2013-06-01  2013-08-25
2   2013-06-01  NA
3   2013-08-01  2013-09-12

Which I manipulate to look like this..

id  tenure_in_months status(1=cancelled, 0=active)
1   2                1
2   ?                0
3   1                1

..in order to feed the survival model:

library(survival)
obj <- with(subscriptions, Surv(time=tenure_in_months, event=status, type="right"))
fit <- survfit(obj~1, data=subscriptions)
plot(fit)

What should I put in the tenure_in_months variable for the censored cases, i.e. the cases where the subscription is still active today? Should it be the tenure up until today, or should it be NA?


Solution

  • If a missing end date means that the subscription is still active, then you need to use the current date as the censoring date, i.e. the tenure is the time from the start date until today.

    NA won't work with the survival object. I think those cases will be omitted, and that is not what you want, because these cases contain important information about survival.

    SQL Server (T-SQL) expression to get the time to event (use it in the SELECT part of the query):

    DATEDIFF(M, start_date, ISNULL(end_date, GETDATE())) AS tenure_in_months
    

    BTW: I would use the difference in days for the analysis. It does not make sense to round the time off to months.
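If you would rather build the tenure in R than in SQL, something along these lines should work. This is only a minimal sketch using the three rows from the question. The fixed `censor_date` and the column name `tenure_in_days` are my own assumptions for illustration; in a real analysis you would probably use `Sys.Date()` or the date the data was extracted.

```r
library(survival)

# Toy data copied from the question; an NA end_date means "still active".
subscriptions <- data.frame(
  id         = 1:3,
  start_date = as.Date(c("2013-06-01", "2013-06-01", "2013-08-01")),
  end_date   = as.Date(c("2013-08-25", NA, "2013-09-12"))
)

# Assumption: a fixed cutoff stands in for "today" so the example is
# reproducible. Replace with Sys.Date() or your extraction date.
censor_date <- as.Date("2013-10-01")

# status: 1 = cancelled (event observed), 0 = still active (right-censored)
subscriptions$status <- as.integer(!is.na(subscriptions$end_date))

# For active subscriptions, measure tenure up to the censoring date
# instead of leaving it as NA.
last_seen <- subscriptions$end_date
last_seen[is.na(last_seen)] <- censor_date
subscriptions$tenure_in_days <-
  as.numeric(last_seen - subscriptions$start_date, units = "days")

# Kaplan-Meier fit on the day-level tenure
fit <- survfit(Surv(tenure_in_days, status) ~ 1, data = subscriptions)
summary(fit)
```

The censored row (id 2) stays in the data with status 0, so it still counts towards the number at risk until the cutoff date instead of being dropped.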