python statistics hazard survival

How can I convert cumulative hazard probabilities to conditional/marginal probabilities?


I am building a forecast model using AalenAdditiveFitter from Lifelines in Python to predict whether an event will occur or not and when.

T (time) is measured in months; C (event) is 1 if the event occurred and 0 if it did not.

In addition, I am using 8 attributes as covariates.

from lifelines import AalenAdditiveFitter
# Fit the Aalen additive model on the training data
aaf = AalenAdditiveFitter(coef_penalizer=1., fit_intercept=True)
cx1 = aaf.fit(trainX.drop(['index'], axis=1), duration_col='T', event_col='C', show_progress=True)

I am able to build a relatively stable model and get cumulative hazard probabilities using the following method:

stestXsurvived = cx1.predict_cumulative_hazard(stestX.drop(['T','C'], axis=1))

Is there a way of getting conditional/marginal probabilities directly from the AalenAdditiveFitter procedure?

So after doing a little more digging, can I assume the following?

  1. I get cumulative hazard probabilities from Aalen Additive model
  2. To convert them to conditional probabilities for each individual month, I can just subtract the prior month's value: P(t) - P(t-1)

This is based on the answer posted on https://quant.stackexchange.com/questions/21816/cumulative-vs-marginal-probability-of-default

I am not sure whether the solution is really this simple; please help.


Solution

  • If you difference the cumulative hazard in the way you suggest, you will get h(t), the hazard. For discrete-time durations, h(t) is a conditional probability: the probability of failing in period t given survival up to t. Note, though, that for continuous-time durations h(t) is a rate, not a probability (it can be larger than 1, for instance).

    As an aside: I cannot remember offhand whether Aalen's additive model is semi-parametric. If it is, the cumulative hazard will only change in value in the months where we see a failure. That won't affect your (month - previous month) calculation: in months with no observed failures the difference will simply come out to be 0, which is always the case for semi-parametric duration models.

    If you wanted to save computing power, you could take the cumulative hazard at one failure time (call this t_k) and subtract from it the cumulative hazard at the last failure time before this one (call this t_{k-1}). The answer you get would be the same, once you wrap your mind around what the new quantity is telling you: if the cumulative hazard changes by that much between t_{k-1} and t_k, and semi-parametric hazards (and, therefore, the cumulative hazard too) only update when we see failures, then any time point falling strictly between t_{k-1} and t_k must have a hazard of 0.
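
To make the differencing concrete, here is a minimal sketch for a single subject. It is not part of lifelines: the function name `hazard_to_probabilities` and the input values are hypothetical, and it assumes the cumulative hazard H(t) is given at consecutive months, is non-decreasing, and starts from H(0) = 0. Besides the month-to-month difference, it uses the standard continuous-time relation S(t) = exp(-H(t)) to turn each increment into a conditional probability (event in month t given survival to t-1) and a marginal probability (event in month t, unconditionally).

```python
import math


def hazard_to_probabilities(cum_hazard):
    """Convert cumulative hazards H(1), H(2), ... into per-month quantities.

    cum_hazard: non-decreasing cumulative hazard values for one subject at
    consecutive months (hypothetical input; H(0) = 0 is assumed).
    """
    rows = []
    prev_h = 0.0
    for month, h in enumerate(cum_hazard, start=1):
        increment = h - prev_h                    # H(t) - H(t-1)
        survival_prev = math.exp(-prev_h)         # S(t-1) = exp(-H(t-1))
        survival_now = math.exp(-h)               # S(t)   = exp(-H(t))
        rows.append({
            "month": month,
            "increment": increment,
            # P(event in month t | survived to t-1)
            "conditional": 1.0 - math.exp(-increment),
            # P(event in month t), unconditional: S(t-1) - S(t)
            "marginal": survival_prev - survival_now,
        })
        prev_h = h
    return rows


if __name__ == "__main__":
    # Made-up values: no change in month 2 mimics a month with no failures.
    for row in hazard_to_probabilities([0.1, 0.1, 0.3]):
        print(row)
```

When the increments are small, the conditional probability is close to the raw difference H(t) - H(t-1), which is why simple differencing is often a reasonable approximation. As far as I understand, `predict_cumulative_hazard` returns a DataFrame indexed by time with one column per subject, so the same steps could be applied column by column.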