python-3.xpackagesurvival-analysishazardlifelines

Is it possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in `lifelines` or another package?


I consider using the lifelines package to fit a Cox-Proportional-Hazards-Model. I read that lifelines uses a nonparametric approach to fit the baseline hazard, which results in different baseline_hazards for some time points (see code example below). For my application, I need an exponential distribution leading to a baseline hazard h0(t) = lambda which is constant across time.

So my question is: is it (in the meantime) possible to run a Cox-Proportional-Hazards-Model with an exponential distribution for the baseline hazard in lifelines or another Python package?

Example code:

from lifelines import CoxPHFitter
import pandas as pd

df = pd.DataFrame({'duration': [4, 6, 5, 5, 4, 6], 
                   'event': [0, 0, 0, 1, 1, 1], 
                   'cat': [0, 1, 0, 1, 0, 1]})

cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event', show_progress=True)
cph.baseline_hazard_

gives

        baseline hazard
T   
4.0     0.160573
5.0     0.278119
6.0     0.658032

Solution

  • đź‘‹lifelines author here.

    So, this model is not natively in lifelines, but you can easily implement it yourself (and maybe something I'll do for a future release). This idea relies on the intersection of proportional hazard models and AFT (accelerated failure time) models. In the cox-ph model with exponential hazard (i.e. constant baseline hazard), the hazard looks like:

    h(t|x) = lambda_0(t) * exp(beta * x) = lambda_0 * exp(beta * x)

    In the AFT specification for an exponential distribution, the hazard looks like:

    h(t|x) = exp(-beta * x - beta_0) = exp(-beta * x) * exp(-beta_0) = exp(-beta * x) * lambda_0

    Note the negative sign difference!

    So instead of doing a CoxPH, we can do an Exponential AFT fit (and flip the signs if we want the same interpretation as the CoxPH). We can use the custom regession model syntax to do this:

    from lifelines.fitters import ParametricRegressionFitter
    from autograd import numpy as np
    
    
    class ExponentialAFTFitter(ParametricRegressionFitter):
    
        # this is necessary, and should always be a non-empty list of strings.
        _fitted_parameter_names = ['lambda_']
    
        def _cumulative_hazard(self, params, T, Xs):
            # params is a dictionary that maps unknown parameters to a numpy vector.
            # Xs is a dictionary that maps unknown parameters to a numpy 2d array
            lambda_ = np.exp(np.dot(Xs['lambda_'], params['lambda_']))
            return T / lambda_
    
    

    Testing this,

    from lifelines.datasets import load_rossi
    from lifelines import CoxPHFitter
    
    rossi = load_rossi()
    rossi['intercept'] = 1
    regressors = {'lambda_': rossi.columns}
    eaf = ExponentialAFTFitter().fit(rossi, "week", "arrest", regressors=regressors)
    
    eaf.print_summary()
    """
    <lifelines.ExponentialAFTFitter: fitted with 432 observations, 318 censored>
             event col = 'arrest'
    number of subjects = 432
      number of events = 114
        log-likelihood = -686.37
      time fit was run = 2019-06-27 15:13:18 UTC
    
    ---
                        coef exp(coef)  se(coef)     z      p  -log2(p)  lower 0.95  upper 0.95
    lambda_ fin         0.37      1.44      0.19  1.92   0.06      4.18       -0.01        0.74
            age         0.06      1.06      0.02  2.55   0.01      6.52        0.01        0.10
            race       -0.30      0.74      0.31 -0.99   0.32      1.63       -0.91        0.30
            wexp        0.15      1.16      0.21  0.69   0.49      1.03       -0.27        0.56
            mar         0.43      1.53      0.38  1.12   0.26      1.93       -0.32        1.17
            paro        0.08      1.09      0.20  0.42   0.67      0.57       -0.30        0.47
            prio       -0.09      0.92      0.03 -3.03 <0.005      8.65       -0.14       -0.03
            _intercept  4.05     57.44      0.59  6.91 <0.005     37.61        2.90        5.20
    _fixed  _intercept  0.00      1.00      0.00   nan    nan       nan        0.00        0.00
    ---
    """
    
    CoxPHFitter().fit(load_rossi(), 'week', 'arrest').print_summary()
    """
    <lifelines.CoxPHFitter: fitted with 432 observations, 318 censored>
          duration col = 'week'
             event col = 'arrest'
    number of subjects = 432
      number of events = 114
    partial log-likelihood = -658.75
      time fit was run = 2019-06-27 15:17:41 UTC
    
    ---
          coef exp(coef)  se(coef)     z      p  -log2(p)  lower 0.95  upper 0.95
    fin  -0.38      0.68      0.19 -1.98   0.05      4.40       -0.75       -0.00
    age  -0.06      0.94      0.02 -2.61   0.01      6.79       -0.10       -0.01
    race  0.31      1.37      0.31  1.02   0.31      1.70       -0.29        0.92
    wexp -0.15      0.86      0.21 -0.71   0.48      1.06       -0.57        0.27
    mar  -0.43      0.65      0.38 -1.14   0.26      1.97       -1.18        0.31
    paro -0.08      0.92      0.20 -0.43   0.66      0.59       -0.47        0.30
    prio  0.09      1.10      0.03  3.19 <0.005      9.48        0.04        0.15
    ---
    Concordance = 0.64
    Log-likelihood ratio test = 33.27 on 7 df, -log2(p)=15.37
    """
    
    

    Notice the sign change! So if you want the constant baseline hazard in the model, it's exp(-4.05).