python survival-analysis lifelines

Cox regression using lifelines and categorical variables


Hi, I'm using the lifelines package to do Cox regression. I want to examine the effect of a non-binary categorical variable. Is there a built-in way of doing this, or should I convert each category into a number? Alternatively, using the KaplanMeierFitter (kmf) in lifelines, is it possible to fit each category separately and then get a p-value? I can make the separate plots, but I can't find how to compute the p-value.

Thank you!

Update: Okay, suppose that after using pd.get_dummies I have a dataframe df of the form:

            event     time       categorical_1 categorical_2  categorical_3
0              0      11.54             0             0             1
1              0       6.95             0             0             1
2              1       0.24             0             1             0
3              0       3.00             0             0             1
4              1      10.26             1             0             1
...          ...        ...           ...           ...           ...
1215           1       6.80             1             0             0

I now need to drop one of the dummy variables and then do:

    cph.fit(df, duration_col='time', event_col='event')

If I now want to plot how the categorical variables affect the survival plot, how would I go about this? I've tried:

    summary = cph.summary
    for index, row in summary.iterrows():
        print(index)
        cph.plot_covariate_groups(index, [a[index].mean()], ax=ax)
    plt.show()

But it seems to plot all the different levels of the variable on the same curve, whereas I'd expect the curves to be different. I'm actually not sure whether it plots all the curves or only the last one, but the legend lists every level of the categorical variable.

Thanks


Solution

  • Like other regressions, you'll need to convert the categorical variable into dummy variables. You can do this using pandas.get_dummies. Once done, the Cox regression model will give you estimates for each category (except the dummy variable that was dropped, which serves as the reference category; see notes here).

    For your second question, you'll need to use something like lifelines.statistics.multivariate_logrank_test to test whether survival differs between the categories. (Also see lifelines.statistics.pairwise_logrank_test, which compares the categories two at a time.)


    For your plotting question, there is a better way.

    cph.plot_covariate_groups(['categorical_1', 'categorical_2', ...], np.eye(n))
    

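    where n is the number of dummy columns left in the dataframe after one has been dropped. Each row of np.eye(n) sets one dummy to 1 and the rest to 0, so each row yields one curve.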

    See more docs here: https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#plotting-the-effect-of-varying-a-covariate