python survival-analysis cox-regression lifelines

Cox PH model in `lifelines` - violated assumptions for dummy variables


I am using the lifelines library to estimate a Cox PH model. The regression has many categorical features, which I one-hot encode, dropping one column per feature to avoid multicollinearity (the dummy variable trap). I am not attaching code because my example is similar to the one given in the documentation here.

When I run cph.check_assumptions(data), I am told that each dummy variable violates the proportional hazards assumption:

Variable 'dummy_a' failed the non-proportional test: p-value is 0.0063.
Advice: with so few unique values (only 2), you can try `strata=['dummy_a']` in the call in `.fit`. See documentation in link [A] and [B] below.

How should I interpret this advice when a single categorical feature is represented by multiple dummy variables? Should I add them all to strata?

I would appreciate any comments :)


Solution

  • @abu, your question exposes a clear gap in the documentation: what to do when dummy variables fail the proportional hazards test. In this case, I suggest not dummy-encoding the variable and instead adding the original column as a stratification variable, e.g. fit(..., strata=['dummy']), where 'dummy' stands for the original, un-encoded column.