survival-analysislifelines

How do I find the best categorization for survival analysis?


I have a question concerning survival analysis. However, I have the following data (just an excerpt):

enter image description here

Now I am trying to do Survival Analysis with Python lifelines package. For example I want to find out if T-cells influence the Overall Survival (OS). But as far as I know, I need to categorizie the numer of T cells in different categories, like e.g. High T-Cell and Low T-Cell... Is that right? But how do I find out the best fitting Cut-Out? My plan is to show, that Tumor with High T-Cells have a better survival than low T-Cells. But how could I find the best cut-off-value to discriminate between High and Low T-Cell out of the data I have here.

Does anyone has an idea? A friend of mine said something about "ROC"-Analysis but I am really confused now... I would be glad about any help!


Solution

  • The transformation of continuous variables into categorical variables is far from obvious. A first approach can be based on the existing literature, especially in medicine/biology. A review of the existing literature may be sufficient to create these classes. Another method can be based on the empirical distribution of the T-Cells variable, sometimes highlighting an "obvious" categorization. The use of an ROC curve can be a good idea but somehow I don't think it is necessary. Categorizing your variable in Kaplan-Meier type survival analyses is necessary, but if you use Cox models there is no need to categorize this variable. So I would advise you to turn to Cox regressions to conduct your survival analysis. A Cox regression would allow you to add several predictors in your modeling as well as interaction terms, which is more convenient.