I am using mlr and other packages to do survival analysis. In mlr, I use surv.rpart and surv.glmboost. I also call the underlying packages, rpart and mboost, directly, and the results differ. Here is an example:
> myData2 <- data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2),
DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19),
Status='TRUE')
> myData2$Status <- as.logical(myData2$Status)
> myTrain <- c(1:(nrow(myData2)-1))
> myTest <- nrow(myData2)
When I use surv.rpart in mlr, the result is:
> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
> surv.lrn <- makeLearner("surv.rpart")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold:
time: 0.00
id truth.time truth.event response
11 11 19 TRUE 1
If I use the original rpart package, the result is:
> train <- myData2[1:(nrow(myData2)-1),]
> test <- myData2[nrow(myData2),]
> fit <- rpart(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[1] 26.9
Why do I get two different results? It looks like the rpart package directly gives me the result I want, while the result from mlr has undergone some kind of transformation. The same thing happens when I use surv.glmboost:
> surv.task <- makeSurvTask(data=myData2,target=c('DaysDiff','Status'))
Warning messages:
1: Unknown or uninitialised column: 'Weibull'.
2: Unknown or uninitialised column: 'Cox'.
3: Unknown or uninitialised column: 'Month2'.
4: Unknown or uninitialised column: 'Month2'.
5: Unknown or uninitialised column: 'Month'.
6: Unknown or uninitialised column: 'Month'.
7: Unknown or uninitialised column: 'MonthsDiff'.
8: Unknown or uninitialised column: 'Weibull'.
9: Unknown or uninitialised column: 'Cox'.
> surv.lrn <- makeLearner("surv.glmboost")
> mod <- train(learner=surv.lrn,task=surv.task,subset=myTrain)
Warning message:
In names(data) != all.vars(formula[[2]]) :
longer object length is not a multiple of shorter object length
> surv.pred <- predict(mod,task=surv.task,subset=myTest)
> surv.pred
Prediction: 1 observations
predict.type: response
threshold:
time: 0.00
id truth.time truth.event response
11 11 19 TRUE -0.1946239
Here is the result using mboost package:
> train <- myData2[1:(nrow(myData2)-1),]
Warning messages:
1: Unknown or uninitialised column: 'Weibull'.
2: Unknown or uninitialised column: 'Cox'.
3: Unknown or uninitialised column: 'Month2'.
4: Unknown or uninitialised column: 'Month2'.
5: Unknown or uninitialised column: 'Month'.
6: Unknown or uninitialised column: 'Month'.
7: Unknown or uninitialised column: 'MonthsDiff'.
8: Unknown or uninitialised column: 'Weibull'.
9: Unknown or uninitialised column: 'Cox'.
> test <- myData2[nrow(myData2),]
> fit <- glmboost(DaysDiff~DaySum,data=train)
> predict(fit,newdata=test)
[,1]
[1,] 33.08294
This is what I have found so far. The same thing could happen with other learners such as surv.cforest. My question is: why does this happen, and how can I get results like those from rpart and mboost when using the mlr package?
Your problem is that your direct calls to rpart and glmboost do not fit a survival model; they fit a plain regression model of DaysDiff on DaySum, which ignores the Status column.
Fitting a survival model in rpart looks like this:
fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp")
predict(fit,newdata=test)
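Note that the prediction from a survival tree is not a survival time. With method = "exp", rpart predicts an event rate scaled so that the root node has rate 1, and with only 10 training rows the tree does not split under rpart's default minsplit of 20, so every observation gets the root value. The complete comparison code therefore gives the same result for both approaches (each one predicts 1):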
library(mlr)
myData2 = data.frame(DaySum=c(3,2,1,6,3,2,2,5,2,7,2),
DaysDiff=c(24,4,5,12,3,31,131,6,35,18,19),
Status='TRUE')
myData2$Status = as.logical(myData2$Status)
train = myData2[1:(nrow(myData2)-1),]
test = myData2[nrow(myData2),]
surv.task = makeSurvTask(data=train,target=c('DaysDiff','Status'))
surv.lrn = makeLearner("surv.rpart")
mod = train(learner=surv.lrn,task=surv.task)
surv.pred = predict(mod,newdata = test)
surv.pred
library(rpart)
library(survival)
fit = rpart(Surv(DaysDiff, event = Status) ~ DaySum,data=train, method = "exp")
predict(fit,newdata=test)
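The same reasoning should apply to glmboost: the direct call in the question fits a Gaussian regression, not a survival model. Below is a minimal sketch of a survival fit with mboost, assuming that surv.glmboost in mlr uses the Cox proportional hazards family by default and predicts on the link scale (I have not checked this against the mlr source, so treat both as assumptions). The value returned is a linear predictor, i.e. a relative risk score, not a time, and it may not match the mlr output exactly if the boosting settings differ.

```r
library(survival)  # provides Surv()
library(mboost)    # provides glmboost() and CoxPH()

# Same data as in the question; Status is created as a logical directly.
myData2 <- data.frame(DaySum   = c(3, 2, 1, 6, 3, 2, 2, 5, 2, 7, 2),
                      DaysDiff = c(24, 4, 5, 12, 3, 31, 131, 6, 35, 18, 19),
                      Status   = TRUE)
train <- myData2[1:(nrow(myData2) - 1), ]
test  <- myData2[nrow(myData2), ]

# Survival response on the left-hand side, Cox family instead of the
# default Gaussian family (assumption: this mirrors mlr's default).
fit <- glmboost(Surv(DaysDiff, Status) ~ DaySum, data = train,
                family = CoxPH())

# Linear predictor (log relative risk) for the held-out row.
pred <- predict(fit, newdata = test, type = "link")
pred
```

If you need predicted survival times rather than risk scores, you would have to use a model that provides them; the mlr survival learners used here return risk-type responses only.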