The website I am trying to run the code is using an old version of R and does not accept ranger as the library. I have to use the caret package. I am trying to process about 800,000 lines in my train data frame and here is the code I use
control <- trainControl(method = 'repeatedcv',
number = 3,
repeats = 1,
search = 'grid')
tunegrid <- expand.grid(.mtry = c(sqrt(ncol(train_1))))
fit <- train(value~.,
data = train_1,
method = 'rf',
ntree = 73,
tuneGrid = tunegrid,
trControl = control)
Looking at previous posts, I tried to tune my control parameters, is there any way I can make the model run faster? Am I able to specify a specific setting so that it just generates a model with the parameters I set, and not try multiple options?
This is my code from ranger which I optimized and currently having accurate model
fit <- ranger(value ~ .,
data = train_1,
num.trees = 73,
max.depth = 35,mtry = 7,importance='impurity',splitrule = "extratrees")
Thank you so much for your time
When you specify method='rf'
, caret
is using the randomForest
package to build the model. If you don't want to do all the cross-validation that caret
is useful for, just build your model using the randomForest
package directly. e.g.
library(randomForest)
fit <- randomForest(value ~ ., data=train_1)
You can specify values for ntree
, mtry
etc.
Note that the randomForest
package is slow (or just won't work) for large datasets. If ranger
is unavailable, have you tried the Rborist
package?